




How To Fix - "DockerTimeoutError" Error in AWS Jobs?



In this post, we will explore how to fix the "DockerTimeoutError" error in AWS jobs. Error logs -


"DockerTimeoutError: Could not transition to started; timed out after waiting xm0s".


"DockerTimeoutError: Could not transition to started; timed out after waiting 3m0s"


CannotInspectContainerError: Could not transition to inspecting; timed out after waiting xs

 

The default timeout for the AWS ECS container agent is four minutes. If a Docker operation takes longer than that, AWS Batch returns a DockerTimeoutError. First things first, check the basic details of Docker.


$ docker info


$ docker --debug info


$ docker system info

 

There might be various causes for this error. Below are a few checks and corresponding guidelines to work through. Try them and see if that helps.

Checks and Guidelines:

 

  • Have all previously stopped containers been deleted to free up space?
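If not, a quick prune on the container instance reclaims that space. A minimal sketch (the `command -v` guard and `|| true` fallbacks just let it skip cleanly on hosts without a working Docker daemon):

```shell
# Prune stopped containers and dangling images to free up disk space.
# The guard skips cleanly on hosts where the Docker CLI is unavailable.
if command -v docker >/dev/null 2>&1; then
  docker container prune -f || true   # delete all stopped containers
  docker image prune -f || true       # delete dangling (untagged) images
  docker system df || true            # show how much space is in use now
else
  echo "docker CLI not found; run this on the ECS container instance"
fi
cleanup_done=1
```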
 

  • Have you included the ECS cleanup process in the AMI? You can use the environment variables below to automate image cleanup:
  ECS_IMAGE_CLEANUP_INTERVAL - Specifies how frequently the automated image cleanup process checks for images to delete. The default is every 30 minutes; you can reduce it to 10 minutes to remove images more frequently.
  ECS_IMAGE_MINIMUM_CLEANUP_AGE - Specifies the minimum amount of time between when an image was pulled and when it may become a candidate for removal. The default is 1 hour; this prevents cleaning up images that have just been pulled.
  ECS_NUM_IMAGES_DELETE_PER_CYCLE - Specifies how many images may be removed during a single cleanup cycle. The default is 5 and the minimum is 1.
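On the EC2 launch type these agent variables live in `/etc/ecs/ecs.config` on the container instance. A sample fragment (the values here are illustrative, not prescriptive - tune them for your workload):

```shell
# /etc/ecs/ecs.config -- ECS agent image-cleanup tuning (example values)
ECS_IMAGE_CLEANUP_INTERVAL=10m       # check for deletable images every 10 minutes
ECS_IMAGE_MINIMUM_CLEANUP_AGE=30m    # only images pulled >30 minutes ago are candidates
ECS_NUM_IMAGES_DELETE_PER_CYCLE=10   # remove up to 10 images per cleanup cycle
```

After editing the file, restart the ECS agent so it picks up the new values.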

  • What launch type are you using - EC2 or Fargate? The Fargate launch type might cause issues running Windows containers, so you might have to use EC2 with Windows.
  • Are you using VPC endpoints? You have to use VPC endpoints if the tasks run in a private subnet (with no NAT gateway/NAT instance). To download images from ECR, the container instance requires access to the ECR and S3 endpoints. So if your subnet is private, either use the PrivateLink option or a NAT gateway to reach the ECR endpoints.
 

  • Do you have a VPC endpoint for your Fargate tasks?
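For pulling from ECR out of a private subnet you typically need the `com.amazonaws.<region>.ecr.api` and `com.amazonaws.<region>.ecr.dkr` interface endpoints, an S3 gateway endpoint, and (if you use the awslogs driver) `com.amazonaws.<region>.logs`. A quick guarded check of what already exists in a VPC - the VPC id below is a placeholder:

```shell
# List existing VPC endpoints so you can spot missing ECR/S3/logs endpoints.
# VPC_ID is a placeholder -- substitute your own. The guard keeps this
# sketch from failing on machines without the AWS CLI or credentials.
VPC_ID="vpc-0123456789abcdef0"   # placeholder, replace with your VPC id
if command -v aws >/dev/null 2>&1; then
  aws ec2 describe-vpc-endpoints \
    --filters "Name=vpc-id,Values=${VPC_ID}" \
    --query 'VpcEndpoints[].ServiceName' \
    --output text || echo "aws call failed (check credentials/region)"
else
  echo "aws CLI not found"
fi
endpoint_check_done=1
```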
 

  • What is your log driver? In the AWS ECS console, within your task definition's container definition, under Log Configuration, set the log driver to awslogs.
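In the container definition this corresponds to a `logConfiguration` block like the following (the log group name and region are placeholders for illustration):

```json
{
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/ecs/my-task",
      "awslogs-region": "us-east-1",
      "awslogs-stream-prefix": "ecs"
    }
  }
}
```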
 

  • Check your service's security group. Does it allow the required port? You might have to add an ingress rule to allow the port; otherwise this can also cause a timeout.
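Adding the ingress rule from the CLI looks roughly like this - the security group id, port, and CIDR are all placeholders you would replace with your own values:

```shell
# Add an ingress rule for the container port (all values are placeholders).
SG_ID="sg-0123456789abcdef0"   # placeholder security group id
PORT=8080                      # placeholder container/host port
if command -v aws >/dev/null 2>&1; then
  aws ec2 authorize-security-group-ingress \
    --group-id "${SG_ID}" \
    --protocol tcp \
    --port "${PORT}" \
    --cidr 10.0.0.0/16 || echo "aws call failed (check ids/credentials)"
else
  echo "aws CLI not found"
fi
ingress_done=1
```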
 

  • Have you run out of EBS burst credits? Check the BurstBalance metric for your EC2 instance.
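One way to check is to pull the BurstBalance metric from CloudWatch for the instance's volume - a value near 0 means the volume has exhausted its IO burst credits. The volume id below is a placeholder:

```shell
# Pull the recent EBS BurstBalance (%) for a volume. A value near 0
# means the burst credits are exhausted. VOLUME_ID is a placeholder.
VOLUME_ID="vol-0123456789abcdef0"
if command -v aws >/dev/null 2>&1; then
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EBS \
    --metric-name BurstBalance \
    --dimensions Name=VolumeId,Value="${VOLUME_ID}" \
    --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time   "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 300 \
    --statistics Minimum || echo "aws call failed (check ids/credentials)"
else
  echo "aws CLI not found"
fi
burst_check_done=1
```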
 

  • What is the volume IO utilization? If it is very high for the EC2 instances, Docker operations can time out. In that case, use a larger volume or a different volume type such as a solid state drive (SSD).
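You can spot-check IO utilization on the instance itself with `iostat` (from the sysstat package) - sustained %util near 100 suggests the volume is the bottleneck:

```shell
# Spot-check volume IO utilization on the instance. Sustained %util near
# 100 means Docker operations may be starved of IO.
if command -v iostat >/dev/null 2>&1; then
  iostat -x 1 3    # extended device stats, 3 samples, 1 second apart
else
  echo "iostat not found (install the sysstat package)"
fi
iostat_done=1
```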
Hope this helps to fix the AWS issue.
