Version/Environment (if relevant):
This applies to deployments hosted in Amazon AWS.
Issue:
A customer's pod was failing to transition to a running state; its kubectl describe output showed this event:
Warning FailedCreatePodSandBox 7s (x15 over 3m11s) kubelet Failed to create pod sandbox: rpc error: code= Unknown desc = failed to get sandbox image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.5": failed to pull image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.5": failed to pull and unpack image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.5": failed to resolve reference "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.5": pulling from host 602401143452.dkr.ecr.us-east-1.amazonaws.com failed with status code [manifests 3.5]: 401 Unauthorized
The issue impacted their DEV environment but not their PROD environment, and it was not consistently reproducible. The customer found that the problem did NOT occur when a new node was spun up to host the pod.
Root Cause:
The pause container image is used by Kubernetes as the sandbox (placeholder) container that is created for every pod before its application containers start. Due to a problem in EKS, a garbage collector prunes images on nodes and removes the pause image that was added during node provisioning, and because of a change on the EKS side it is no longer possible to re-fetch the pause image from the default referenced location. The EKS bug is partly discussed here: https://github.com/aws/amazon-vpc-cni-k8s/issues/2030#issuecomment-1372204463
Support for the Container Runtime Interface (CRI) for Docker (Dockershim) was removed with the introduction of Kubernetes 1.24 in EKS. Official Amazon EKS Amazon Machine Images (AMIs) now use containerd as the runtime exclusively.
The issue occurs when kubelet garbage collects the pause image. While kubelet supplies credentials when pulling pod images, containerd does not use those credentials when it requests the sandbox image. As a result, attempts to re-pull the image fail with authentication errors because containerd has no saved credentials, and the affected nodes have to be re-deployed.
To address this, kubelet provides the '--pod-infra-container-image' flag, which on EKS is set to the 'pause' image. The flag tells kubelet which image is the pod infrastructure (sandbox) image so that the image garbage collector does not prune it, which helps prevent this issue. Ordinarily, kubelet should not garbage collect the 'pause' image; if it does, the underlying cause is often disk pressure on the node (which can be checked with commands like 'df -h /var/lib..').
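As a quick diagnostic on a suspect node, you can check whether the pause image is still present in containerd's image store and whether the backing filesystem is under pressure. This is a sketch that assumes containerd's default root of /var/lib/containerd:
sudo ctr --namespace k8s.io images ls | grep pause
df -h /var/lib/containerd
If the pause image is missing from the listing, the node is likely affected by this problem.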
For further insights and discussions related to these challenges and solutions, you may refer to the following resources:
- Kubernetes GitHub issue discussing pause container image: https://github.com/kubernetes/kubernetes/issues/81756
- Rancher GitHub issue regarding Kubernetes and pause container: https://github.com/rancher/rke2/issues/1830
- Kubernetes GitHub issue about kubelet's behavior and pause container: https://github.com/kubernetes/kubernetes/issues/62388
Resolution:
If you are seeing this problem, you can attempt the following:
- Ensure the node IAM role has the AmazonEC2ContainerRegistryReadOnly policy attached; if it does not, add it. The following command lists the roles the policy is attached to:
aws iam list-entities-for-policy --policy-arn "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly" --entity-filter Role --query 'PolicyRoles[].RoleName'
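If the node role is not in the output, a sketch of attaching the policy (replace <node-instance-role> with the IAM role used by your node group):
aws iam attach-role-policy --role-name <node-instance-role> --policy-arn "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"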
- You can also try the bootstrap arguments listed in the GitHub issue linked above.
- Check that the root volume is sufficiently sized (# df -h | column -t), and that the AMI versions used by the nodes are recent and compatible with EKS 1.24.
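As a sketch of the AMI check (assuming Amazon Linux 2 nodes on EKS 1.24; <instance-id> and the region are placeholders to adjust), you can compare the AMI a node is running against the latest recommended EKS-optimized AMI published in SSM:
aws ec2 describe-instances --instance-ids <instance-id> --query 'Reservations[].Instances[].ImageId' --output text
aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.24/amazon-linux-2/recommended/image_id --region us-west-2 --query 'Parameter.Value' --output text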
- Check that the config.toml on these nodes looks correct, especially the sandbox_image setting:
[root@ip-10-0-86-99 ~]# cd /etc/containerd/
[root@ip-10-0-86-99 containerd]# ls
config.toml
[root@ip-10-0-86-99 containerd]# cat config.toml
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
[grpc]
address = "/run/containerd/containerd.sock"
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
discard_unpacked_layers = true
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.5"
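If sandbox_image is missing or points at the wrong registry or region, a hedged sketch of correcting it and restarting containerd (back up the file first; the us-west-2 image shown here is an example and should match your cluster's region):
sudo cp /etc/containerd/config.toml /etc/containerd/config.toml.bak
sudo sed -i 's|sandbox_image = .*|sandbox_image = "602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.5"|' /etc/containerd/config.toml
sudo systemctl restart containerd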
- Check the kubelet command line for the --pod-infra-container-image flag:
root 17634 2.3 0.3 2817932 119732 ? Ssl 23:11 0:07
/usr/bin/kubelet --config /etc/kubernetes/kubelet/kubelet-config.json
--kubeconfig /var/lib/kubelet/kubeconfig --container-runtime-endpoint
unix:///run/containerd/containerd.sock --image-credential-provider-config
/etc/eks/image-credential-provider/config.json --image-credential-provider-bin-dir
/etc/eks/image-credential-provider --node-ip=10.0.86.99 --pod-infra-container-image=602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.5 --v=2 --cloud-provider=aws --container-runtime=remote --node-labels lifecycle=OnDemand --node-labels=dominodatalab.com/node-pool=default,dominodatalab.com/domino-node=true
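If the node's IAM role can read ECR, you can also try manually pre-pulling the sandbox image into containerd as a stopgap. This is a sketch (it assumes the AWS CLI is available on the node and uses the us-west-2 image from the output above; a similar approach is discussed in the GitHub issue linked in the Root Cause section):
REGION=us-west-2
PAUSE_IMAGE=602401143452.dkr.ecr.${REGION}.amazonaws.com/eks/pause:3.5
sudo ctr --namespace k8s.io images pull --user "AWS:$(aws ecr get-login-password --region ${REGION})" "${PAUSE_IMAGE}"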
- If urgent, you can cordon the broken node and cycle it out with a new node.
- If the above does not work, you can also attempt to cordon the node, restart kubelet on the affected node, and then uncordon the node:
On the Kubernetes environment, cordon the node:
#kubectl cordon <node name>
Log on to the node itself and run:
#sudo systemctl restart kubelet && sudo systemctl status kubelet
On the Kubernetes environment, uncordon the node:
#kubectl uncordon <node name>
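After uncordoning, you can confirm the node returns to Ready and that the pod's sandbox is now created, for example:
#kubectl get node <node name>
#kubectl describe pod <pod name> | grep -A 10 Events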
- If the above does not help, you should contact AWS Support.
Caveat:
In Domino 5.4.x, an issue has been identified under DOM-44673 where the image-cache-agent prunes the EKS pause image from nodes. This has been fixed in Domino 5.5. If you identify this to be the cause, there is the option of disabling the image-cache-agent; however, doing so will have a negative impact on performance since the agent will no longer be able to prune cached images on compute nodes.
To disable:
This command patches the image-cache-agent daemonset in the domino-platform namespace by adding a nodeSelector that will not match any nodes.
kubectl -n domino-platform patch daemonset image-cache-agent -p '{"spec": {"template": {"spec": {"nodeSelector": {"non-existing": "true"}}}}}'
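You can confirm the agent is disabled; the daemonset should report 0 desired and current pods:
kubectl -n domino-platform get daemonset image-cache-agent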
To re-enable:
This command patches the image-cache-agent daemonset in the domino-platform namespace to remove the nodeSelector that was added in the previous example.
kubectl -n domino-platform patch daemonset image-cache-agent --type json -p='[{"op": "remove", "path": "/spec/template/spec/nodeSelector/non-existing"}]'
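After re-enabling, you can confirm the agent pods are scheduled on compute nodes again:
kubectl -n domino-platform rollout status daemonset image-cache-agent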