Version/Environment (if relevant):
All versions on AWS/EKS
Issue:
Customers find Pods stuck in ContainerCreating and/or liveness probes failing, with an event in the kubectl describe output that mentions errno 524:
Warning FailedCreatePodSandBox 4m9s (x384 over 89m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524: unknown
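The event can be surfaced with kubectl, for example (the pod and namespace names below are placeholders):
kubectl describe pod <pod-name> -n <namespace>
kubectl get events -n <namespace> --field-selector reason=FailedCreatePodSandBox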
Root Cause:
There is a Linux kernel memory leak bug in the EKS optimized AMIs based on Linux kernel version 5.10.x: BPF JIT allocations are leaked until the kernel's bpf_jit_limit is exhausted, after which new seccomp filters (which runc loads for each container) can no longer be loaded and container creation fails with errno 524. The bug can be verified by checking the current limit with cat /proc/sys/net/core/bpf_jit_limit and comparing it against the amount of BPF JIT memory currently allocated, reported by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}'; if the second number is larger than the first, the limit has been exceeded.
Example:
[root@ip-10-0-35-23 /]# cat /proc/sys/net/core/bpf_jit_limit
264241152
[root@ip-10-0-35-23 /]# cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}'
352337920
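In this example the allocated BPF JIT memory (352337920) exceeds the limit (264241152), so the node is affected. For convenience, the same check can be wrapped in a small script. A minimal sketch, assuming root shell access on the node (this helper is illustrative and not part of the AMI):
#!/usr/bin/env bash
# Compare the BPF JIT memory currently allocated against the kernel's limit.
limit=$(cat /proc/sys/net/core/bpf_jit_limit)
used=$(grep bpf_jit /proc/vmallocinfo | awk '{s+=$2} END {print s}')
echo "bpf_jit_limit:      ${limit}"
echo "bpf_jit allocated:  ${used:-0}"
if [ "${used:-0}" -ge "${limit}" ]; then
    echo "Limit exceeded: this node is hitting the leak."
else
    echo "Limit not exceeded on this node."
fi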
More details: https://github.com/awslabs/amazon-eks-ami/issues/1219#issuecomment-1533797378
"This issue is more likely to be encountered with kernel versions kernel-5.10.176-157.645.amzn2
thru kernel-5.10.177-158.645.amzn2
where the rate of the memory leak is higher."
Resolution:
AWS corrected the problem with a patched AMI, Release v20230501 (awslabs/amazon-eks-ami), released May 3rd, 2023.
The temporary workaround is to run sysctl net.core.bpf_jit_limit=452534528 on all platform nodes, for example by SSHing directly onto each node (see the sketch below).
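A minimal sketch of applying the workaround across the fleet, assuming SSH access to every node and that ec2-user is the login user (adjust the user, SSH options, and node selection for your environment):
for node in $(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
  ssh ec2-user@"${node}" 'sudo sysctl net.core.bpf_jit_limit=452534528'
done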
The permanent fix is to move your EKS nodes onto AMI Release v20230501 (awslabs/amazon-eks-ami) or a later release.
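For managed node groups, one way to roll nodes onto a patched AMI is to trigger a node group version update. The cluster and node group names below are placeholders; self-managed node groups or Karpenter-provisioned nodes update their AMI through their own mechanisms:
aws eks update-nodegroup-version --cluster-name <cluster-name> --nodegroup-name <nodegroup-name>
By default this moves the node group to the latest AMI release available for its Kubernetes version; a specific release can be pinned with --release-version if needed.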
Notes/Information:
Note the "temporary workaround" is short-term only, since the kernel bug can continue leaking.