I am unable to start a Workspace. The Nodes or Auto-Scaler may be reporting messages. Several Nodes are in the 'NotReady' state, even though the cluster already has a large number of Nodes!
One possible cause is that the autoscaler/networking layer cannot allocate an IP Address to your Nodes because the node pool has run out of IP Addresses. Killing off Nodes is unlikely to help, as any 'new' Node will run into the same problem.
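To confirm which Nodes are affected, a minimal check with standard kubectl (the node name is a placeholder):

kubectl get nodes
kubectl describe node <node-name>    # check Conditions and Events for the NotReady reason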
Running kubectl describe on the failing Pods may show output such as:
Node: <none>
and
Warning FailedScheduling <unknown> default-scheduler 0/142 nodes are available: 128 Insufficient memory, 135 Insufficient cpu, 7 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
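To locate the Pods stuck in scheduling and inspect them yourself, a sketch using standard kubectl (the Pod name and namespace are placeholders):

kubectl get pods --all-namespaces --field-selector=status.phase=Pending
kubectl describe pod <pod-name> -n <namespace>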
You must work with the Cloud provider or the cluster operators to ensure the network CIDR range is large enough to accommodate the full number of Nodes and their service IP Addresses. The exact procedure varies by provider.
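As a rough check of how the pod address space is being carved up, you can list the pod CIDR assigned to each Node using only standard kubectl fields:

kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR

Nodes showing <none> in the PODCIDR column are typically the ones that could not be allocated a range.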
If you receive a message about podCidr, you may also have allocated networks to the Nodes that are too large, which consumes the IP Address range as well.
networkPlugin cni failed to set up pod "run-xxx-yyy-prod-compute" network: no podCidr for node ip-a.b.c.d.ec2.internal
Increase the --node-cidr-mask-size value in the cluster configuration .yaml file.
This decreases the number of IP Addresses each Node can allocate to its Pods, but allows more Nodes to fit within the cluster CIDR and so provides better utilisation of the larger subnet.
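As an illustration only, assuming a kubeadm-managed cluster using the v1beta3 configuration API (the subnet and mask values below are placeholders), the setting lives under the controller manager's extra arguments:

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  # Overall cluster pod CIDR from which each Node's range is carved (placeholder value)
  podSubnet: 10.244.0.0/16
controllerManager:
  extraArgs:
    # A larger mask (e.g. 25 instead of the default 24) gives each Node fewer pod IPs,
    # but lets more Nodes fit inside the same podSubnet
    node-cidr-mask-size: "25"

With a /16 podSubnet, a /24 per-Node mask supports roughly 256 Nodes, while /25 supports roughly 512, at the cost of halving the pod IPs available on each Node.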
Further logs can be found in the canal, calico-node and kube-flannel Pods, if they exist. Some cloud providers use different networking providers, which may have different Pods managing connectivity.
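To find and read those logs (namespaces and naming conventions vary between CNI installations, so the Pod name below is a placeholder):

kubectl get pods --all-namespaces | grep -E 'canal|calico-node|kube-flannel'
kubectl logs -n kube-system <cni-pod-name>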