Problem related to a local NFS server on a self-managed deployment
NFS-backed pods mysteriously go down; affected services can include Keycloak and PostgreSQL.
Events similar to the following are seen on the affected pods:
kubelet, $node Unable to attach or mount volumes: unmounted volumes=[execution-vol], unattached volumes=[run-$RUNID-nginx-config-vol run-$RUNID-start-run-script-vol executor-storage-vol execution-vol executor-blob-store execution-secrets-vol custom-certs-vol filecache-vol]: timed out waiting for the condition
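To confirm the symptom, you can filter a pod's events for this mount-timeout message. A minimal sketch, here run against a sample of the event text above (on a live cluster you would pipe `kubectl describe pod` output through the same grep):

```shell
# Sample kubelet event text as reported on an affected pod (abridged from above):
event='kubelet, node1 Unable to attach or mount volumes: unmounted volumes=[execution-vol]: timed out waiting for the condition'

# Against a live pod, the equivalent check would be, e.g.:
#   kubectl describe pod "$POD_NAME" | grep 'Unable to attach or mount'
echo "$event" | grep -o 'Unable to attach or mount volumes'
# prints: Unable to attach or mount volumes
```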
If you've installed the Blob store or dominoshared store on a local NFS server, the server may have run out of threads to service simultaneous connections to the NFS service.
This is a common issue on Red Hat NFS servers, which default to 8 threads.
NFS clients will log messages similar to the following in the messages log or dmesg output:
nfs: server $SERVERNAME not responding, still trying
nfs: server $SERVERNAME OK
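A quick way to gauge how often clients are being starved is to count the "not responding" messages. A sketch using sample log lines (the server name is illustrative; on a real client you would grep dmesg or /var/log/messages instead):

```shell
# Sample NFS client messages as they appear in the logs:
log='kernel: nfs: server nfs01 not responding, still trying
kernel: nfs: server nfs01 OK'

# On a live client you would run, e.g.:
#   dmesg | grep 'nfs: server'
# Count "not responding" incidents; a growing count suggests a starved server:
printf '%s\n' "$log" | grep -c 'not responding'
# prints: 1
```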
On the NFS server, you can check how many simultaneous connections you allow by inspecting the file /proc/net/rpc/nfsd. (Note: the busy counter is deprecated as of RHEL 6, so the second number will no longer reflect the maximum number of clients hit.)
Review the following line:
th 16 0 2.610 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
The first number is your configured maximum number of simultaneous nfsd threads. This should cover your expected load (number of nodes x connections per node).
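The check above can be scripted: pull the thread count from the "th" line and compare it against the expected load. This sketch uses the sample line from above and illustrative node counts; on the server you would read the real /proc/net/rpc/nfsd:

```shell
# Parse the thread count from the "th" line.
# On the server itself: awk '/^th/ {print $2}' /proc/net/rpc/nfsd
th_line='th 16 0 2.610 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000'
threads=$(echo "$th_line" | awk '/^th/ {print $2}')

# Expected load: nodes x connections per node (illustrative numbers):
nodes=10
conns_per_node=4
needed=$((nodes * conns_per_node))

if [ "$threads" -lt "$needed" ]; then
  echo "WARNING: nfsd threads ($threads) below expected load ($needed)"
fi
# prints: WARNING: nfsd threads (16) below expected load (40)
```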
This can impact any service that hosts its data on dominoshared, in some instances the PostgreSQL database that backs Keycloak.
Increase the number of threads by following https://access.redhat.com/solutions/28211.
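The Red Hat solution linked above requires a subscription to view. As a hedged sketch of the usual approach: on RHEL 6/7 the thread count is set via RPCNFSDCOUNT in /etc/sysconfig/nfs, and on RHEL 8 and later via the [nfsd] section of /etc/nfs.conf (the value 64 below is illustrative; size it to your node count):

```shell
# RHEL 6/7: set the persistent thread count in /etc/sysconfig/nfs:
#   RPCNFSDCOUNT=64
#
# RHEL 8+: set it in /etc/nfs.conf:
#   [nfsd]
#   threads=64

# Change the running count immediately (not persistent across restarts):
rpc.nfsd 64

# Or restart the service after editing the config file:
systemctl restart nfs-server

# Verify the new thread count:
awk '/^th/ {print $2}' /proc/net/rpc/nfsd
```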