Domino 5.4.x or later
Restarting a workspace (through Save & Restart, or by stopping and starting it) results in the workspace attempting to start but getting stuck on 'Preparing' until it times out. Reviewing events.json from the workspace support bundle, or the events of the run the workspace created, shows this message:
Warning FailedAttachVolume 3m32s attachdetach-controller Multi-Attach error for volume "pvc-953a1ffa-xxxx-xxxx-xxxx-9129a9567689" Volume is already exclusively attached to one node and can't be attached to another
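If events.json is large, the PV/PVC name in the warning can be pulled out with a quick grep. A minimal sketch below works on a hypothetical one-line sample; against a real support bundle, point grep at the actual events.json instead:

```shell
# Hypothetical sample line standing in for the support bundle's events.json.
cat > /tmp/events-sample.json <<'EOF'
{"reason":"FailedAttachVolume","message":"Multi-Attach error for volume \"pvc-953a1ffa-xxxx-xxxx-xxxx-9129a9567689\" Volume is already exclusively attached to one node and can't be attached to another"}
EOF

# Extract the volume name referenced by the Multi-Attach warning.
grep -o 'pvc-[0-9a-fx-]*' /tmp/events-sample.json | sort -u
# -> pvc-953a1ffa-xxxx-xxxx-xxxx-9129a9567689
```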
The warning reports that the workspace PV is still exclusively attached to one node and cannot be attached to another. That node is either no longer running the workspace or no longer exists. Due to race conditions between the detach and delete volume operations, some workspace volumes are never detached from their nodes. This can happen when a node is recycled or after a Domino upgrade.
Deleting the volumeattachment allows the workspace pod's volume to be attached to a new node when the workspace is relaunched.
Note: Make sure to shut down the workspace before performing these steps, as a still-running workspace may fail to assign itself to a node even after the volumeattachment is deleted.
The following steps walk through how to delete the volumeattachment with kubectl.
1. Find the volumeattachment associated with the workspace PV. The PV name can be found in events.json or in the output of kubectl describe pod for the run:
# kubectl get volumeattachment | grep pvc-953a1ffa-xxxx-xxxx-xxxx-9129a9567689
csi-d0329b8e1b6510626ce31c4be79632a43c0cb5b6ad16da9c0377a42ff715b8fb ebs.csi.aws.com pvc-953a1ffa-xxxx-xxxx-xxxx-9129a9567689 ip-xx-xx4-xx-xx.ec2.internal true 22h
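In the output above, field 1 is the volumeattachment name and field 4 is the node it is attached to; both can be parsed out with awk. A sketch over the captured line (against a live cluster, the jsonpath query in the comment is an alternative):

```shell
# The grep output from step 1, captured as a single line.
line='csi-d0329b8e1b6510626ce31c4be79632a43c0cb5b6ad16da9c0377a42ff715b8fb ebs.csi.aws.com pvc-953a1ffa-xxxx-xxxx-xxxx-9129a9567689 ip-xx-xx4-xx-xx.ec2.internal true 22h'

# Field 1: volumeattachment name; field 4: node it is attached to.
va=$(echo "$line" | awk '{print $1}')
node=$(echo "$line" | awk '{print $4}')
echo "volumeattachment: $va"
echo "node: $node"

# On a live cluster the name can also be queried directly, e.g.:
#   kubectl get volumeattachment \
#     -o jsonpath='{.items[?(@.spec.source.persistentVolumeName=="pvc-953a1ffa-xxxx-xxxx-xxxx-9129a9567689")].metadata.name}'
```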
2. Confirm the node that the volumeattachment is associated with does not exist or that no workloads are using the PV:
# kubectl get no | grep ip-xx-xx4-xx-xx.ec2.internal
ip-xx-xx4-xx-xx.ec2.internal Ready worker 28h v1.24.9
# kubectl describe no ip-xx-xx4-xx-xx.ec2.internal
Non-terminated Pods: (10 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
cattle-system cattle-node-agent-xxxx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28h
domino-platform aws-ebs-csi-driver-node-xxxx 20m (0%) 200m (2%) 40Mi (0%) 200Mi (0%) 3m47s
domino-platform aws-efs-csi-driver-node-xxxx 20m (0%) 200m (2%) 40Mi (0%) 200Mi (0%) 28h
domino-platform docker-registry-cert-mgr-xxxx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28h
domino-platform fluentd-xxxx 200m (2%) 1 (12%) 600Mi (1%) 2Gi (6%) 28h
domino-platform newrelic-infrastructure-xxxxx 100m (1%) 0 (0%) 30Mi (0%) 512Mi (1%) 28h
domino-platform newrelic-logging-xxxxx 250m (3%) 500m (6%) 64Mi (0%) 512Mi (1%) 28h
domino-platform nvidia-device-plugin-xxxxx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28h
domino-platform smarter-device-manager-xxxxx 10m (0%) 100m (1%) 15Mi (0%) 15Mi (0%) 28h
3. Delete the volumeattachment found in the first step:
# kubectl delete volumeattachment csi-d0329b8e1b6510626ce31c4be79632a43c0cb5b6ad16da9c0377a42ff715b8fb
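Because deleting a volumeattachment is destructive, it can help to preview the exact command before running it. A minimal sketch where KUBECTL defaults to `echo` (a dry run that only prints the command); set KUBECTL=kubectl to perform the real deletion:

```shell
# KUBECTL defaults to `echo` so this sketch only prints the command;
# export KUBECTL=kubectl to run the real deletion.
KUBECTL=${KUBECTL:-echo}
VA="csi-d0329b8e1b6510626ce31c4be79632a43c0cb5b6ad16da9c0377a42ff715b8fb"
$KUBECTL delete volumeattachment "$VA"
# prints: delete volumeattachment csi-d0329b8e1b6510626ce31c4be79632a43c0cb5b6ad16da9c0377a42ff715b8fb
```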
4. Describe the PV associated with the volumeattachment to find the PVC bound to it:
# kubectl describe pv pvc-953a1ffa-xxxx-xxxx-xxxx-9129a9567689
Annotations: pv.kubernetes.io/provisioned-by: ebs.csi.aws.com
Finalizers: [kubernetes.io/pv-protection external-attacher/ebs-csi-aws-com]
Claim: domino-compute/workspace-64ff48e260af5535065af769 <---------
Reclaim Policy: Delete
Access Modes: RWO
Term 0: topology.ebs.csi.aws.com/zone in [xxxxxxx]
Type: CSI (a Container Storage Interface (CSI) volume source)
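The Claim field carries both the namespace and the PVC name needed for the next step; splitting it can be scripted. A sketch over a saved fragment of the describe output above:

```shell
# Saved fragment of the `kubectl describe pv` output.
cat > /tmp/pv-describe.txt <<'EOF'
Claim:            domino-compute/workspace-64ff48e260af5535065af769
EOF

# Claim is <namespace>/<pvc-name>; split it for the describe-pvc step.
claim=$(awk '/^Claim:/ {print $2}' /tmp/pv-describe.txt)
ns=${claim%%/*}
pvc=${claim##*/}
echo "namespace=$ns pvc=$pvc"
# -> namespace=domino-compute pvc=workspace-64ff48e260af5535065af769
```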
5. Describe the PVC to confirm that it is not mounted by anything:
# kubectl describe pvc workspace-64ff48e260af5535065af769 -n domino-compute
Annotations: pv.kubernetes.io/bind-completed: yes
Access Modes: RWO
Mounted By: <none> <---------
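'Mounted By: <none>' can also be cross-checked by listing which pods reference the claim. The sketch below greps a simulated capture with hypothetical pod names; the assumed live command is shown in the comment:

```shell
# Simulated capture with hypothetical pod names; on a live cluster the listing
# could come from (assumed column layout):
#   kubectl get pods -n domino-compute -o custom-columns='POD:.metadata.name,PVC:.spec.volumes[*].persistentVolumeClaim.claimName'
cat > /tmp/pods-pvc.txt <<'EOF'
run-64ff48e2bbbb   <none>
model-api-xxxx     <none>
EOF

# No match means nothing mounts the workspace PVC.
grep 'workspace-64ff48e260af5535065af769' /tmp/pods-pvc.txt || echo "PVC not mounted by any pod"
# -> PVC not mounted by any pod
```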
6. Start the workspace again; it should now start and get past the 'Preparing' stage.
If you are uncomfortable performing these operations, please contact Domino Support for further assistance.