Version/Environment:
Domino 5.4.x or later
Issue:
Restarting a workspace (through Save & Restart, or by stopping and starting it) results in the workspace attempting to start but getting stuck on 'Preparing' until it times out. Reviewing events.json from the workspace support bundle, or the events of the run the workspace created, shows this message:
Warning FailedAttachVolume 3m32s attachdetach-controller Multi-Attach error for volume "pvc-953a1ffa-xxxx-xxxx-xxxx-9129a9567689" Volume is already exclusively attached to one node and can't be attached to another
The warning reports that the workspace PV is exclusively attached to one node and cannot be attached to another.
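If the support bundle is not handy, the same warning can usually be pulled from events.json or from the live cluster. A minimal sketch, assuming events.json follows the standard Kubernetes event-list JSON format and that the run lives in the domino-compute namespace:
jq -r '.items[] | select(.reason == "FailedAttachVolume") | .message' events.json
kubectl get events -n domino-compute --field-selector reason=FailedAttachVolume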
Root Cause:
The workspace PV is still attached to a node that the workspace is no longer using, or to a node that no longer exists. Due to race conditions between volume detach and delete operations, some workspace volumes are never detached from their nodes. This can happen when a node is recycled or after a Domino upgrade.
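To check whether a cluster is affected, you can list every volumeattachment together with the node and PV it references, and flag attachments whose node no longer exists. A minimal, read-only sketch (the column paths assume the standard VolumeAttachment spec):
kubectl get volumeattachment -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,PV:.spec.source.persistentVolumeName
for node in $(kubectl get volumeattachment -o jsonpath='{.items[*].spec.nodeName}' | tr ' ' '\n' | sort -u); do
  kubectl get node "$node" >/dev/null 2>&1 || echo "volumeattachments reference missing node: $node"
done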
Resolution:
Deleting the stale volumeattachment allows the workspace volume to be attached to whichever node the new workspace pod is scheduled on when the workspace is re-launched.
Note: Make sure to shut down the workspace before performing these steps; if it is left running, the workspace may still fail to be assigned to a node even after the volumeattachment is deleted.
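As an additional sanity check before deleting anything, you can list any pods that still reference the workspace PVC. A minimal sketch using jq; the namespace and the claim name (taken from the example output further below) are illustrative and should be replaced with your own:
kubectl get pods -n domino-compute -o json \
  | jq -r '.items[] | select(any(.spec.volumes[]?; .persistentVolumeClaim.claimName == "workspace-64ff48e260af5535065af769")) | .metadata.name'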
The following steps walk through how to delete it with kubectl. A consolidated, scripted version of the same checks is sketched after the steps.
1. Find the volumeattachment associated with the workspace PV. The PV name can be found in events.json or in the output of kubectl describe pod for the run:
# kubectl get volumeattachment | grep pvc-953a1ffa-xxxx-xxxx-xxxx-9129a9567689
csi-d0329b8e1b6510626ce31c4be79632a43c0cb5b6ad16da9c0377a42ff715b8fb ebs.csi.aws.com pvc-953a1ffa-xxxx-xxxx-xxxx-9129a9567689 ip-xx-xx4-xx-xx.ec2.internal true 22h
2. Confirm that the node the volumeattachment references no longer exists, or that no workloads on it are using the PV:
# kubectl get no | grep ip-xx-xx4-xx-xx.ec2.internal
ip-xx-xx4-xx-xx.ec2.internal Ready worker 28h v1.24.9
# kubectl describe no ip-xx-xx4-xx-xx.ec2.internal
....Non-terminated Pods: (10 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
cattle-system cattle-node-agent-xxxx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28h
domino-platform aws-ebs-csi-driver-node-xxxx 20m (0%) 200m (2%) 40Mi (0%) 200Mi (0%) 3m47s
domino-platform aws-efs-csi-driver-node-xxxx 20m (0%) 200m (2%) 40Mi (0%) 200Mi (0%) 28h
domino-platform docker-registry-cert-mgr-xxxx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28h
domino-platform fluentd-xxxx 200m (2%) 1 (12%) 600Mi (1%) 2Gi (6%) 28h
domino-platform newrelic-infrastructure-xxxxx 100m (1%) 0 (0%) 30Mi (0%) 512Mi (1%) 28h
domino-platform newrelic-logging-xxxxx 250m (3%) 500m (6%) 64Mi (0%) 512Mi (1%) 28h
domino-platform nvidia-device-plugin-xxxxx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28h
domino-platform smarter-device-manager-xxxxx 10m (0%) 100m (1%) 15Mi (0%) 15Mi (0%) 28h
3. Delete the volumeattachment found in the first step:
# kubectl delete volumeattachment csi-d0329b8e1b6510626ce31c4be79632a43c0cb5b6ad16da9c0377a42ff715b8fb
4. Describe the PV associated with the volumeattachment to find the PVC bound to it:
% kubectl describe pv pvc-953a1ffa-xxxx-xxxx-xxxx-9129a9567689 -n domino-compute
Name: pvc-953a1ffa-xxxx-xxxx-xxxx-9129a9567689
Labels: <none>
Annotations: pv.kubernetes.io/provisioned-by: ebs.csi.aws.com
volume.kubernetes.io/provisioner-deletion-secret-name:
volume.kubernetes.io/provisioner-deletion-secret-namespace:
Finalizers: [kubernetes.io/pv-protection external-attacher/ebs-csi-aws-com]
StorageClass: dominodisk
Status: Bound
Claim: domino-compute/workspace-64ff48e260af5535065af769 <---------
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 10Gi
Node Affinity:
Required Terms:
Term 0: topology.ebs.csi.aws.com/zone in [xxxxxxx]
Message:
Source:
Type: CSI (a Container Storage Interface (CSI) volume source)
Driver: ebs.csi.aws.com
FSType: ext4
VolumeHandle: vol-xxxxxxxxxxx
ReadOnly: false
VolumeAttributes: storage.kubernetes.io/csiProvisionerIdentity=1xxxxxxxxxxxxx-ebs.csi.aws.com
Events: <none>
5. Describe the PVC to confirm that it is not mounted by any pod:
% kubectl describe pvc workspace-64ff48e260af5535065af769 -n domino-compute
Name: workspace-64ff48e260af5535065af769
Namespace: domino-compute
StorageClass: dominodisk
Status: Bound
Volume: pvc-953a1ffa-xxxx-xxxx-xxxx-9129a9567689
Labels: dominodatalab.com/projectId=64xxx160af553xxxafxxxx
dominodatalab.com/startingUserId=64xxxf78aaxxxx0f4355e15
dominodatalab.com/type=Workspace
dominodatalab.com/workspaceId=64ff48e260af5535065af769
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
pv.kubernetes.io/migrated-to: ebs.csi.aws.com
volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
volume.kubernetes.io/selected-node: ip-xx-xx-xxx-xxx.ec2.internal
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 25Gi
Access Modes: RWO
VolumeMode: Filesystem
Mounted By: <none> <---------
Events: <none>
6. Start the workspace again; it should now get past the 'Preparing' stage and start successfully.
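If this happens repeatedly, the lookups from steps 1 through 5 can be chained together. The following is a minimal sketch, not a supported Domino tool: it assumes the EBS CSI driver and the domino-compute namespace, and you must substitute the PV name reported in your own Multi-Attach warning. Review each output before uncommenting the final delete.
PV="pvc-953a1ffa-xxxx-xxxx-xxxx-9129a9567689"   # replace with the PV from your warning
VA=$(kubectl get volumeattachment | grep "$PV" | awk '{print $1}')
NODE=$(kubectl get volumeattachment "$VA" -o jsonpath='{.spec.nodeName}')
CLAIM=$(kubectl get pv "$PV" -o jsonpath='{.spec.claimRef.name}')
echo "volumeattachment: $VA  node: $NODE  claim: $CLAIM"
kubectl get node "$NODE" || echo "Node $NODE no longer exists"
kubectl describe pvc "$CLAIM" -n domino-compute | grep "Mounted By"
# Only after confirming the node is gone or 'Mounted By' shows <none>:
# kubectl delete volumeattachment "$VA"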
Notes/Information:
If you are uncomfortable with performing these operations, please contact Domino Support for further assistance.
References/Internal Records:
- ZD-81365