Version/Environment (if relevant):
4.6 through the current version as of writing (5.6).
Issue:
A customer noticed that some PVs still have a "Retain" policy for Domino Executions that appear to be stopped. Even after deleting workspaces, apps, etc., the PVs remain in the cluster rather than moving to a "Delete" policy for quick removal. Is this expected?
Root Cause:
PV retention behavior can differ for jobs and workspaces per:
https://admin.dominodatalab.com/en/latest/admin_guide/cd38c2/manage-persistent-volumes/
In short, for Jobs the linked page says "the PV unmounts and sits idle until it is either reused for the user’s next job or garbage collected." For Workspaces it says "A durable workspace PV will only be deleted if the user deletes the associated workspace."
So a PV that belongs to a job is retained until it is garbage collected (see Resolution below), and a PV that belongs to a workspace is retained until the workspace is deleted.
Resolution:
For workspaces, you can find PVC information and the state of each workspace on the Admin->Workspace page. From there you can also open the workspace and, if it is no longer needed, delete it to trigger deletion of its PV.
For Jobs, PVs are garbage collected based on two Central Config settings.
By default, Domino will:
- Limit the total number of idle (not bound) PVs to 32. You can modify this via: com.cerebro.domino.computegrid.kubernetes.volume.maxIdle
- Terminate any idle PV that has not been used in 7 days. You can modify this via: com.cerebro.domino.computegrid.kubernetes.volume.maxAge
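To make the interaction of the two defaults concrete, here is a minimal Python sketch. It is not Domino's garbage-collection code. The IdlePV type, the gc_candidates function, and the choice to drop the least recently used volumes first once the idle limit is exceeded are all assumptions made for illustration; only the default values (32 idle PVs, 7 days) come from the settings above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_IDLE = 32                # default idle-PV limit described above
MAX_AGE = timedelta(days=7)  # default idle age described above


@dataclass
class IdlePV:
    """Hypothetical record for an idle (not bound) PV."""
    name: str
    last_used: datetime


def gc_candidates(idle_pvs, now, max_idle=MAX_IDLE, max_age=MAX_AGE):
    """Return the names of idle PVs that the two rules would select."""
    # Rule 1: any PV unused for longer than max_age is a candidate.
    expired = [pv for pv in idle_pvs if now - pv.last_used > max_age]
    remaining = [pv for pv in idle_pvs if now - pv.last_used <= max_age]
    # Rule 2: keep at most max_idle PVs.
    # Assumption: the least recently used ones are dropped first.
    remaining.sort(key=lambda pv: pv.last_used, reverse=True)
    overflow = remaining[max_idle:]
    return sorted(pv.name for pv in expired + overflow)
```

With the defaults, a PV last used nine days ago would be selected by the age rule alone, while lowering the idle limit also selects the oldest of the remaining volumes.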
If you urgently need to free up space, first confirm that the Jobs or Apps are no longer running (no Pod is running that matches their run-id/execution id), so that the PV is "Available" rather than "Bound". You can then manually remove the PVC and PV.
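As a rough aid for the manual cleanup, the Python sketch below reads the output of kubectl get pv -o json and builds the delete commands for PVs that are "Retain" and "Available". The function name cleanup_commands is hypothetical, and the script only builds command strings for review; it does not run them. It assumes that any PVC still associated with a PV is named in the PV's spec.claimRef field.

```python
import json


def cleanup_commands(pv_list_json, policy="Retain", phase="Available"):
    """Build kubectl delete commands (as strings) for matching PVs."""
    commands = []
    for pv in json.loads(pv_list_json).get("items", []):
        spec = pv.get("spec", {})
        # Skip PVs whose reclaim policy or phase does not match.
        if spec.get("persistentVolumeReclaimPolicy") != policy:
            continue
        if pv.get("status", {}).get("phase") != phase:
            continue
        claim = spec.get("claimRef")
        if claim:
            # Assumption: a remaining claimRef names the PVC to remove.
            commands.append(
                f"kubectl delete pvc {claim['name']} -n {claim['namespace']}"
            )
        commands.append(f"kubectl delete pv {pv['metadata']['name']}")
    return commands
```

For example, save the output of kubectl get pv -o json to a file, pass the file contents to cleanup_commands, and review each printed command before running it by hand.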
Notes/Information:
Per https://admin.dominodatalab.com/en/latest/admin_guide/c29bdf/garbage-collection/ , the more idle volumes you allow, the more likely it is that users (Jobs) can reuse a volume and avoid having to copy project files from the blob store. However, this comes at the cost of keeping additional idle PVs.
Background: The reclaim policy of a PV tells the cluster what to do with the volume after it is released. With the "Retain" policy, deleting the PVC (PersistentVolumeClaim) does not delete the corresponding PersistentVolume. With the "Delete" policy, the PV can be removed once the corresponding PVC is removed (since the PV is no longer bound).
Reclaim policy/status combinations:
delete/bound - Workspace, stopped or running.
retain/bound - Running app or job.
delete/available - Likely in the process of being removed by the k8s cluster.
retain/available - Job or app garbage-collection candidate, depending on how long it has been idle/unused.
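The combinations listed above can be turned into a small lookup, for example to annotate the RECLAIM POLICY and STATUS columns of kubectl get pv output. The function name describe_pv is hypothetical, and the descriptions simply restate the list above; any other combination is reported as not covered rather than guessed.

```python
def describe_pv(reclaim_policy, phase):
    """Map a reclaim policy and PV status to the meaning listed above."""
    meanings = {
        ("delete", "bound"): "workspace, stopped or running",
        ("retain", "bound"): "running app or job",
        ("delete", "available"): "likely being removed by the cluster",
        ("retain", "available"): "job or app garbage-collection candidate",
    }
    key = (reclaim_policy.lower(), phase.lower())
    # Combinations outside the list above are not interpreted.
    return meanings.get(key, "not covered by the list above")
```

Lowercasing the inputs lets the function accept the values as kubectl prints them (for example "Retain" and "Available").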