Version
Domino 4.x, 5.x
Issue:
This article explains how to recover files from a workspace PVC, given the user's first name and the project name.
Root Cause
Code operations can cause the underlying OS to terminate, in which case runs can end in an Error state and their results are never synced back to the project for future use.
Resolution:
1. Find the User ID
In the Domino Admin UI, navigate to MongoDB (Advanced > MongoDB) and enter:
db.users.findOne({"firstName": "Abdul"})
This returns a document containing the ObjectId (User ID); take note of it:
{
"_id" : ObjectId("624ae9ff0bcb78657c12e8dd"),
"idpId" : "50889425-d217-4505-962b-6383f309adf6",
"loginId" : {
"id" : "domino-abdulk",
"lowercaseId" : "domino-abdulk"
2. Find the project that the user owns by checking the deployment URL:
https://emeaplaydummy.support.domino.train/workspace/domino-abdulk/quick-start
From the URL we know the project is quick-start. Take note of this along with the User ID noted earlier.
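Reading the names off the URL can also be scripted. The following is a minimal sketch, assuming the workspace URL always has the layout shown above (https://<host>/workspace/<owner login>/<project name>); the function names workspace_owner and workspace_project are illustrative and not part of Domino.

```shell
#!/usr/bin/env bash
# Split a workspace URL of the assumed form
#   https://<host>/workspace/<owner login>/<project name>
# into its owner and project parts. Function names are illustrative only.

workspace_owner() {
  local path="${1#*/workspace/}"   # drop everything up to and including /workspace/
  printf '%s\n' "${path%%/*}"      # keep the first remaining path segment
}

workspace_project() {
  local path="${1#*/workspace/}"
  path="${path#*/}"                # drop the owner segment
  path="${path%%[/?#]*}"           # drop any trailing path, query string or fragment
  printf '%s\n' "$path"
}

url="https://emeaplaydummy.support.domino.train/workspace/domino-abdulk/quick-start"
echo "owner:   $(workspace_owner "$url")"
echo "project: $(workspace_project "$url")"
```

The owner segment is the login of the project owner, which may be an organisation or another user rather than the user whose files you are recovering.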
3. Put both the User ID and Project name into a MongoDB query to obtain the Project ID
db.projects.findOne({"name": "quick-start", "ownerId": ObjectId("624ae9ff0bcb78657c12e8dd")})
This will give us the Project ID and the ownerId (User ID)
{
"_id" : ObjectId("624ae9ff112cfb6b1ad5a459"),
"ownerId" : ObjectId("624ae9ff0bcb78657c12e8dd"),
"name" : "quick-start",
"created" : ISODate("2022-04-04T12:52:19.460Z"),
"collaborators" : [ ],
4. On your central node/bastion, construct a kubectl query that selects the PVCs belonging to this user and this project.
Note: We filter on both the projectId label and the startingUserId label because projects are sometimes owned by an organisation or by other users.
This will give you a list of the PVC(s) associated with the runs, along with their PVs:
root@ip-10-0-0-18:~# kubectl get pvc -n emeaplay255392-compute -l dominodatalab.com/projectId=624ae9ff112cfb6b1ad5a459,dominodatalab.com/startingUserId=624ae9ff0bcb78657c12e8dd
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
workspace-635fa7730aaf43315a70cb49 Bound pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad 10Gi RWO dominodisk 62m
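If you run this lookup often, the selector and the PV name can be derived with two small helpers. This is a sketch only: pvc_selector and bound_pv_names are hypothetical names, and the parsing assumes the default table output shown above, where VOLUME is the third column.

```shell
#!/usr/bin/env bash
# Build the label selector from step 4. Function names are illustrative.
pvc_selector() {
  local project_id="$1" user_id="$2"
  printf 'dominodatalab.com/projectId=%s,dominodatalab.com/startingUserId=%s\n' \
    "$project_id" "$user_id"
}

# Read `kubectl get pvc` table output on stdin and print the VOLUME column
# (the PV name) for every claim in Bound status, skipping the header row.
bound_pv_names() {
  awk 'NR > 1 && $2 == "Bound" { print $3 }'
}

# Example with the IDs from this article. The command is printed, not run,
# so it can be reviewed before use.
selector="$(pvc_selector 624ae9ff112cfb6b1ad5a459 624ae9ff0bcb78657c12e8dd)"
echo "kubectl get pvc -n emeaplay255392-compute -l $selector"
```

To list the PV names directly, pipe the real command into the second helper: kubectl get pvc -n emeaplay255392-compute -l "$selector" | bound_pv_names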
5. Run the Salvage PVC script against the PV (the script is at /domino/bin/salvage-pvc.sh or can be obtained from here):
root@ip-10-0-0-18:/domino/bin# ./salvage-pvc.sh pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad
Creating PVC namespace: emeaplay255392-compute name: workspace-635fa7730aaf43315a70cb49
Creating POD namespace: emeaplay255392-compute name salvage-pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad
pod/salvage-pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad created
Error from server (AlreadyExists): error when creating "STDIN": persistentvolumeclaims "workspace-635fa7730aaf43315a70cb49" already exists
Use:
kubectl exec -it -n emeaplay255392-compute salvage-pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad bash
Recovered data will be in /salvage
When you are done, please remove the pod and pvc:
kubectl delete -n emeaplay255392-compute pod salvage-pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad
kubectl delete -n emeaplay255392-compute pvc workspace-635fa7730aaf43315a70cb49
Additionally, please clean up the PV when data has been recovered:
kubectl delete pv pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad
If you wish to keep this PV to rebind later with this script, clear its claim ref:
kubectl patch pv pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad --type=json -p='[{"op": "remove", "path": "/spec/claimRef/uid"}]'
root@ip-10-0-0-18:/domino/bin#
6. Find the kubectl exec command in the output above, then copy and run it. This will exec you into the salvage pod, which has access to the files.
root@ip-10-0-0-18:/domino/bin# kubectl exec -it -n emeaplay255392-compute salvage-pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
root@salvage-pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad:/#
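The deprecation warning above appears because the command printed by the script omits the -- separator. A minimal sketch of the same call in the non-deprecated form follows; salvage_shell is a hypothetical wrapper, and it assumes the pod is named salvage-<PV name>, as seen in the script output.

```shell
#!/usr/bin/env bash
# Open a shell in the salvage pod using the non-deprecated `--` separator.
# Assumption: the pod is named "salvage-<PV name>", as in the script output above.
salvage_shell() {
  local namespace="$1" pv_name="$2"
  kubectl exec -it -n "$namespace" "salvage-${pv_name}" -- bash
}

# Usage (not run here):
#   salvage_shell emeaplay255392-compute pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad
```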
7. Change to the /salvage/mnt directory, where your run files will be visible:
root@salvage-pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad:/# cd salvage/mnt/
root@salvage-pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad:/salvage/mnt# ls -ltrh
total 21M
-rw-r--r-- 1 12574 12574 1015 Jan 1 1970 requirements_apps.txt
-rw-r--r-- 1 12574 12574 488 Jan 1 1970 model.py
-rw-r--r-- 1 12574 12574 381 Jan 1 1970 model.R
-rw-r--r-- 1 12574 12574 879 Jan 1 1970 main.py
.....
8. Compress the files using tar
root@salvage-pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad:/salvage/mnt# tar cvfz /tmp/recovery.tgz .
./
./benefits.png
./iris_sample_notebook.ipynb
./retriever-12-error.txt
./kubernetes-events.json
....
9. Once the files are compressed, open another session and transfer the archive out of the salvage pod:
root@ip-10-0-0-18:/tmp# kubectl cp -n emeaplay255392-compute salvage-pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad:/tmp/recovery.tgz /tmp/recovery-central.tgz
tar: Removing leading `/' from member names
root@ip-10-0-0-18:/tmp# ls -ltrh /tmp
total 4.5M
drwx------ 3 root root 4.0K Oct 31 08:00 systemd-private-c6b6d103dae8493e9916b90b6d94d284-systemd-timesyncd.service-OcQ06W
-rw-r--r-- 1 root root 4.5M Oct 31 13:51 recovery-central.tgz
root@ip-10-0-0-18:/tmp# tar -tvf recovery-central.tgz
drwxr-xr-x 12574/12574 0 2022-10-31 13:43 ./
-rw-r--r-- 12574/12574 69280 1970-01-01 00:00 ./benefits.png
-rw-r--r-- 12574/12574 122358 1970-01-01 00:00 ./iris_sample_notebook.ipynb
-rw-rw-r-- 12574/12574 2826 2022-10-31 13:43 ./retriever-12-error.txt
-rw-rw-r-- 12574/12574 173092 2022-10-31 13:43 ./kubernetes-events.json
-rw-r--r-- 12574/12574 4054 1970-01-01 00:00 ./app-dash.py
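Before cleaning up, it may be worth confirming that the copied archive is readable and complete. One simple check, sketched below, is to count the archive entries on both sides and compare the numbers; archive_entry_count is an illustrative helper, not part of the salvage script.

```shell
#!/usr/bin/env bash
# Count the entries in a gzip-compressed tar archive. A non-zero exit status
# means tar could not read the archive (for example, a truncated copy).
archive_entry_count() {
  local listing
  listing="$(tar -tzf "$1")" || return 1
  printf '%s\n' "$listing" | wc -l | tr -d ' '
}

# Usage on the central node (path from step 9):
#   archive_entry_count /tmp/recovery-central.tgz
# Compare the number with the same call on /tmp/recovery.tgz inside the pod.
```

A matching entry count does not prove the file contents are identical, so treat it as a quick sanity check rather than a full integrity test.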
10. Cleanup Options
When you are done, remove the pod and PVC:
kubectl delete -n emeaplay255392-compute pod salvage-pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad
kubectl delete -n emeaplay255392-compute pvc workspace-635fa7730aaf43315a70cb49
Additionally, please clean up the PV when data has been recovered:
kubectl delete pv pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad
If you wish to keep this PV to rebind later with this script, clear its claim ref:
kubectl patch pv pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad --type=json -p='[{"op": "remove", "path": "/spec/claimRef/uid"}]'
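Because these deletions are destructive, it can help to generate the commands for a given salvage run and review them before executing anything. The sketch below only prints the three delete commands; print_cleanup_commands is a hypothetical helper, and it assumes the pod is named salvage-<PV name>, as in the script output.

```shell
#!/usr/bin/env bash
# Print the cleanup commands from step 10 for one salvage run so they can be
# reviewed before anything is deleted. Nothing is executed here.
# Assumption: the pod is named "salvage-<PV name>", as in the script output.
print_cleanup_commands() {
  local namespace="$1" pvc_name="$2" pv_name="$3"
  printf 'kubectl delete -n %s pod salvage-%s\n' "$namespace" "$pv_name"
  printf 'kubectl delete -n %s pvc %s\n' "$namespace" "$pvc_name"
  printf 'kubectl delete pv %s\n' "$pv_name"
}

print_cleanup_commands emeaplay255392-compute \
  workspace-635fa7730aaf43315a70cb49 \
  pvc-f95cf879-abd2-47a6-83cb-0a96c77e28ad
```

If you intend to keep the PV for a later rebind, skip the last printed command and use the kubectl patch command above instead.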
Note(s)
This article describes recovery for a workspace that still exists, provided its PVC was not deleted. If a workspace is deleted, the normal underlying behaviour is that its PVC is deleted as well.