Version:
Domino 5.x
What are dataset snapshots:
A dataset snapshot in Domino Data Lab is a point-in-time copy of a dataset, including all of its data and metadata, that records the state of the dataset at that moment. Snapshots can be used to make analyses and experiments repeatable, and to ensure that the data used in an analysis does not change over time.
With Domino, you can also use dataset snapshots to build branches of a dataset for purposes such as testing or validation. By taking a snapshot of the dataset and then modifying it, you can create a separate branch for a specific purpose without affecting the original dataset.
Overall, dataset snapshots are an effective way to keep data analysis workflows repeatable and consistent.
Listing dataset snapshots for a project (With Admin UI Access)
To list the dataset snapshots you need appropriate admin access. From the Admin UI, select Advanced > MongoDB.
In the MongoDB terminal, substitute the project name into the command below:
rs.secondaryOk()
db.projects.find({"name" : "<Project Name>"})
This will give you output similar to:
{
"_id" : ObjectId("6410b67b7ce95f628cb5d62c"),
"ownerId" : ObjectId("631f593e8802f91dd1632a94"),
"name" : "Project Name",
Once you've captured the project's ObjectId (the "_id" value above), substitute it into the command below:
rs.secondaryOk()
db.datasetrw.find({"metadata.labels.project-object-id": "6410b67b7ce95f628cb5d62c"})
The datasets for the project are displayed along with their snapshots. (Note: if the project has multiple datasets, you may need to scroll through the MongoDB output to find the correct one.)
{ "_id" : ObjectId("641999e84564ec6408e7eafc"), "name" : "test-data-set", "description" : "Abduls", "author" : ObjectId("6410b6787ce95f628cb5d62a"), "snapshots" : [ ObjectId("641999e84564ec6408e7eafb"), ObjectId("64199a0e4564ec6408e7eb15"), ObjectId("64199f654564ec6408e7ebe2"), ObjectId("6419e9224564ec6408e7ede2"), ObjectId("6419ecb04564ec6408e7ee44") ], "metadata" : { "labels" : { "project-object-id" : "6410b67b7ce95f628cb5d62c" }, "creationDateMillis" : NumberLong("1679399400160") }, "tags" : { "64199f654564ec6408e7ebe2" : [ "snap2" ], "6419e9224564ec6408e7ede2" : [ "sanp3" ], "64199a0e4564ec6408e7eb15" : [ "Snap1" ], "6419ecb04564ec6408e7ee44" : [ "snap4" ] },
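In the output above, the "tags" field is keyed by snapshot ID, so matching tags to the entries in "snapshots" by eye can be tedious. The following is a minimal sketch of how the two fields could be paired up in plain JavaScript. The function name listSnapshotTags is hypothetical, and the sketch assumes the ObjectId values have already been copied out as plain hex strings; it is not part of Domino or MongoDB.

```javascript
// Sketch only: pair each snapshot ID in a dataset document with its tags.
// Assumption: IDs are plain hex strings, not shell ObjectId values.
function listSnapshotTags(dataset) {
  const tags = dataset.tags || {};
  return dataset.snapshots.map((id) => ({
    snapshotId: id,
    tags: tags[id] || [], // snapshots without a tag get an empty list
  }));
}

// Sample data copied from the output above (IDs written as strings).
const dataset = {
  name: "test-data-set",
  snapshots: [
    "641999e84564ec6408e7eafb",
    "64199a0e4564ec6408e7eb15",
    "64199f654564ec6408e7ebe2",
    "6419e9224564ec6408e7ede2",
    "6419ecb04564ec6408e7ee44",
  ],
  tags: {
    "64199f654564ec6408e7ebe2": ["snap2"],
    "6419e9224564ec6408e7ede2": ["sanp3"],
    "64199a0e4564ec6408e7eb15": ["Snap1"],
    "6419ecb04564ec6408e7ee44": ["snap4"],
  },
};

// Print one line per snapshot: ID followed by its tags, if any.
listSnapshotTags(dataset).forEach((row) => {
  console.log(row.snapshotId, row.tags.join(", ") || "(no tag)");
});
```

The snapshots are listed in the order they appear in the "snapshots" array, which makes it easier to see which snapshot carries which tag.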
Checking the status of a dataset snapshot
A snapshot's status is recorded in its lifecycleStatus field. To view the status of each snapshot in a dataset, run the following command in MongoDB, substituting the dataset's ObjectId (the "_id" value from the output above) for the collectionId:
db.datasetrw_snapshot.find({"collectionId": ObjectId("641999e84564ec6408e7eafc")}, {"status": 1})
The output would be similar to:
{ "_id" : ObjectId("641999e84564ec6408e7eafb"), "status" : { "lifecycleStatus" : "Active", "statusLastUpdatedDateMillis" : NumberLong("1679420609546"), "statusLastUpdatedBy" : ObjectId("000000000000000000000000") } }
{ "_id" : ObjectId("64199a0e4564ec6408e7eb15"), "status" : { "lifecycleStatus" : "Active", "statusLastUpdatedDateMillis" : NumberLong("1679399452395"), "statusLastUpdatedBy" : ObjectId("000000000000000000000000") } }
{ "_id" : ObjectId("64199f654564ec6408e7ebe2"), "status" : { "lifecycleStatus" : "Deleted", "statusLastUpdatedDateMillis" : NumberLong("1681210000025"), "statusLastUpdatedBy" : ObjectId("6410b6787ce95f628cb5d62a") } }
{ "_id" : ObjectId("6419e9224564ec6408e7ede2"), "status" : { "lifecycleStatus" : "Active", "statusLastUpdatedDateMillis" : NumberLong("1679419696810"), "statusLastUpdatedBy" : ObjectId("000000000000000000000000") } }
{ "_id" : ObjectId("6419ecb04564ec6408e7ee44"), "status" : { "lifecycleStatus" : "Active", "statusLastUpdatedDateMillis" : NumberLong("1679420609548"), "statusLastUpdatedBy" : ObjectId("000000000000000000000000") } }
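The statusLastUpdatedDateMillis values in this output are epoch timestamps in milliseconds, which are hard to read directly. As a rough illustration, the sketch below turns a status document into a shorter summary with a readable UTC timestamp. The helper name summarizeSnapshotStatus is hypothetical, and the sketch assumes the ID and the NumberLong value have been copied out as a plain string and a plain number.

```javascript
// Sketch only: summarise one document from the datasetrw_snapshot output.
// Assumption: _id is a hex string and the timestamp is a plain number.
function summarizeSnapshotStatus(doc) {
  return {
    snapshotId: doc._id,
    lifecycleStatus: doc.status.lifecycleStatus,
    // Epoch milliseconds converted to an ISO 8601 UTC timestamp.
    updatedAt: new Date(doc.status.statusLastUpdatedDateMillis).toISOString(),
  };
}

// Two of the documents from the output above, with values as plain types.
const statusDocs = [
  {
    _id: "64199a0e4564ec6408e7eb15",
    status: { lifecycleStatus: "Active", statusLastUpdatedDateMillis: 1679399452395 },
  },
  {
    _id: "64199f654564ec6408e7ebe2",
    status: { lifecycleStatus: "Deleted", statusLastUpdatedDateMillis: 1681210000025 },
  },
];

const summaries = statusDocs.map(summarizeSnapshotStatus);

// Show only the snapshots that are no longer Active.
summaries
  .filter((s) => s.lifecycleStatus !== "Active")
  .forEach((s) => console.log(s.snapshotId, s.lifecycleStatus, s.updatedAt));
```

Filtering on lifecycleStatus in this way makes it quick to spot snapshots that have been deleted and when their status last changed.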
Common Issues:
If you get the following output when looking up the datasets for a project:
Error: error: {
"ok" : 0,
"errmsg": "node is not in primary or recovering state",
"code": 13436,
"codeName" : "NotMasterOrSecondary"
}
It's quite possible that you are looking at a project that has already been archived, and that another, non-archived project with the same name exists and is the one you need. Look out for "isArchived" : true in the MongoDB output when searching for projects.
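When several projects share a name, the archived ones can be separated out by checking the "isArchived" field. The sketch below shows the idea in plain JavaScript on project documents that have already been copied out of the MongoDB output. The function name nonArchivedProjects and the placeholder IDs are hypothetical, and treating a missing "isArchived" field as not archived is an assumption.

```javascript
// Sketch only: keep the projects that are not marked as archived.
// Assumption: a missing isArchived field means the project is not archived.
function nonArchivedProjects(projects) {
  return projects.filter((p) => p.isArchived !== true);
}

// Hypothetical sample: two projects with the same name, one archived.
const projects = [
  { _id: "<archived project id>", name: "Project Name", isArchived: true },
  { _id: "<current project id>", name: "Project Name", isArchived: false },
];

// Print the ID of each project that is still in use.
nonArchivedProjects(projects).forEach((p) => console.log(p._id, p.name));
```

The ID printed here is the one to substitute into the datasetrw query.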
Notes:
Dataset Admin Docs: https://docs.dominodatalab.com/en/5.0/user_guide/0a8d11/datasets/