Version/Environment (if relevant):
Domino versions earlier than 5.2. In Domino 5.2, many of the public REST API endpoints were rewritten, including the 'create a snapshot' endpoint.
Issue:
This is a 'how to' KB. If you want to programmatically create a new snapshot for a dataset, this article walks you through the process.
Resolution:
In summary, we take an existing Domino dataset within an existing Domino project and create a new snapshot of that dataset from files within a Domino execution.
The examples use Python 3. They are not a complete script that you can copy and paste; they show the process, the actual API calls, and how to form them.
Let's start with the API call for the snapshot creation...
POST https://mydomino.mycompany.com/v4/datasetrw/snapshot
ref: https://dominodatalab.github.io/api-docs/#/reference/datasets/create-snapshot/create-snapshot
NOTE: Throughout the article we will use the URL mydomino.mycompany.com. Substitute the URL for your Domino deployment wherever you see this.
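Because every call in this article starts with the same host, it can help to build the URLs in one place. The following is a minimal sketch using only the standard library; `BASE_URL` and the helper name `domino_url` are illustrative assumptions, not part of the Domino API.

```python
from urllib.parse import urlencode

# Placeholder host used throughout this article; replace it with your deployment.
BASE_URL = "https://mydomino.mycompany.com"


def domino_url(path, **params):
    """Join BASE_URL with an API path and optional query parameters."""
    url = BASE_URL + path
    if params:
        # urlencode escapes any characters that are not safe in a query string.
        url += "?" + urlencode(params)
    return url


print(domino_url("/v4/users", userName="User1"))
print(domino_url("/v4/datasetrw/snapshot"))
```

With this in place, changing the deployment URL means editing a single line.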
The request payload here has two entries...
relativeFilePaths
ex. /newfiles
&
datasetId
ex. 62df14824423920fe1871cdd
The obvious questions are how to get these values and what the exact syntax of the API call is. You need a project ID to acquire the dataset ID. There are a few ways to acquire the project ID, depending on what you already know and what is easiest for you to parse, so we'll walk through an example from the first steps.
Example Case:
How to get the relativeFilePaths value
This path is simply the directory where the files you want to add to the snapshot are stored, so its name is up to you. I'm going to call mine Snapshot1. So what is it relative to? The docs note the following...
Array of filepaths relative to the dataset directory
So how do you find the dataset directory? Let's go to the Data section of the Domino project UI for a moment to find the path...
So for the sample, the path everything will be relative to is /domino/datasets/local/quick-start. From a workspace under this directory, I'll create my Snapshot1 directory and create the files I want, file1, file2, & file3 to add to my snapshot.
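So the value for relativeFilePaths will be Snapshot1. In the request payload later in this article it is written with a leading slash, /Snapshot1, the same form as the /newfiles example above.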
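If you prefer to derive the relative path rather than type it, the standard library can compute it from the two absolute paths. This is a minimal sketch; `DATASET_ROOT` mirrors the sample dataset directory above, and `relative_to_dataset` is a hypothetical helper name.

```python
from pathlib import PurePosixPath

# Dataset directory from the sample project above; yours will differ.
DATASET_ROOT = PurePosixPath("/domino/datasets/local/quick-start")


def relative_to_dataset(absolute_path, dataset_root=DATASET_ROOT):
    """Return absolute_path expressed relative to the dataset directory."""
    # relative_to raises ValueError if the path is not inside dataset_root.
    return str(PurePosixPath(absolute_path).relative_to(dataset_root))


snapshot_dir = "/domino/datasets/local/quick-start/Snapshot1"
relative_path = relative_to_dataset(snapshot_dir)
print(relative_path)
```

The ValueError acts as an early warning that the directory is outside the dataset and so cannot be expressed relative to it.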
How to get the datasetId value
Let's start with just your username, for example, User1.
1. From the username we can acquire the user's user ID...
import os
import requests
# Read the API key from the DOMINO_USER_API_KEY environment variable.
headers = {'X-Domino-Api-Key': os.environ['DOMINO_USER_API_KEY']}
# Look up the user record by username.
url = 'https://mydomino.mycompany.com/v4/users?userName=User1'
response_body = requests.get(url, headers=headers).content
print(response_body)
The response to this should look like...
b'[{"firstName":"User","lastName":"One","fullName":"User One","userName":"user1","email":"user1@notanemail.com","avatarUrl":"","id":"62e2ee7a4423920fe1871d79"}]'
The important part of the output is "id":"62e2ee7a4423920fe1871d79". This is the 'userId' you need for the next step.
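Rather than reading the ID out of the printed bytes by eye, you can parse the body as JSON. The sketch below uses the sample response above as its input; `find_user_id` is a hypothetical helper, and the case-insensitive comparison is a choice made here because the request used User1 while the response reports user1.

```python
import json

# Sample response body copied from the output above.
response_body = b'[{"firstName":"User","lastName":"One","fullName":"User One","userName":"user1","email":"user1@notanemail.com","avatarUrl":"","id":"62e2ee7a4423920fe1871d79"}]'


def find_user_id(body, user_name):
    """Return the id of the user whose userName matches, ignoring case."""
    for user in json.loads(body):
        if user["userName"].lower() == user_name.lower():
            return user["id"]
    raise LookupError("no user named " + user_name)


user_id = find_user_id(response_body, "User1")
print(user_id)
```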
2. Now that we have a 'userId', 62e2ee7a4423920fe1871d79, we can acquire a list of that user's projects and their IDs.
import os
import requests
# Read the API key from the DOMINO_USER_API_KEY environment variable.
headers = {'X-Domino-Api-Key': os.environ['DOMINO_USER_API_KEY']}
# List the projects owned by the user ID found in step 1.
url = 'https://mydomino.mycompany.com/v4/projects?ownerId=62e2ee7a4423920fe1871d79'
response_body = requests.get(url, headers=headers).content
print(response_body)
The response to this should look like...
b'[{"id":"62e2ee7b4423920fe1871d7b","name":"quick-start","description":"This is a sample Domino Project. This project contains examples for using notebooks, publishing models as APIs, and publishing Python/Flask and R/Shiny web applications.","visibility":"Private","ownerId":"62e2ee7a4423920fe1871d79","ownerUsername":"user1","collaboratorIds":[],"collaborators":[],"tags":[],"stageId":"61311f13e0865410f55ca440","stageName":"Ideation","status":"active","isBlocked":false,"stageUpdateTimeInMillis":1659039356912,"statusUpdateTimeInMillis":1659039356912}]'
Note that this example user only has one project, so the response is easy to parse. The information we need for the next step is the project ID, "id":"62e2ee7b4423920fe1871d7b".
If you have many projects, you can optionally skip step 1 and search by project name instead. Remember, though, that project names are not necessarily unique across users, so make sure you account for this. As an example, let's find a project named quick-start...
import os
import requests
# Read the API key from the DOMINO_USER_API_KEY environment variable.
headers = {'X-Domino-Api-Key': os.environ['DOMINO_USER_API_KEY']}
# Search for projects by name; the name may match projects owned by other users.
url = 'https://mydomino.mycompany.com/v4/projects?name=quick-start'
response_body = requests.get(url, headers=headers).content
print(response_body)
The response to this should look the same as the response above...
b'[{"id":"62e2ee7b4423920fe1871d7b","name":"quick-start","description":"This is a sample Domino Project. This project contains examples for using notebooks, publishing models as APIs, and publishing Python/Flask and R/Shiny web applications.","visibility":"Private","ownerId":"62e2ee7a4423920fe1871d79","ownerUsername":"user1","collaboratorIds":[],"collaborators":[],"tags":[],"stageId":"61311f13e0865410f55ca440","stageName":"Ideation","status":"active","isBlocked":false,"stageUpdateTimeInMillis":1659039356912,"statusUpdateTimeInMillis":1659039356912}]'
Again, the important thing we need is the project ID, "id":"62e2ee7b4423920fe1871d7b".
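Since project names are not necessarily unique across users, one way to account for this is to match on both the project name and the owner's username. The following is a sketch under that assumption; `find_project_id` is a hypothetical helper, and the sample body is the response above trimmed to the fields used here.

```python
import json

# Response body from the output above, trimmed to the fields used here.
response_body = b'[{"id":"62e2ee7b4423920fe1871d7b","name":"quick-start","ownerUsername":"user1"}]'


def find_project_id(body, project_name, owner_username):
    """Return the id of the single project matching both name and owner."""
    matches = [
        project["id"]
        for project in json.loads(body)
        if project["name"] == project_name
        and project["ownerUsername"] == owner_username
    ]
    # Fail loudly instead of guessing when there are zero or several matches.
    if len(matches) != 1:
        raise LookupError("expected 1 matching project, found %d" % len(matches))
    return matches[0]


project_id = find_project_id(response_body, "quick-start", "user1")
print(project_id)
```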
3. Great, we now have a project ID! Let's find the dataset IDs using the project ID we found in step 2, 62e2ee7b4423920fe1871d7b...
import os
import requests
# Read the API key from the DOMINO_USER_API_KEY environment variable.
headers = {'X-Domino-Api-Key': os.environ['DOMINO_USER_API_KEY']}
# List the datasets that belong to the project ID found in step 2.
url = 'https://mydomino.mycompany.com/dataset?projectId=62e2ee7b4423920fe1871d7b'
response_body = requests.get(url, headers=headers).content
print(response_body)
The response to this should look like...
b'[{"datasetName":"quick-start","datasetId":"62e2ee7f4423920fe1871d81","projectId":"62e2ee7b4423920fe1871d7b"}]'
Once again, this user only has one dataset. You may have several in your project, so you'll need to parse the response for the correct one. But we now have a dataset ID, 62e2ee7f4423920fe1871d81.
So my value for the datasetId is 62e2ee7f4423920fe1871d81.
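When a project holds several datasets, selecting by datasetName is one way to pick the right entry. This is a minimal sketch using the sample response above; `find_dataset_id` is a hypothetical helper name.

```python
import json

# Sample response body from the output above.
response_body = b'[{"datasetName":"quick-start","datasetId":"62e2ee7f4423920fe1871d81","projectId":"62e2ee7b4423920fe1871d7b"}]'


def find_dataset_id(body, dataset_name):
    """Return the datasetId of the dataset with the given name."""
    for dataset in json.loads(body):
        if dataset["datasetName"] == dataset_name:
            return dataset["datasetId"]
    raise LookupError("no dataset named " + dataset_name)


dataset_id = find_dataset_id(response_body, "quick-start")
print(dataset_id)
```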
How to add the new snapshot
1. Execute the add snapshot API call.
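import os
import requests
# Read the API key from the DOMINO_USER_API_KEY environment variable.
headers = {'X-Domino-Api-Key': os.environ['DOMINO_USER_API_KEY'], 'Content-Type': 'application/json'}
# Payload: the dataset ID from step 3 and the directory to snapshot.
data = {'datasetId': '62e2ee7f4423920fe1871d81', 'relativeFilePaths': ['/Snapshot1']}
url = 'https://mydomino.mycompany.com/v4/datasetrw/snapshot'
# json=data serializes the payload as a JSON request body.
response_body = requests.post(url, headers=headers, json=data).content
print(response_body)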
Which returns...
b'{"id":"62e2fe654423920fe1871d93","resourceId":"f38e6ea3-d582-4f31-86a9-09977f03f188","datasetId":"62e2ee7f4423920fe1871d81","author":"62e2ee7a4423920fe1871d79","version":1,"creationTime":1659043429800,"lifecycleStatus":"Pending","statusLastUpdatedBy":"62e2ee7a4423920fe1871d79","statusLastUpdatedTime":1659043429800,"storageSize":0,"isPartialSize":false,"isReadWrite":false}'
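The returned body can be parsed to confirm that the request was accepted. The sketch below pulls out a few fields from the sample response above. `summarize_snapshot` is a hypothetical helper, and treating creationTime as milliseconds since the Unix epoch is an assumption based on the size of the value. The lifecycleStatus of Pending in the sample suggests the snapshot may still be processing when the call returns.

```python
import json
from datetime import datetime, timezone

# Sample response body from the output above.
response_body = b'{"id":"62e2fe654423920fe1871d93","resourceId":"f38e6ea3-d582-4f31-86a9-09977f03f188","datasetId":"62e2ee7f4423920fe1871d81","author":"62e2ee7a4423920fe1871d79","version":1,"creationTime":1659043429800,"lifecycleStatus":"Pending","statusLastUpdatedBy":"62e2ee7a4423920fe1871d79","statusLastUpdatedTime":1659043429800,"storageSize":0,"isPartialSize":false,"isReadWrite":false}'


def summarize_snapshot(body):
    """Extract the snapshot id, version, status, and creation time."""
    snapshot = json.loads(body)
    # Assumption: creationTime is in milliseconds since the Unix epoch.
    created = datetime.fromtimestamp(snapshot["creationTime"] / 1000, tz=timezone.utc)
    return {
        "snapshotId": snapshot["id"],
        "version": snapshot["version"],
        "status": snapshot["lifecycleStatus"],
        "created": created.isoformat(),
    }


summary = summarize_snapshot(response_body)
print(summary)
```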
Checking the dataset in the project UI, you should see the new snapshot listed for the dataset.
Note that these values are static. The dataset ID will not change, so once you have found it you do not need to look it up again. The relative path only changes if you change it yourself.
Notes/Information:
Public REST API calls for Domino versions lower than 5.2 are documented here:
https://dominodatalab.github.io/api-docs/
Note that for these older versions, code examples are available for the specific API calls, but the Python examples are written for Python 2 only. Additionally, the individual API calls often require data, IDs, etc. that are only available from other API calls.
For version 5.2 and later, the public REST API calls are documented here:
https://docs.dominodatalab.com/en/latest/api_guide/8c929e/domino-public-apis/