It's not currently possible to refresh a Dataset in a running session (or to mount an existing Snapshot after the session starts). Only the Snapshots specified when you launch the session will be available. Snapshots are read-only once created, and they are not accessible from other sessions until they have been created, so they are not really suited to truly streaming data workflows.
If new files are not actually added very often, but you want to have a long-running script A "watching" for them to do some processing, you might find the Domino API useful. Using the API you can check the list of Dataset Snapshots, and if there is a new one, launch a separate Job via an API call in your code to do the processing on the new snapshot.
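The "watching" approach described above could be sketched roughly as follows. This is only a minimal illustration, not Domino's API: `list_snapshot_ids` and `launch_job` are hypothetical callables that you would write yourself as thin wrappers around whatever Domino API calls your deployment exposes for listing a Dataset's Snapshots and starting a Job. The polling interval and the use of string IDs are also assumptions.

```python
import time
from typing import Callable, Iterable, List, Optional, Set


def new_snapshot_ids(seen: Set[str], current: Iterable[str]) -> List[str]:
    """Return the IDs in `current` that are not in `seen`, in order, without repeats."""
    fresh: List[str] = []
    for snapshot_id in current:
        if snapshot_id not in seen and snapshot_id not in fresh:
            fresh.append(snapshot_id)
    return fresh


def watch_snapshots(
    list_snapshot_ids: Callable[[], Iterable[str]],  # hypothetical wrapper: lists Snapshot IDs
    launch_job: Callable[[str], None],               # hypothetical wrapper: starts a Job for one Snapshot
    poll_seconds: float = 300.0,
    max_polls: Optional[int] = None,                 # None means poll forever
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll for new Snapshots and launch one Job per new Snapshot.

    Snapshots that already exist when the watcher starts are treated as
    the baseline and are not processed.
    """
    seen: Set[str] = set(list_snapshot_ids())
    launched: List[str] = []
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(poll_seconds)
        for snapshot_id in new_snapshot_ids(seen, list_snapshot_ids()):
            # The processing runs in a separate Job, which can mount the
            # new Snapshot at launch; this session never sees it directly.
            launch_job(snapshot_id)
            seen.add(snapshot_id)
            launched.append(snapshot_id)
        polls += 1
    return launched
```

The point of this design is that the watcher itself never needs the new files mounted. It only compares Snapshot lists and hands each new Snapshot to a freshly launched Job, which works within the limitation that a running session cannot refresh its Datasets.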
Comments
3 comments
Hi jws383,
The files in the latest snapshot of a dataset that is attached to your project should just be available to your Python script in the /domino/datasets/ folder. Let me know if I'm misunderstanding the workflow you are asking about here.
Dan.
Submitted by: dan.stern
Hi Dan,
I realize I didn't explain the context very well. In script A, I want to be able to refresh the dataset with a function call in the script, so that I get the most current snapshot. Another script, B, will be adding files in real time to the dataset that script A is accessing, so in script A I want to be able to check whether script B has added any new files before doing more processing.
Jacob
Submitted by: jws383
Hi Jacob,
It's not currently possible to refresh a Dataset in a running session (or to mount an existing Snapshot after the session starts). Only the Snapshots specified when you launch the session will be available. Snapshots are read-only once created, and they are not accessible from other sessions until they have been created, so they are not really suited to truly streaming data workflows. But there might be some things you can do:
If you are operating within interactive Workspaces you might find the scratch space useful (https://docs.dominodatalab.com/en/4.2/reference/data/datasets/Datasets_Scratch_Spaces.html).
Or, if new files are not actually added very often, but you want to have a long-running script A "watching" for them to do some processing, you might find the Domino API useful (https://docs.dominodatalab.com/en/4.2/api/Domino_API.html). Using the API you can check the list of Dataset Snapshots, and if there is a new one, launch a separate Job via an API call in your code to do the processing on the new snapshot.
Let us know if that answers things for you!
Melanie
Submitted by: melanie.veale