Comments
3 comments
Hi Jacob,
It's not currently possible to refresh a Dataset in a running session (or to mount an existing Snapshot after the session starts). Only the Snapshots specified when you launch the session will be available. Since Snapshots are read-only once created, and not accessible across sessions until then, they are not really suited for truly streaming data workflows. But there might be some things you can do:
If you are operating within interactive Workspaces you might find the scratch space useful (https://docs.dominodatalab.com/en/4.2/reference/data/datasets/Datasets_Scratch_Spaces.html).
Or, if new files are not actually added very often, but you want to have a long-running script A "watching" for them to do some processing, you might find the Domino API useful (https://docs.dominodatalab.com/en/4.2/api/Domino_API.html). Using the API you can check the list of Dataset Snapshots, and if there is a new one, launch a separate Job via an API call in your code to do the processing on the new snapshot.
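The "watching" approach could be sketched roughly as below. This is only a minimal outline, not actual Domino API code: `list_snapshot_ids` and `launch_job` are hypothetical callables you would implement yourself on top of the Domino API (one returning the current Snapshot IDs for the Dataset, the other starting a Job for a given Snapshot), and `poll_seconds` and `max_polls` are illustrative parameters.

```python
import time
from typing import Callable, Iterable, Optional, Set


def watch_for_new_snapshots(
    list_snapshot_ids: Callable[[], Iterable[str]],  # hypothetical: wraps your API call
    launch_job: Callable[[str], None],               # hypothetical: wraps your API call
    poll_seconds: float = 300.0,
    max_polls: Optional[int] = None,
) -> Set[str]:
    """Poll for Snapshot IDs and launch one Job per newly seen Snapshot.

    Snapshots that already exist when the watcher starts are treated as
    processed. Returns the set of all Snapshot IDs seen so far.
    """
    seen = set(list_snapshot_ids())
    polls = 0
    # max_polls=None means "keep watching until the script is stopped".
    while max_polls is None or polls < max_polls:
        time.sleep(poll_seconds)
        current = set(list_snapshot_ids())
        # Launch a separate Job for each Snapshot not seen before.
        for snapshot_id in sorted(current - seen):
            launch_job(snapshot_id)
        seen |= current
        polls += 1
    return seen
```

Passing the two API calls in as functions keeps the polling logic independent of any particular API client, so it can be tried out with stand-in functions before wiring it to real calls.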
Let us know if that answers things for you!
Melanie
Submitted by: melanie.veale
Hi jws383,
The files in the latest snapshot from a dataset that is attached to your project should just be available to your Python script in the /domino/datasets/ folder. Let me know if I'm misunderstanding the workflow you are asking about here.
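For example, a script could list whatever files are visible under a dataset directory with something like the following. This is just a sketch using the standard library; the helper name `list_dataset_files` and the dataset name in the usage note are made up for illustration.

```python
from pathlib import Path
from typing import List


def list_dataset_files(dataset_dir: str) -> List[str]:
    """Return the sorted relative paths of all files under dataset_dir.

    Returns an empty list if the directory does not exist.
    """
    root = Path(dataset_dir)
    if not root.is_dir():
        return []
    # rglob("*") walks subdirectories too; keep regular files only.
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())
```

Calling `list_dataset_files("/domino/datasets/my-dataset")` (with `my-dataset` replaced by your own dataset's name) would show the files from the snapshot that was mounted when the session started.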
Dan.
Submitted by: dan.stern
Hi Dan,
I realize I didn't explain the context very well. In script A, I want to be able to refresh the dataset with a function call so that the script gets the most current snapshot. Another script B will be adding files in real time to the dataset that script A is accessing, so in script A I want to be able to check whether script B has added any new files before doing more processing.
Jacob
Submitted by: jws383