Submitted originally by: puggelli_vladimiro
Hi all,
I have two questions:
- How do I schedule a new snapshot of a dataset? If I run a scheduled job, the job does not find the dataset file.
- I use a dataset in an app, but I notice that if I create a new snapshot, I need to rerun the app to update the file. Is that correct? Is there a workaround?
Thanks for your reply.
Comments
Hi,
Thanks for your questions on datasets.
To schedule a new snapshot of a dataset, you can follow the steps below.

The path: key specifies the path at which the input and output datasets will be mounted. The value of this key is always appended to /domino/datasets/. This configuration tells the code that the latest snapshot of your input dataset sales will be mounted at /domino/datasets/input, and that the data you write to /domino/datasets/output will be saved as a new snapshot of the sales dataset.

For instance, if you want to write a file sales_q4_2019.csv as a new snapshot of the sales dataset, you can do so by saving the file under /domino/datasets/output in this way:

    file = open("/domino/datasets/output/sales_q4_2019.csv", "w")

With this configuration in place in your domino.yaml file, set the schedule options and hit Schedule. All subsequent runs of your scheduled job should create a new snapshot of the dataset.

As for your second question: yes, you will have to rerun the app.
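To make the write step above concrete, here is a minimal Python sketch, assuming the output dataset is mounted at /domino/datasets/output as described. The function name write_sales_file, the DATASET_OUTPUT_DIR environment variable, and the CSV columns are all hypothetical and used only for illustration. The sketch fails loudly if the mount is missing, which may help diagnose a scheduled job that cannot find the dataset.

```python
import csv
import os
from pathlib import Path

# Assumption: the output dataset is mounted here, as described in the reply above.
# DATASET_OUTPUT_DIR is a hypothetical override for trying the script outside Domino.
OUTPUT_DIR = Path(os.environ.get("DATASET_OUTPUT_DIR", "/domino/datasets/output"))


def write_sales_file(rows, output_dir=OUTPUT_DIR, filename="sales_q4_2019.csv"):
    """Write rows to a CSV file in the output mount and return the file path."""
    output_dir = Path(output_dir)
    if not output_dir.is_dir():
        # Raise a clear error instead of silently writing somewhere else.
        raise FileNotFoundError(f"Output dataset directory not mounted: {output_dir}")
    target = output_dir / filename
    # The with-block closes the file so the data is flushed before the job ends.
    with open(target, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "revenue"])  # illustrative columns only
        writer.writerows(rows)
    return target
```

In a scheduled job you would call something like write_sales_file([("EMEA", 1200)]) at the end of your script. Using a with-block rather than a bare open() means the file is closed even if a later step raises.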
Submitted by: akshay.ambekar
@puggelli_vladimiro To expand on the second part of your question, about republishing the app: you can also schedule the publishing of your new app version using the app publishing API (see https://community.dominodatalab.com/discussion/comment/91#Comment_91). If you would like, you could even wrap all of these operations into one scheduled parent job that first creates the new snapshot, optionally waits some time, and then publishes the new app version. Or, you can schedule two back-to-back jobs, one that creates the new snapshot and another after it that publishes a new app version. Let us know if you have any further questions!
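The single parent job described above could be sketched roughly as follows. This is only an outline under stated assumptions: create_snapshot and publish_app are hypothetical callables that you would supply yourself (the first writing to the output dataset mount, the second calling the app publishing API from the linked comment, which is not reproduced here), and refresh_app is an illustrative name.

```python
import time


def refresh_app(create_snapshot, publish_app, wait_seconds=0, sleep=time.sleep):
    """Run the snapshot step, optionally wait, then run the publish step.

    create_snapshot and publish_app are hypothetical, user-supplied callables.
    If create_snapshot raises, publish_app is never called.
    """
    snapshot_result = create_snapshot()
    if wait_seconds > 0:
        # Optional pause between creating the snapshot and republishing the app.
        sleep(wait_seconds)
    publish_result = publish_app()
    return snapshot_result, publish_result
```

Keeping the two steps as separate callables means the same functions could also be used by two back-to-back scheduled jobs instead of one parent job.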
Submitted by: katie.shakman