Submitted originally by: puggelli_vladimiro
I have two questions:
- How do I schedule a new snapshot of a dataset? If I run a scheduled job, the job does not find the dataset file.
- I use a dataset in an app, but I notice that if I create a new snapshot, I need to rerun the app to update the file. Is that correct? Is there a workaround?
Thanks for your reply.
Thanks for your questions on datasets.
To schedule a new snapshot of a dataset, you can follow the steps below:
path: specifies the path at which the input and output datasets will be mounted. The value of this key is always appended to /domino/datasets/. This configuration tells the code that the latest snapshot of your input dataset sales will be mounted at /domino/datasets/input, and the data that you write to /domino/datasets/output will be saved as a new snapshot of the sales dataset.
For instance, if you want to write a file sales_q4_2019.csv as a new snapshot of the sales dataset, you can do so by saving the file in this way:
file = open("/domino/datasets/output/sales_q4_2019.csv","w")
With this configuration in place in the domino.yaml file, set the schedule options and hit Schedule. All subsequent runs of your scheduled job should create a new snapshot of the dataset.
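The write step above could be fleshed out roughly as follows. This is only a minimal sketch, not part of Domino's API: the helper name write_snapshot_file and its rows argument are hypothetical, and only the /domino/datasets/output path and the sales_q4_2019.csv file name come from the answer. It checks that the output mount exists before writing, which may help when a scheduled job reports that it cannot find the dataset path.

```python
import csv
from pathlib import Path

# Mount point named in the answer above. It is assumed to exist only inside
# a run whose dataset configuration mounts an output at this path.
OUTPUT_DIR = Path("/domino/datasets/output")


def write_snapshot_file(rows, filename="sales_q4_2019.csv", output_dir=OUTPUT_DIR):
    """Write rows as CSV into the output mount and return the file path.

    Hypothetical helper for illustration; `rows` is an iterable of sequences.
    """
    output_dir = Path(output_dir)
    if not output_dir.is_dir():
        # Fail early with a clear message if the dataset is not mounted.
        raise FileNotFoundError(f"Output dataset directory not mounted: {output_dir}")
    target = output_dir / filename
    # newline="" is what the csv module expects for files it writes to.
    with open(target, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return target
```

In the scheduled job's script, a call such as write_snapshot_file(rows), with rows holding the new data, would place the file under the output mount. Using a with block also makes sure the file is closed before the run ends.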
As for your second question, yes, you will have to rerun the app.
Submitted by: akshay.ambekar
@puggelli_vladimiro To expand on the second part of your question, about republishing the app: you can also schedule the publishing of your new app version using the app publishing API (see https://community.dominodatalab.com/discussion/comment/91#Comment_91). If you would like, you could even wrap all of these operations into one scheduled parent job that first creates the new snapshot, optionally waits some time, and then publishes the new app version. Or, you can schedule two back-to-back jobs, one that creates the new snapshot and another after it that publishes a new app version. Let us know if you have any further questions!
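The single parent job idea could be sketched like this. It is only an illustration of the ordering (snapshot, optional wait, publish) and does not call the Domino app publishing API: create_snapshot and publish_app are hypothetical placeholders that you would replace with your own snapshot-writing code and with the publishing call described in the linked comment.

```python
import time
from typing import Callable


def refresh_snapshot_then_publish(
    create_snapshot: Callable[[], None],
    publish_app: Callable[[], None],
    wait_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run the snapshot step, optionally wait, then run the publish step.

    Both callables are hypothetical placeholders supplied by the caller.
    If the snapshot step raises, the publish step is never reached.
    """
    # Step 1: write the new data so a new snapshot gets created.
    create_snapshot()
    # Step 2: optional pause before republishing.
    if wait_seconds > 0:
        sleep(wait_seconds)
    # Step 3: publish the new app version.
    publish_app()
```

Because the steps run in sequence inside one job, a failure while creating the snapshot stops the job before the app is republished. The sleep parameter is injectable only so the wait can be skipped or observed when trying the function out.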
Submitted by: katie.shakman