Datasets, refresh and scheduled refresh.

Comments

  • Jaclyn Patterson

    Hi,

    Thanks for your questions on datasets.

    To schedule a new snapshot of a dataset, follow the steps below:

    • Create a file named domino.yaml in your project with a configuration like the one below, and set the name of the configuration to the name of your dataset. This is the configuration for a dataset in my project; the value of dataset: has the format user-name/project-name/dataset-name.
    datasetConfigurations:
     - name: "sales"
       inputs:
         - path: "input/"
           dataset: "akshay_ambekar/product-insight/sales"
       outputs:
         - path: "output/"
           dataset: "akshay_ambekar/product-insight/sales"
    

    The path: key specifies where each input and output dataset is mounted; its value is always appended to /domino/datasets/. With this configuration, the latest snapshot of the sales dataset is mounted as input at /domino/datasets/input, and any data you write to /domino/datasets/output is saved as a new snapshot of the sales dataset.

    • Next, in your code file, write the data that you want to save as a snapshot to /domino/datasets/output. For instance, to write a file sales_q4_2019.csv as a new snapshot of the sales dataset, open it at that location: file = open("/domino/datasets/output/sales_q4_2019.csv", "w"). Close the file when you are done (or use a with block) so that the data is fully written before the job ends.
    • Finally, on the Scheduled Jobs screen, enter the command and hardware tier, expand the Datasets dropdown, click Advanced, and select the dataset configuration (sales) that you defined in domino.yaml. Set the schedule options and click Schedule. Each subsequent run of the scheduled job should then create a new snapshot of the dataset.

    As for your second question, yes, you will have to rerun the app.

    Submitted by: akshay.ambekar

  • Jaclyn Patterson

    @puggelli_vladimiro To expand on the second part of your question, about republishing the app: you can also schedule the publishing of your new app version using the app publishing API (see https://community.dominodatalab.com/discussion/comment/91#Comment_91). If you would like, you could even wrap all of these operations into one scheduled parent job that first creates the new snapshot, optionally waits some time, and then publishes the new app version. Or, you can schedule two back-to-back jobs, one that creates the new snapshot and another after it that publishes a new app version. Let us know if you have any further questions!
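
    The single-parent-job option could be sketched in Python as a small orchestrator that runs the snapshot step, optionally waits, and then publishes. Both steps are passed in as callables because the exact calls depend on your setup: create_snapshot and publish_app are hypothetical placeholders. In practice create_snapshot would start the snapshot-producing job and return once it finishes, and publish_app would wrap the app publishing API linked above (possibly through the python-domino client's app_publish method, which you should verify against your client version).

```python
import time
from typing import Callable


def run_parent_job(
    create_snapshot: Callable[[], None],
    publish_app: Callable[[], None],
    wait_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Run the snapshot step, optionally wait, then publish; return the completed steps.

    If create_snapshot raises, publish_app is never called, so the app is not
    republished after a failed dataset update.
    """
    completed = []
    create_snapshot()  # hypothetical: start the snapshot job and wait for it
    completed.append("snapshot")
    if wait_seconds > 0:
        # Optional pause between creating the snapshot and publishing.
        sleep(wait_seconds)
        completed.append(f"waited {wait_seconds}s")
    publish_app()  # hypothetical: call the app publishing API
    completed.append("publish")
    return completed
```

    Scheduling two back-to-back jobs instead avoids the orchestrator, but then nothing stops the publish job from running if the snapshot job fails; the parent-job version makes that ordering explicit.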

    Submitted by: katie.shakman
