Environment management
Overview
Environment management is the practice of creating new Domino environments and editing existing environments to meet your specific language and package needs. This work is typically done by an administrator or advanced Domino user.
Here are some examples of when to create or modify an environment:
- You need to install a package for Python, R, Octave, or some other software dependency.
- You use a library that takes a long time to install, and you’d prefer to cache that package into the environment so that it's always immediately available.
- You are managing an organization, and want to create a default environment for your team across all projects.
Domino uses Docker for environments. An environment is a Domino abstraction on top of a Docker image that provides additional flexibility and versioning. When Domino starts your run, it creates a Docker container based on the environment associated with your project. Each run takes place in an isolated Docker container.
Managing environments
The environment your project will use is set from the project's Settings page.
Use the dropdown to select the environment that runs in this project will use. You'll see global environments plus any environments you own or that have been shared with an organization you belong to.
Click Manage Environments to open the environments overview. You'll see the environments you have access to, including your deployment's global environments, environments in use by projects you are a collaborator on, and environments shared with organizations you are a member of.
You can create a new environment by clicking Create Environment at the top right. You'll be asked to name your new environment and define its visibility. Administrators will see a third option to make the new environment available globally (to all users of the deployment).
After creating your environment, you will be taken to the environment detail page, where you can define the Dockerfile and supporting scripts and settings for the environment.
Environment actions
- Edit Definition: Takes you to a page where you can edit all of your environment's attributes.
- Duplicate Environment: Clones your environment.
- Archive Environment: Hides your environment. Projects already using this environment will be allowed to continue to use it until a new project environment is set.
Overview tab
The overview tab shows all metadata about your environment, including the attributes described under Environment attributes. Click Edit Environment in the top right to enter edit mode and make changes to your environment. After each save, your environment's revision number is incremented by one, and your Domino deployment rebuilds the environment and pushes it to the local Docker registry.
Revisions tab
The revisions tab shows a list of all revisions of your compute environment along with each revision's build status, timestamp, and Docker image URI. You can click the gear icon to reveal additional options, including the ability to view build logs, cancel builds, or set a revision as Active.
Projects, Data Sets, and Models tabs
Once the environment has been assigned to a project, data set or model, you will be able to see a list of those entities on their tab. This is useful for seeing who you need to contact if you update or want to archive an environment, for example.
Environment attributes
Base Environment
Gives you a choice between basing your compute environment on your deployment default or on a custom Docker image URI (e.g. registry.hub.docker.com/library/python:3.8-slim). This defines the FROM line in the Dockerfile Domino constructs for you.
Dockerfile Instructions
Enter your Dockerfile layers here. Docker's official documentation has a handy guide to writing Dockerfiles. You can also read the primer in the Raw Dockerfiles section below.
Pluggable Notebooks / Workspace Sessions
Define which interactive tools should be available in a project using this environment. See this for more details.
Scripts
Here you can input lines of bash code which will be executed at the specified step in your experiment's lifecycle. These commands are run as root and are executed at runtime.
- Pre-setup scripts: Run before the Python packages in your project’s requirements.txt are installed.
- Post-setup scripts: Run after the requirements.txt installation process.
- Pre-run scripts: Run right after post-setup scripts.
- Post-run scripts: Run at the beginning of the Stopping run state. Due to the way Domino handles shutting down runs and workspaces, these scripts are subject to a runtime limit, which defaults to 1 hour. If the script hasn't completed by the end of the limit, it will time out and the process will be terminated. The timeout duration is configurable. Contact your local administrator or email support@dominodatalab.com for assistance in modifying timeouts.
Docker Arguments
Here, admins can specify arguments that will be passed to the underlying docker run command. Arguments must be separated by newlines. In almost all cases, you shouldn't need to modify this.
Username
Admins can specify a non-default username for your environment here.
Environment variables
You can set environment variables at the environment level.
Raw Dockerfiles
You may wish to install packages directly to your environment. This can come in handy if your package installation takes a long time: installations in a Domino environment are cached, so you won’t have to wait for them every time. The Domino platform uses Docker containers to manage isolated environments. If you already have a Docker image you'd like to use, you can specify it in the Base Environment field described above. If you don't set this, we will use the Domino default environment as your base image. Consult the official Docker documentation to learn more about Dockerfiles.
Note that Domino takes care of the FROM line for you, pointing to the base image specified when setting up the environment. Do not start your Dockerfile instructions in Domino with a FROM line.
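As a rough illustration of how the pieces fit together, the sketch below assumes the Base Environment field is set to the example image URI from this page. The exact file Domino assembles is not shown in this document, so treat the commented-out FROM line as conceptual; only the instructions after it are what you would type into the Dockerfile Instructions field.

```dockerfile
# Conceptual view only: Domino adds the FROM line from the Base Environment field.
# Do not write this line yourself.
#   FROM registry.hub.docker.com/library/python:3.8-slim

# What you enter under Dockerfile Instructions starts here.
RUN pip install numpy
```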
The most common Dockerfile instructions you'll use are RUN, ENV, and ARG:
RUN commands execute lines of bash, for example:
RUN wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
RUN tar xvzf spark-1.5.1-bin-hadoop2.6.tgz
RUN mv spark-1.5.1-bin-hadoop2.6 /opt
RUN rm spark-1.5.1-bin-hadoop2.6.tgz
ARG instructions set build-time variables, and ENV instructions set environment variables in the container. Variables set with ENV are accessible from runs that use this environment. For example:
ENV SPARK_HOME /opt/spark-1.5.1-bin-hadoop2.6
If you set environment variables in the Environment variables section of your environment definition, you can reference them in your Dockerfile instructions by declaring only the variable name with an ARG statement:
ARG SPARK_HOME
This makes the variable available during the build step. If you want the variable to be available in the final compute environment, you also need to add an ENV statement referencing the argument name:
ENV SPARK_HOME=$SPARK_HOME
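Putting the two statements together, a minimal sketch might look like the following. It assumes SPARK_HOME has already been defined in the Environment variables section of the environment definition; the echo step is only there to illustrate that the value is visible to RUN instructions during the build.

```dockerfile
# Assumption: SPARK_HOME is defined in the environment's Environment variables section.
# Declare the name only; the value is supplied at build time.
ARG SPARK_HOME

# Build-time use: RUN instructions after the ARG line can read the value.
RUN echo "Building with SPARK_HOME=${SPARK_HOME}"

# Persist the value so it is also set in containers started from this environment.
ENV SPARK_HOME=$SPARK_HOME
```

Without the final ENV line, the variable exists only while the image is being built.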
Examples: Package Installation
You can click the R Package or Python Package buttons when editing your environment, and these will insert a line with the correct syntax to install packages (just fill in the names of the packages you want). Or you can add the commands yourself, following these examples:
- R Package Installation: Example with the devtools package.
RUN R --no-save -e "install.packages('devtools')"
- Python Package Installation with pip: Example with the numpy package.
RUN pip install numpy
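As a slightly fuller sketch, several packages can be installed in one instruction. The CRAN mirror URL and the extra package names (data.table, pandas) are illustrative assumptions, not Domino defaults. The --no-cache-dir flag is a standard pip option that skips pip's download cache, which can keep the resulting layer smaller.

```dockerfile
# Install two R packages in one layer. repos is set explicitly so the build
# does not depend on a CRAN mirror being preconfigured in the base image.
RUN R --no-save -e "install.packages(c('devtools', 'data.table'), repos='https://cloud.r-project.org')"

# Install two Python packages with a single pip command.
RUN pip install --no-cache-dir numpy pandas
```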
Dockerfile best practices
- Docker optimizes its build process by keeping track of commands it has run and aggressively caching the results. This means that if it sees the same set of commands as a previous build, it will assume it can use the cached version. A single new or changed command invalidates the cache for all subsequent commands.
- There is a limit to the number of layers (that is, commands) a Docker image can have. Currently, this limit is 127. Keep in mind that the image you are building on may have already used many layers. One way to work around this limit is to combine several commands into one with &&, like this:
RUN \
  wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz && \
  tar xvzf spark-1.5.1-bin-hadoop2.6.tgz && \
  mv spark-1.5.1-bin-hadoop2.6 /opt && \
  rm spark-1.5.1-bin-hadoop2.6.tgz
- If you are installing multiple Python packages via pip, it's almost always best to use a single pip install command. This ensures that dependencies and package versions are properly resolved. If you install via separate commands, you may end up inadvertently overriding a package with the wrong version, due to a dependency specified by a later installation. For example:
RUN pip install luigi nolearn lasagne
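The practices above can be combined in one set of instructions. The sketch below reuses the Spark and pip examples from this page; the ordering is a suggestion based on how Docker's layer cache behaves, not a Domino requirement.

```dockerfile
# Slow, rarely changing steps go first so their cached layers are reused.
# Chaining with && keeps the download, unpack, and cleanup in a single layer.
RUN wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz && \
    tar xvzf spark-1.5.1-bin-hadoop2.6.tgz && \
    mv spark-1.5.1-bin-hadoop2.6 /opt && \
    rm spark-1.5.1-bin-hadoop2.6.tgz
ENV SPARK_HOME /opt/spark-1.5.1-bin-hadoop2.6

# Frequently edited steps go last. Editing this line rebuilds only from here down.
# A single pip command lets pip resolve all three packages together.
RUN pip install luigi nolearn lasagne
```

With this ordering, adding a package to the pip line leaves the Spark layer cached, so the rebuild skips the download.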