Introduction
According to their documentation, “Conda is an open source package management system and environment management system that quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments. It was created for Python programs, but it can package and distribute software for any language”.
Customers often would like to create Conda virtual environments from an existing yaml definition. This is convenient because the Domino Standard Environment comes with Conda pre-installed, so all we need to do is add the yaml file and run the command to create a new environment from it. It also makes the customer's Domino workspaces feel familiar, because they have all the same packages (even the same virtual environment name) as their existing IDEs.
Let’s take a look at how to create Conda virtual environments in Domino compute environments and some of the challenges around them.
A Brief Intro to Conda Environments
Conda environments can be created with a simple
conda create --name myenv
and a package can be added to that environment at the same time like this:
conda create -n myenv scipy
You can even specify the Python version in the environment:
conda create -n myenv python=3.9
However, installing more than a few packages one at a time quickly gets labour-intensive, and it becomes tricky to know what's in different environments. Fortunately, Conda environments can be created from a yaml definition, which makes them quicker to create and easier to compare, and their definitions can be committed to Git for safe-keeping, version control, etc. How to construct that yaml file is a topic in its own right, and there is some limited documentation here.
Broadly, though, a yaml definition has a name field, a list of channels to obtain packages from, and dependencies, which are the desired packages themselves, pinned to specific versions if you want.
Example:
name: stats2
channels:
  - javascript
dependencies:
  - python=3.9
  - bokeh=2.4.2
  - numpy=1.21.*
  - nodejs=16.13.*
  - flask
  - pip
  - pip:
    - Flask-Testing
You can then create an environment with:
conda env create -f <path to your yaml definition>
and activate it with:
conda activate <your environment name>
You can then check the installed packages in that environment with conda list to confirm that everything was created correctly.
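For example, with the stats2 environment defined above, a quick sanity check might look like this:
conda env list        # the new environment should appear in the list
conda list -n stats2  # inspect its packages without activating it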
In Domino Compute Environments
Depending on your customer’s requirements, you can either create a Conda environment within a Docker image or create it when the workspace starts up.
At Workspace Startup
You can create an environment when a workspace starts up by saving a yaml definition file in the Domino project and then adding the following lines to the workspace’s pre-run script:
conda env create -f /mnt/environment.yml
conda init bash
source ~/.bashrc
This will create the environment and then prepare Terminal to be able to interact with it.
Once in the workspace, the environment can be activated and tested through Terminal like this:
(base) $ python --version
Python 3.8.12
(base) $ conda activate stats2
(stats2) $ python --version
Python 3.9.12
(stats2) $ conda deactivate
(base) $
A benefit of creating an environment when the workspace starts is that the latest versions of any unpinned packages will be installed. It also means that you can use the same base image for all projects and then define unique Conda environments per project.
The trade-offs to this are that workspace startup times could be slow if the environment is complex. It also increases the chance that one project might use different packages to another, similar project. Although the environment configuration is documented in the project, there would be no way to automatically alert that one project had drifted away from the other.
Another issue with this method is that it assumes that the customer’s Domino instance has access to a Conda repository. This isn’t always the case, because Domino instances are sometimes isolated from the Internet without an internal Conda mirror.
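If you're unsure, you can check which channels Conda is configured to pull from (and then verify that they are reachable from a workspace) with:
conda config --show channels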
Within a Docker Image
If the above method doesn't work for any of the above reasons, you can instead build the environment into a Docker image that can be pushed to the customer's Docker registry.
The following method assumes that you have a local folder called conda, containing the environment definition and a text file called conda-init.txt (more on that in a minute).
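The expected layout looks like this (the file names match those referenced in the Dockerfile below):
conda/
├── environment.yml
└── conda-init.txt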
FROM quay.io/domino/standard-environment:ubuntu18-py3.8-r4.1-domino5.0

COPY conda /tmp

RUN cd /tmp && \
    conda env create -f environment.yml && \
    echo "auto_activate_base: false" >> /home/ubuntu/.condarc && \
    cat /tmp/conda-init.txt >> /home/ubuntu/.bashrc && \
    echo "conda activate stats2" >> /home/ubuntu/.bashrc
As you can see, Docker copies the contents of the conda folder from the local machine into /tmp on the Docker image, then creates the environment from the yaml definition file in /tmp. The following step stops Conda from automatically activating the default base environment.
In testing, it was found that running conda init in Docker didn't work, so the conda-init.txt file contains the code that is appended to your bash profile if you run conda init bash from Terminal. We then simulate the process of running conda init by manually appending that code into ~/.bashrc.
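One way to capture that block is to run conda init bash once in a container (or workspace) based on the same image, and then copy the managed section out of ~/.bashrc, for example:
conda init bash
sed -n '/# >>> conda initialize >>>/,/# <<< conda initialize <<</p' ~/.bashrc > conda-init.txt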
The contents of conda-init.txt look like:
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/conda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/opt/conda/etc/profile.d/conda.sh" ]; then
        . "/opt/conda/etc/profile.d/conda.sh"
    else
        export PATH="/opt/conda/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
Finally, the last line of the Dockerfile makes Terminal activate our new Conda environment automatically when it starts up in the workspace.
Conda Environments in IDEs
Unfortunately, the above setup only configures Terminal to be able to switch Conda environments; Jupyter and Jupyterlab will be stuck using the default ipykernel.
To be able to switch between Conda environments, you need to add this block to your Dockerfile instructions, either in the Domino compute environment or in the base image:
USER root
RUN conda install nb_conda
With nb_conda installed, you will then have the option in Jupyter and Jupyterlab to switch Conda environments; in Jupyterlab, you can pick the Conda environment when selecting a kernel.
Conda Environments for Jobs, Apps and Models
Apps
Apps are called from a shell script, but for some reason they don't seem to honour any active Conda environment set in your bash profile. That being the case, you can add a line to your app.sh script to activate the environment before running your Python script:
#!/usr/bin/env bash
source activate stats2
python /mnt/foo.py
To be sure, you could also add a conda env list before and/or after source activate to show what the environment was and what it changes to.
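For example, a more verbose app.sh might look like this (a sketch; conda env list marks the active environment with an asterisk):
#!/usr/bin/env bash
conda env list            # show the environment before switching
source activate stats2
conda env list            # confirm that stats2 is now active
python /mnt/foo.py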
Models
Models are trickier because they run in whatever the default Conda environment is.
Option 1: Use bash to switch environments
One way to switch Conda environments from within Python is to use os.system() or subprocess.run() to run source activate <environment> in bash. However, that only sets the Conda environment for the duration of that call, after which it reverts to the previous environment, so you would need to nest your actual Python script inside the call, like this:
import subprocess
subprocess.run('source activate <environment> && python /mnt/foo.py', shell=True, executable='/bin/bash')
or
import os
os.system('SCRIPT="source activate <environment> && python /mnt/foo.py"; /bin/bash -c "$SCRIPT"')
The only snag after that is that model APIs in Domino are defined by giving the names of a script and a function in it. As the examples above only call the script, not the function in it, you would need to create a wrapper function in your script that calls the function you actually want, like this:
import subprocess

def call_model(start, stop):
    # Interpolate the arguments into the inner command; this assumes their
    # repr is valid Python (numbers, strings, etc.)
    cmd = f'source activate stats2 && python -c "from model import *; my_model({start!r}, {stop!r})"'
    subprocess.run(cmd, shell=True, executable='/bin/bash')
If you put this into a wrapper Python script, you should be able to run your model in whatever Conda environment you like.
A small caveat: if, for some reason, you try to activate the environment you're already in (say, you're in stats2 and you run source activate stats2 from your script), these scripts seem to fail, and if you try to import a package that only exists in that environment, Python won't find it.
If you switch to another environment and back again, the scripts work, but then you might as well skip the whole process of activating an environment at all.
Another issue with calling the API this way is that subprocess.run() and os.system() don't return the wrapped function's result to the caller. Obviously, this is no use for a synchronous API, but it might be acceptable for an asynchronous API where you don't need a response beyond knowing that your request was submitted successfully.
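One partial workaround, sketched below under the assumption that the model's result is JSON-serialisable, is to have the inner command print the result and capture its stdout; treat this as an idea to test rather than a proven pattern:
import json
import subprocess

def call_model(start, stop):
    # Print the model's result as JSON from inside the activated environment...
    inner = f'from model import *; import json; print(json.dumps(my_model({start!r}, {stop!r})))'
    # ...and capture stdout so we can return it to the caller
    result = subprocess.run(f'source activate stats2 && python -c "{inner}"',
                            shell=True, executable='/bin/bash',
                            capture_output=True, text=True)
    return json.loads(result.stdout)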
Option 2: Use the base Conda environment
An alternative to using bash to switch Conda environments is to use base instead. There are a couple of ways you could do this:
- In an existing DSE image, roll the base environment back to its first revision (effectively uninstalling everything that was added to it) with:
conda install -n base --revision 0
(you can inspect the revision history with conda list -n base --revisions) and then update the environment with the list of packages from an environment.yml file, as sketched below.
This is risky because we install things like jupyterlab and jupyter-server-proxy in base in the DSE, and removing them might break workspaces. Users would want to compare the contents of base (conda list -n base) with their environment.yml to make sure they wouldn't be missing anything, and even then it might take some trial and error before they were confident they had all the Python packages they needed.
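A minimal sketch of that approach, assuming the rollback behaves as expected on the image in question:
conda install -n base --revision 0            # roll base back to its first revision
conda env update -n base -f environment.yml   # then install only the packages you want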
- Create a new base image for your model APIs with just one virtual environment.
Particularly for model APIs, why even use the DSE/DME? It comes with IDEs that you can't use, and almost certainly a number of packages you don't need, which only make the image bigger and slower to push and pull. Why not create a base image for model APIs whose base Conda environment contains only the packages the user needs?
The trade-off with this is that users would need to be careful to keep the contents of their virtual environments in this image in sync with the image they use for workspaces. This could be a tricky management problem– although you could easily schedule a job to rebuild both images from the same environment.yml file, the challenge might be communicating to other users that the contents of the images had changed and could potentially have broken someone’s code.
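For illustration, a minimal sketch of such an image; the miniconda base tag is an assumption for the example, and this ignores any Domino-specific requirements that model images may have:
FROM continuumio/miniconda3:latest

COPY environment.yml /tmp/environment.yml

RUN conda env update -n base -f /tmp/environment.yml && \
    conda clean --all --yes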
Conclusion/TL;DR
There doesn’t seem to be one neat solution to running Conda virtual environments in Domino.
In Jupyter(lab) workspaces, the nb_conda plugin will allow you to jump between virtual environments by clicking on the relevant tile, but that’s no use in non-interactive workloads.
In apps, you can modify your app.sh script to activate the relevant environment before you run your Python script.
Similarly, in model APIs you could write a wrapper Python script that would activate your environment via bash, but this might only work with asynchronous APIs, as it is hard to return a response from your API.
The one solution that seems to work in all cases is simply to use the base Conda environment, but if you can only use one environment, it defeats the main purpose of virtual environments.
This might all change as Conda develops, but at the moment, in short, virtual environments in containers are less than ideal.