Creating a SAS Data Science Workspace Environment
In this guide, we will walk through building a SAS Data Science Docker container image that will be integrated with Domino Data Lab. Before we dive in, we will answer a few common questions and provide additional resources.
Before getting started, you will need a few things. Setting them up is outside the scope of this guide.
Although you are not required to run the completed SAS Data Science container image on a Domino Data Lab environment that has Internet access, you will need Internet access to download the appropriate tools that are used to build the SAS Data Science container image.
We will be using a Docker CLI client to build the SAS Data Science container image. Although all of the commands shown can be copy/pasted, it is good to have some familiarity with the Docker CLI tools.
A Docker Registry will be used to store the final SAS Data Science image before it can be consumed in Domino. There are many options for Docker Registry providers and software. If you do not feel comfortable setting up a Docker Registry to store the Docker images for your Domino Data Lab environment, please contact your Domino Customer Success Manager (CSM) or Technical Account Manager (TAM).
We will be checking out a SAS Container Recipes Git repository. Although there are other ways to download this repository from the Internet, Git CLI will be used in this guide.
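The two command-line tools named above, the Docker CLI and Git, can be checked before starting. The following is a minimal sketch; `check_tools` is a hypothetical helper name, not part of any tool used in this guide.

```shell
#!/bin/sh
# check_tools (hypothetical helper): report whether each named command is on PATH.
# Returns 0 if every tool is found, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The tools this guide relies on.
check_tools docker git || echo "Install the missing tools before continuing."
```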
SAS Data Science License
This installation does require that you have a valid SAS Data Science license, which is provided to you by SAS Institute Inc. As part of the license, you should have a file called SAS_Viya_deployment_data.zip that will contain all of your license information and will be used to download the appropriate software.
Comfort with Linux Command-Line Utilities
All of the instructions in this guide are written for Red Hat Enterprise Linux variants. They primarily target CentOS 7, but can be adapted to Red Hat Enterprise Linux, SUSE Linux Enterprise Server, or Oracle Linux, all of which are supported by the SAS Data Science platform.
Please see the following page for Linux 64-bit operating systems that SAS Data Science (Viya family) supports: SAS Supported Operating Systems.
Creating a SAS Data Science Docker Image
The instructions for building the base SAS Data Science image are based on the SAS Container Recipes project, which is available on GitHub. Please consult that repository for the exact instructions for your situation: SAS Container Recipes.
In this guide, we will be building a SAS Data Science image with a CentOS 7 base. This will be a single Viya container instead of the full-blown Viya platform across multiple containers.
The general build instructions are as follows:
- Clone the GitHub repository for SAS Container Recipes
    git clone https://github.com/sassoftware/sas-container-recipes.git
    cp PATHTO/SAS_Viya_deployment_data.zip .
Replace PATHTO above with the directory that contains your SAS_Viya_deployment_data.zip file.
- Build the SAS Viya image using the build.sh utility provided
    ./build.sh --base-image centos --base-tag 7 --type single --zip ./SAS_Viya_deployment_data.zip
At the end of this process, you should have a SAS Data Science Docker image locally.
If you run into any issues, please contact your SAS Institute Inc. representative for support in resolving the issues.
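The clone, copy, and build steps above can be collected into one script. The sketch below is illustrative only: the `run` and `build_sas_base_image` helpers are hypothetical names, and the `cd sas-container-recipes` step is an assumption (the copy and `./build.sh` commands are taken to run from the root of the cloned repository). By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

# run (hypothetical helper): echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_sas_base_image (hypothetical helper): clone the recipes, stage the
# license archive, and start the single-container build.
build_sas_base_image() {
    zip_path=$1   # full path to your SAS_Viya_deployment_data.zip
    run git clone https://github.com/sassoftware/sas-container-recipes.git &&
    run cd sas-container-recipes &&   # assumption: build from the repository root
    run cp "$zip_path" . &&
    run ./build.sh --base-image centos --base-tag 7 --type single --zip ./SAS_Viya_deployment_data.zip
}

# Replace PATHTO with the directory that contains your license archive.
build_sas_base_image "PATHTO/SAS_Viya_deployment_data.zip"
```

Chaining the steps with `&&` stops the sequence at the first failure, so a failed clone or copy never reaches the lengthy build step.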
Adding Additional Licensed SAS Software
Although it is outside the scope of this document, if you require installing any additional components like SAS/ACCESS modules or database drivers, please consult with your SAS representatives. These additional components can be layered on top of your base SAS Data Science Docker image.
Integrating the SAS Data Science Docker Image with Domino
We will now switch over to the Domino GitHub repository for the SAS Data Science image build. The Domino repository contains all of the files necessary to finalize the build of the SAS Data Science container image to make it integrated with Domino.
Please follow the README instructions on the Domino repository for more information about the individual files.
These are the steps you will need to follow to complete the build process:
- Clone the Domino GitHub repository
    git clone https://github.com/imarchenko/sas-data-science.git
- Modify the Dockerfile's FROM instruction to use the SAS Data Science image you built in the prior steps
    sed -Ei.bak "s#SASDS_DOCKER_TAG#$SASDS_DOCKER_TAG#g" Dockerfile
Before running this command, set the SASDS_DOCKER_TAG shell variable to the Docker image tag (NAME:TAG) that was created in the Creating a SAS Data Science Docker Image step.
- Build the Docker image
    docker build . -t $DOMINO_SASDS_DOCKER_TAG
Before running this command, set the DOMINO_SASDS_DOCKER_TAG shell variable to your final Docker Registry image name and tag (NAME:TAG). This is the Docker image that will later be used inside a Domino Compute Environment.
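To see what the `sed` substitution in the steps above does, it can be tried on a throwaway file first. The sketch below assumes a one-line Dockerfile whose FROM instruction contains the `SASDS_DOCKER_TAG` placeholder and a hypothetical image tag `sas-viya-single:latest`; the real Dockerfile in the Domino repository and your own tag will differ.

```shell
#!/bin/sh
# Work in a temporary directory so no real Dockerfile is touched.
workdir=$(mktemp -d)

# Hypothetical one-line Dockerfile containing the placeholder.
printf 'FROM SASDS_DOCKER_TAG\n' > "$workdir/Dockerfile"

# Hypothetical tag of the base image built earlier (NAME:TAG).
SASDS_DOCKER_TAG="sas-viya-single:latest"

# Same substitution as in the guide; -i.bak keeps the original as Dockerfile.bak.
sed -Ei.bak "s#SASDS_DOCKER_TAG#$SASDS_DOCKER_TAG#g" "$workdir/Dockerfile"

cat "$workdir/Dockerfile"       # prints: FROM sas-viya-single:latest
cat "$workdir/Dockerfile.bak"   # prints: FROM SASDS_DOCKER_TAG
```

The `#` delimiter lets the tag contain `/`, as registry paths usually do. A tag containing `#` or `&` would need escaping before being passed to `sed`.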
Testing the Docker Image Locally
Before pushing the Docker image to your Docker Registry, it is a good idea to test it locally first. There are two modes to test:
Interactive (SAS Studio)
    docker run -p 80:8888 -u domino:domino -w /mnt -v $PWD/tests:/mnt -it $DOMINO_SASDS_DOCKER_TAG /var/opt/workspaces/sasds/start
A couple of minutes after you launch the interactive SAS Studio container, you should see the message "SAS Studio is now running". At that point you can visit http://localhost/SASStudio/start.html in your web browser to test SAS Studio.
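Instead of watching the container output for the startup message, the SAS Studio URL can be polled from a second terminal. This is a minimal sketch that assumes `curl` is installed on the host; `wait_for_url` is a hypothetical helper name, and the 5-second interval and 300-second default are arbitrary choices.

```shell
#!/bin/sh
# wait_for_url (hypothetical helper): poll a URL until it answers successfully
# or the timeout (in seconds, default 300) is reached.
wait_for_url() {
    url=$1
    timeout=${2:-300}
    waited=0
    # -f: treat HTTP errors as failures; -s: no progress output.
    until curl -fs -o /dev/null "$url"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Timed out waiting for $url" >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    echo "Ready: $url"
}

# Example call, run while the interactive container is starting:
# wait_for_url http://localhost/SASStudio/start.html 600
```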
Batch
    docker run -u domino:domino -w /mnt -v $PWD/tests:/mnt -it $DOMINO_SASDS_DOCKER_TAG run_sas.sh $SAS_BATCH_PROGRAM
Before running this command, set the SAS_BATCH_PROGRAM shell variable to the file name of your test SAS program (PROGRAM.SAS).
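If you do not have a test program at hand, a one-step program that writes a line to the SAS log is enough for a smoke test. The sketch below creates one in the `tests` directory that the `docker run` commands mount at /mnt; the file name `hello.sas` is hypothetical, and it is an assumption that `run_sas.sh` accepts a file name relative to /mnt.

```shell
#!/bin/sh
# Create the directory that the docker run commands mount into the container.
mkdir -p tests

# Minimal SAS program (hypothetical file name): one DATA step that writes to the log.
cat > tests/hello.sas <<'EOF'
/* Smoke test: write one line to the SAS log. */
data _null_;
    put 'Hello from SAS';
run;
EOF

# Variable used by the batch docker run command in this guide.
SAS_BATCH_PROGRAM=hello.sas
echo "Test program written to tests/$SAS_BATCH_PROGRAM"
```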
Push the Domino-Integrated SAS Data Science Docker Image to a Docker Registry
The final step is to push the Domino-integrated SAS Data Science Docker image to a Docker Registry. This Docker Registry will be later used to pull the Docker image into your Domino Data Lab environment.
    docker push $DOMINO_SASDS_DOCKER_TAG
DOMINO_SASDS_DOCKER_TAG must hold the Docker Registry tag (NAME:TAG) you used in the Integrating the SAS Data Science Docker Image with Domino step.
Please work with your Domino Data Lab technical account team on the best method to pull the Docker image into your Domino Data Lab environment.
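Many registries require authentication before a push succeeds. The sketch below is one possible way to combine the login and the push; `push_image` is a hypothetical helper, `registry.example.com` is a placeholder host, and it assumes the image name begins with the registry host (which is not the case for Docker Hub names such as `user/image:tag`).

```shell
#!/bin/sh
# push_image (hypothetical helper): log in to the registry named in the image
# reference, then push the image. Assumes the reference starts with the host.
push_image() {
    image=$1
    registry_host=${image%%/*}   # everything before the first "/"
    docker login "$registry_host" && docker push "$image"
}

# Example call with a placeholder image reference:
# DOMINO_SASDS_DOCKER_TAG=registry.example.com/team/sas-data-science:1.0
# push_image "$DOMINO_SASDS_DOCKER_TAG"
```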
Configuring the SAS Data Science Compute Environment in Domino
Congratulations, you are near the end of the installation process. The last step is to configure your Compute Environment in your Domino Data Lab environment.
- In your Domino Data Lab environment, navigate to the Domino Compute Environments page and create a new Compute Environment
- Set the "Custom Image" location to your Docker Registry image. For the Custom Image URL, use the Docker Registry image URL that you created in the Push the Domino-Integrated SAS Data Science Docker Image to a Docker Registry step.
- Create a Pluggable Workspace for SAS Studio in your Compute Environment
    title: "SAS Data Science"
    start: [ "/var/opt/workspaces/sasds/start" ]
- When you are done defining the Pluggable Workspace, click the Build button at the bottom of the Compute Environment page to finalize your SAS Data Science configuration for Domino Data Lab
Maintenance and License Updates
The easiest way to keep your SAS Data Science image up to date is to repeat the steps in this guide whenever a new release of SAS Data Science is available. Repeat the same process when you need to update a license file at renewal.
Repeating this process ensures that you stay current with the latest version of the SAS Data Science software.
SAS Studio Timeout
By default, SAS Studio logs the user out after 30 minutes. No further development can be done in that session, and changes not yet written to the filesystem cannot be saved.
The recommendation is to set the timeout to a high value, e.g. 24 hours.
In the SAS Data Science Compute Environment in Domino, set the following in the Dockerfile:
This is a Spring Boot 2.0 property rather than a Studio property. Append 'm' to the value to specify minutes, or give a bare number to specify seconds.
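As a worked example of the two value formats, the recommended 24-hour timeout can be converted with shell arithmetic. The variable names are illustrative only.

```shell
#!/bin/sh
hours=24

# Minutes form: the number of minutes followed by "m".
timeout_minutes="$((hours * 60))m"
# Seconds form: a bare number.
timeout_seconds=$((hours * 60 * 60))

echo "$timeout_minutes"   # prints: 1440m
echo "$timeout_seconds"   # prints: 86400
```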
NB: this will be baked into the initial image build in future releases.
SAS Studio Tabs Lost after Session Timeout
To prevent tabs being lost after losing connection, configure the following option in Preferences.
Configuring ODBC connections
Ensure that the LD_LIBRARY_PATH is set first, before individual ODBC libraries, as per the example below:
ERROR: Failed to load the Apache Parquet support extension
Errors can be generated when trying to read Parquet files if the LD_LIBRARY_PATH has not been set correctly: please see Configuring ODBC connections above.
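When troubleshooting either problem, it can help to list the entries of LD_LIBRARY_PATH in order and confirm that each directory exists. The sketch below is a generic diagnostic rather than a SAS-provided tool; `check_library_path` is a hypothetical helper name. It is most useful when run inside the container, where the SAS processes see the same value.

```shell
#!/bin/sh
# check_library_path (hypothetical helper): print each colon-separated entry
# in search order and flag directories that do not exist.
# The body runs in a subshell so the IFS and globbing changes do not leak out.
check_library_path() (
    IFS=:
    set -f          # do not expand wildcard characters in entries
    status=0
    for dir in $1; do
        if [ -d "$dir" ]; then
            echo "ok: $dir"
        else
            echo "missing: $dir"
            status=1
        fi
    done
    [ "$status" -eq 0 ]
)

# Inspect the current value; prints nothing if the variable is unset or empty.
check_library_path "${LD_LIBRARY_PATH:-}" || echo "LD_LIBRARY_PATH has entries that do not exist."
```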