Basics of moving data over a network
Overview
When you start a run or workspace in Domino, the software and filesystem context for your code is defined by two things:
- a Domino environment defines the container your run executes in
- your project files are mounted at /mnt in the container
Both of these are stored within Domino itself. Domino maintains a versioned repository of your project files, and caches the latest image built from your environment.
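For example, once a workspace is up you can confirm that your project files are present by listing the mount point from a terminal (this assumes the default /mnt mount described above):
ls -la /mnt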
There are several circumstances where you may want to retrieve data from a source outside of Domino:
- when executing code stored in your project files, you may want to retrieve fresh data from an external source for analysis
- when building a new revision of your environment, you may want to retrieve and install new dependencies or different versions of existing dependencies
- when running a Domino workspace, you may want to retrieve either dependencies or fresh data to advance your experimentation
In this article we'll introduce some standard tools for moving data from one filesystem to another. Note that all of these tools require network access to the machine you're retrieving data from, whether that machine is on your corporate LAN or on the Internet.
Domino executors run on Linux. All of the tools and examples in this article are presented for use on a Domino-supported Linux operating system like Ubuntu or RHEL. However, these tools will work in any GNU Bash shell, including the macOS terminal.
These methods are suited to retrieving specific files that are hosted at a URL or stored on a filesystem. If you have a relational database or other data source that doesn't serve simple files, you should check our how-to guides on data source connections.
Wget
Wget is a GNU utility, included by default on most Linux distributions, that downloads files from network locations over HTTP, HTTPS, and FTP. Files that you want to retrieve with Wget must be served over one of those protocols at a URL your machine has access to.
Wget is extremely simple to use. Commands take the form:
wget PROTOCOL://URL
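For instance, to download a CSV file from a web server into your current working directory (the URL below is a placeholder; substitute the address of the file you want):
# Download file.csv to the current directory
wget https://web.server.url/path/to/file.csv
# Or use -O to give the downloaded file a different name
wget -O results.csv https://web.server.url/path/to/file.csv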
If you need to supply the target web server with a basic username and password for authentication, you can use the --user and --password flags. Here's a complete example:
wget --user myUsername --password myPassword https://web.server.url/path/to/file.csv
Many cloud object stores like Amazon S3 and Azure Blob Storage can be configured to serve files at a URL over the Internet. See the first part of the Get Started (Python) tutorial for an example of retrieving data from S3 with Wget. You can also host files on computers in your local network with web servers like Apache or SimpleHTTPServer.
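For instance, if an object in an S3 bucket is publicly readable, a plain Wget command like the one below will download it (the bucket name and file path are placeholders):
wget https://<your-bucket-name>.s3.amazonaws.com/path/to/file.csv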
However, Wget is more limited than curl in terms of supported protocols and authentication schemes.
curl
curl is a tool for making web requests over a wide variety of protocols and with support for many authentication and encryption schemes. curl can be used to query a web server for a standard HTTP response like you would get from Wget, but it can also be used to construct more complex queries for REST APIs.
curl requests can become quite complex when passing in many headers or setting many options, but the basic format is similar to Wget:
curl "PROTOCOL://URL"
For example, you can use curl to query the Domino API itself for data about your Domino deployment. Here's a complete example:
curl --include \
  -H "X-Domino-Api-Key: <your-api-key>" \
  'https://<your-domino-url>/v4/gateway/runs/getByBatchId'
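The API returns JSON on standard output, so you can pipe the response through a formatter to make it easier to read. For example, assuming Python is available in your environment:
curl -s -H "X-Domino-Api-Key: <your-api-key>" \
  'https://<your-domino-url>/v4/gateway/runs/getByBatchId' | python -m json.tool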
You can also use curl to download a file from S3 with the script below. The script assumes your S3 bucket is in the us-west-2 region; if your bucket lives elsewhere, change the region in the URLs to match.
#!/bin/sh
file="<your-file-name>"
bucket="<your-bucket-name>"
resource="/${bucket}/${file}"
contentType="<content-type>"
dateValue="`date +'%a, %d %b %Y %H:%M:%S %z'`"

# String to sign for AWS Signature Version 2
stringToSign="GET\n\n${contentType}\n${dateValue}\n${resource}"

# AWS credentials are read from the environment
s3Key=$AWS_ACCESS_KEY_ID
s3Secret=$AWS_SECRET_ACCESS_KEY

# Sign the request with the secret key
signature=`echo -en ${stringToSign} | openssl sha1 -hmac ${s3Secret} -binary | base64`

# Request the object from S3
curl -H "Host: ${bucket}.s3-us-west-2.amazonaws.com" \
  -H "Date: ${dateValue}" \
  -H "Content-Type: ${contentType}" \
  -H "Authorization: AWS ${s3Key}:${signature}" \
  https://${bucket}.s3-us-west-2.amazonaws.com/${file}
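To run the script, save it to a file in your project (the filename below is just an example), make sure the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are set, and execute it from a terminal:
chmod +x download_from_s3.sh
./download_from_s3.sh
Note that this script signs the request with AWS Signature Version 2; newer AWS regions only accept Signature Version 4, in which case you'll need a different signing approach or a presigned URL.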