Connecting to Impala from Domino
Overview
This article describes how to connect to Apache Impala from Domino.
Apache Impala is an open source massively parallel processing SQL query engine for data stored in a computer cluster running Apache Hadoop.
Using Impala ODBC Connector for Cloudera Enterprise with pyodbc
Domino recommends using the Impala ODBC Connector for Cloudera Enterprise in concert with the pyodbc library for interacting with Impala from Python.
Environment setup
Visit the Cloudera downloads page to download the Impala ODBC Connector for Cloudera Enterprise to your local machine. For default Domino images of Ubuntu 16.04, you should download the 64-bit Debian package. Keep track of where you save this file, as you will need it in a later step.
Create a new public project in your Domino instance to host the driver files for use in Domino environments.
In the new project, click browse for files and select the driver file you downloaded earlier to queue it for upload. Click Upload to add it to the project.
After the driver file has been added to your project files, click the gear next to it in the files list, then right-click Download and click Copy link address. Save this address somewhere and keep it handy, as you will need it when setting up your environment.
Add the Dockerfile instructions below to your environment to install the driver and pyodbc, pasting in the URL you copied earlier in place of the bracketed placeholder.
# download the driver from your project
RUN mkdir /ref_files
RUN \
  cd /ref_files && \
  wget --no-check-certificate [paste-download-url-from-previous-step-here] && \
  gzip -d clouderaimpalaodbc_2.6.0.1000-2_amd64.deb.gz

# install the driver
RUN gdebi /ref_files/clouderaimpalaodbc_2.6.0.1000-2_amd64.deb --n

# register the impala driver in odbcinst.ini
RUN \
  echo "\n\
[Cloudera ODBC Driver for Impala] \n \
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so \n \
KrbFQDN=_HOST \n \
KrbServiceName=impala \n" >> /etc/odbcinst.ini

# set up impala libraries
# (ENV is used because a variable set with RUN export does not persist past that build step)
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cloudera/impalaodbc/lib/64
RUN ldd /opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so

# install pyodbc
RUN pip install pyodbc
For a basic introduction to modifying Domino environments, watch this tutorial video.
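After the environment builds, it can be useful to confirm that the driver was actually registered. The following is a minimal sketch, not part of the official setup: it parses the text of an odbcinst.ini file with Python's standard configparser module and returns the Driver path registered under the section name used in the Dockerfile instructions. The function name registered_driver_path is hypothetical, and the sketch assumes the file contains only ordinary INI sections.

```python
import configparser

# Section name written to /etc/odbcinst.ini by the Dockerfile instructions.
DRIVER_NAME = "Cloudera ODBC Driver for Impala"


def registered_driver_path(ini_text, driver_name=DRIVER_NAME):
    """Return the Driver= path registered for driver_name, or None if absent.

    Hypothetical helper for illustration; ini_text is the full text of an
    odbcinst.ini file.
    """
    # strict=False tolerates duplicate sections, which can appear when the
    # same driver entry has been appended more than once.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    if not parser.has_section(driver_name):
        return None
    return parser.get(driver_name, "Driver", fallback=None)
```

In a workspace started from the new environment, you could pass the contents of /etc/odbcinst.ini to this function and check that the returned path exists on disk.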
Credential setup
There are several environment variables you should set up to store secure information about your Impala connection. Set the following as Domino environment variables on your user account:
IMPALA_HOST
Hostname where your Impala service is running. Make sure your Impala service and network firewall are configured to accept connections from Domino.
IMPALA_PORT
The port your Impala service is configured to accept connections on.
IMPALA_KERB_HOST
Hostname of your Kerberos authentication service.
IMPALA_KERB_REALM
The name of the Kerberos realm used by the Impala service.
Read Environment variables for secure credential storage to learn more about Domino environment variables.
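Because a missing variable only surfaces as a KeyError or a failed connection later on, a quick check at the start of a script can save some debugging. The sketch below assumes the four variable names listed above; the helper name missing_impala_vars is hypothetical.

```python
import os

# Variable names from the credential setup list above.
REQUIRED_VARS = (
    "IMPALA_HOST",
    "IMPALA_PORT",
    "IMPALA_KERB_HOST",
    "IMPALA_KERB_REALM",
)


def missing_impala_vars(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

Calling missing_impala_vars() with no arguments inspects the current process environment; an empty list means all four variables are available to your code.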
Usage
Read the pyodbc documentation for detailed information on how to use the package to interact with a database. Below is an example of how to set up a connection.
import os
import pyodbc

# fetch values from environment variables
hostname = os.environ['IMPALA_HOST']
service_port = os.environ['IMPALA_PORT']
kerb_host = os.environ['IMPALA_KERB_HOST']
kerb_realm = os.environ['IMPALA_KERB_REALM']

# create connection object, substituting the values fetched above
conn = pyodbc.connect('Host=' + hostname + ';'
                      + 'DRIVER={Cloudera ODBC Driver for Impala};'
                      + 'PORT=' + service_port + ';'
                      + 'KrbRealm=' + kerb_realm + ';'
                      + 'KrbFQDN=' + kerb_host + ';'
                      + 'KrbServiceName=impala;'
                      + 'AUTHMECH=1', autocommit=True)

# if you see:
#   'Error! Filename not specified'
# while querying Impala using the connection object,
# add the following configuration line:
#
# conn.setencoding(encoding='utf-8', ctype=pyodbc.SQL_CHAR)

# if your Impala uses SSL, add SSL=1 to the connection string
# conn = pyodbc.connect('Host=' + hostname + ';'
#                       + 'DRIVER={Cloudera ODBC Driver for Impala};'
#                       + 'PORT=' + service_port + ';'
#                       + 'KrbRealm=' + kerb_realm + ';'
#                       + 'KrbFQDN=' + kerb_host + ';'
#                       + 'KrbServiceName=impala;'
#                       + 'AUTHMECH=1;'
#                       + 'SSL=1;'
#                       + 'AllowSelfSignedServerCert=1', autocommit=True)
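If you connect from several scripts, it may help to assemble the connection string in one place. The following is a sketch under stated assumptions rather than a Domino or Cloudera API: impala_connection_string and fetch_rows are hypothetical helper names, and the keys and values are the ones used in the example above. pyodbc is imported inside fetch_rows so the string builder can be used and tested without the driver installed.

```python
def impala_connection_string(host, port, kerb_host, kerb_realm, use_ssl=False):
    """Build an ODBC connection string from the values in the example above."""
    parts = [
        "DRIVER={Cloudera ODBC Driver for Impala}",
        "Host=" + host,
        "PORT=" + str(port),
        "KrbRealm=" + kerb_realm,
        "KrbFQDN=" + kerb_host,
        "KrbServiceName=impala",
        "AUTHMECH=1",
    ]
    if use_ssl:
        # Same SSL options as the commented-out variant in the example above.
        parts += ["SSL=1", "AllowSelfSignedServerCert=1"]
    return ";".join(parts)


def fetch_rows(sql, connection_string):
    """Run one query and return all rows, closing the connection afterwards."""
    import pyodbc  # imported lazily; requires the environment set up earlier

    conn = pyodbc.connect(connection_string, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()
```

For example, fetch_rows("SHOW DATABASES", impala_connection_string(hostname, service_port, kerb_host, kerb_realm)) would reuse the variables read from the environment earlier.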