Version/Environment (if relevant):
This applies to all versions of Domino.
In a Domino Spark cluster, you're using Kerberos to authenticate to SQL Server with PySpark. When adding Kerberos authentication to the Domino Spark cluster, you get this Kerberos authentication error:
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1) (12.34.567.89 executor 0): com.microsoft.sqlserver.jdbc.SQLServerException: Integrated authentication failed. ClientConnectionId:1234d56a-78c9-1234-beda-eedc1234b567
You've set the Spark configuration at the code level as well as on the project's Spark integration page. You can successfully run kinit in the Spark driver (workspace) and see the token via klist, so you can validate the ticket in the workspace logs:
I have no name!@spark-123456cf7c89123b456fa789-spark-worker-2:/tmp$ klist
Ticket cache: FILE:/tmp/krb5cc_12345
Default principal: sys-data@AD.COMPANY.COM
Valid starting Expires Service principal
03/29/23 15:28:44 03/29/23 23:28:44 krbtgt/AD.COMPANY.COM@AD.COMPANY.COM
renew until 04/03/23 15:28:44
However, you're unable to see the ticket via klist on the executor Spark worker pods. When you run klist in an executor, it looks in the default location (e.g. /tmp/krb5cc_12345) and you run into the klist: No credentials cache found error:
I have no name!@spark-1234567f1c23456b789faed1-spark-worker-0:/opt/bitnami/spark$ klist
klist: No credentials cache found (filename: /tmp/krb5cc_12345)
- When setting environment variables, spark.executorEnv.ENVNAME will not work. You need to use ENV in the cluster environment to set them.
- When setting multiple extraJavaOptions parameters, having one line per extraJavaOptions will only set the last parameter.
Set environment variables within the executor cluster environment's Dockerfile using:
ENV KRB5CCNAME=/mnt/data/workspace_name/krb5cc_12345
ENV KRB5_CONFIG=/mnt/data/workspace_name/krb5.conf
- This successfully changes the default ccache directory to a shared directory (one that's accessible by both the driver and the executors) so that executors can read the ccache previously generated by the driver during the Pre Run script.
- Trying to change it via the spark.executorEnv.KRB5CCNAME option or anywhere else will not work. When you validate by logging into the executor and running klist, it'll still look in the default location (/tmp/krb5cc_12345) instead of the shared directory.
Pre Run Script
- You need to run kinit in the Spark driver environment's Pre Run script to get the token and create the cached token on the dataset so that executors can rely on it for authentication.
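A minimal Pre Run script sketch (the keytab path, principal, and shared ccache path here are assumptions for illustration; the ccache path must match the KRB5CCNAME set in the executor cluster environment):

```shell
#!/bin/bash
# Hypothetical Pre Run script for the Spark driver environment:
# acquire a TGT and write the ticket cache to a shared directory
# that the executors can also read.
export KRB5CCNAME=/mnt/data/workspace_name/krb5cc_12345  # assumed shared path
kinit -kt /mnt/data/workspace_name/sys-data.keytab sys-data@AD.COMPANY.COM
klist  # confirm the ticket was cached
```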
Under Project Settings > Integrations > Apache Spark mode > Spark Configuration Options, when setting multiple
extraJavaOptions, having one line per parameter will not work:
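For illustration (values reconstructed from the working example later in this article), one option per line in the configuration field would look like this, and only the final -Dsun.security.krb5.debug=true line takes effect:

```
spark.executor.extraJavaOptions -Djavax.security.auth.useSubjectCredsOnly=false
spark.executor.extraJavaOptions -Djava.security.auth.login.config=/mnt/data/workspace_name/SQLJDBCDriver2.conf
spark.executor.extraJavaOptions -Djava.security.krb5.conf=/mnt/data/workspace_name/krb5.conf
spark.executor.extraJavaOptions -Dsun.security.krb5.debug=true
```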
This is because at runtime only the last option (-Dsun.security.krb5.debug=true in the above example) will be seen under Spark Web UI > Spark Properties, which shows that it overwrote the previous three options.
You need to change the runtime code to concatenate multiple extraJavaOptions parameters into one line, with each additional option separated by a space:
spark_conf.set("spark.executor.extraJavaOptions","-Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.auth.login.config=/mnt/data/workspace_name/SQLJDBCDriver2.conf -Djava.security.krb5.conf=/mnt/data/workspace_name/krb5.conf -Dsun.security.krb5.debug=true")
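The concatenation above can be sketched in plain Python before handing the value to SparkConf (spark_conf here is assumed to be a pyspark.SparkConf instance, so those calls are shown commented out to keep the sketch self-contained):

```python
# Build the single-line extraJavaOptions value from a list of -D flags
# (values taken from the example above).
java_options = [
    "-Djavax.security.auth.useSubjectCredsOnly=false",
    "-Djava.security.auth.login.config=/mnt/data/workspace_name/SQLJDBCDriver2.conf",
    "-Djava.security.krb5.conf=/mnt/data/workspace_name/krb5.conf",
    "-Dsun.security.krb5.debug=true",
]
extra_java_options = " ".join(java_options)

# Assumption: spark_conf is a pyspark.SparkConf; apply the same value to
# both driver and executor so one set() call carries every option.
# spark_conf.set("spark.driver.extraJavaOptions", extra_java_options)
# spark_conf.set("spark.executor.extraJavaOptions", extra_java_options)
```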
- Apache Spark 3.0: By setting spark.kerberos.renewal.credentials to ccache in Spark's configuration, the local Kerberos ticket cache will be used for authentication. Spark will keep the ticket renewed during its renewable life, but after it expires a new ticket needs to be acquired (e.g. by running kinit). It's up to the user to maintain an updated ticket cache that Spark can use. The location of the ticket cache can be customized by setting the KRB5CCNAME environment variable.
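In spark-defaults form, that Spark 3.0+ option reads (a config sketch):

```
spark.kerberos.renewal.credentials  ccache
```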
- Spark on Domino > Configure Prerequisites: You must configure the PySpark compute environments for workspaces and/or jobs that will connect to your cluster.
- Domino supports Kerberos authentication, including keytab-file-based authentication, allowing users to authenticate as themselves when connecting to Kerberos-secured systems. Users can enable Kerberos authentication at the project level or user level by uploading a Kerberos keytab and principal into Domino. After setup, runs started by Kerberos-enabled users or in Kerberos-enabled projects in Domino will automatically run kinit and retrieve the ticket needed to authenticate.