When diagnosing DMM issues, a clear picture of the status of the Spark master and workers is useful, and the Spark UI provides one. The instructions below describe how to set up access to and view the Spark UI.
How to connect to Spark UI:
Spark is part of the DMM Compute ecosystem and runs on a master-worker architecture. The Spark master always runs as a pod in the Domino platform namespace, but a Spark worker is active only while a job (training, prediction, or ground truth) is running.
How to connect to the Spark master:
Use the following command:
kubectl port-forward -n <platform-namespace> spark3-master-0 8080:8080
Alternatively, run the k9s command from the cluster and port-forward the spark-master pod to a local port.
Open the forwarded local port in a browser to view the Spark master UI, e.g. localhost:8080.
Note that Workers, Running Applications, and Completed Applications are visible in this UI. If no workers or applications are present, stop the port-forward and port-forward the other Spark master pod instead.
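The master steps above, collected into one copy-pasteable sequence (the namespace placeholder is from the text; substitute your cluster's platform namespace, and adjust the local port if 8080 is taken):

```shell
# Forward the Spark master UI to a local port (runs until interrupted).
kubectl port-forward -n <platform-namespace> spark3-master-0 8080:8080

# While the port-forward is running, open the master UI in a browser:
#   http://localhost:8080
# Workers, Running Applications, and Completed Applications appear here.
```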
How to connect to a Spark worker:
Use the kubectl command:
kubectl port-forward -n <compute-namespace> spark3-worker-0 8081:8081
Alternatively, port-forward the Spark worker to a specified local port and browse the same URL, localhost:8081, to view the worker UI.
The links to stdout have hardcoded hostnames, so they will not render correctly. To view these logs, replace the hostname in the link with the forwarded local address (localhost:8081).
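The worker steps, as one sequence (the compute namespace placeholder is from the text; the hostname rewrite in the comment is illustrative):

```shell
# Forward the Spark worker UI (worker pods run in the compute namespace).
kubectl port-forward -n <compute-namespace> spark3-worker-0 8081:8081

# Browse http://localhost:8081 for the worker UI. The stdout links carry
# hardcoded hostnames, so rewrite them to the forwarded address, e.g.:
#   http://<hardcoded-host>:8081/...  ->  http://localhost:8081/...
```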
How to connect to the Spark driver:
DMM Compute uses the Spark connector to connect to Spark, so the Spark driver logs are extremely important when troubleshooting. To see the Spark driver logs:
Use the command:
kubectl get pod -n <platform-namespace> -l app.kubernetes.io/name=compute
Alternatively, in K9s, select the pod named compute-xxx-xxx (compute-66c4d46887-xj2rh in this case).
Port-forward port 4040 of this pod to a local port. Use the kubectl command:
kubectl port-forward -n <platform-namespace> <pod-name-from-previous-command> 4040:4040
(In K9s, press enter on this pod and then shift+f to port-forward it.)
Open the local port to view the Spark driver page, which contains the jobs timeline and job details. The job description points to the actual code pointer in DMM Compute.
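The driver steps above can be sketched as one sequence (placeholders follow the text; 4040 is the standard Spark driver UI port):

```shell
# 1. Find the compute pod that hosts the Spark driver:
kubectl get pod -n <platform-namespace> -l app.kubernetes.io/name=compute

# 2. Forward the driver UI port (4040) of that pod to the same local port:
kubectl port-forward -n <platform-namespace> <pod-name-from-previous-command> 4040:4040

# 3. Open http://localhost:4040 for the jobs timeline and job details.
```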
How to see Spark logs:
Use the following kubectl commands to fetch the logs from the different Spark pods.
kubectl logs -n <platform-namespace> spark3-master-0
kubectl logs -n <compute-namespace> spark3-worker-0
kubectl logs -n <platform-namespace> -l app.kubernetes.io/name=compute
Alternatively, in k9s, the user can press l on a specific pod to see that pod's logs. While Spark configuration-related information can be found in the spark-worker logs, DMM Spark usage troubleshooting information resides in the compute-xx-xx pod (xx varies per compute pod), so refer to the compute logs for it.
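When reproducing a failing job, it can help to follow the compute logs live rather than fetching them once. A sketch using standard kubectl flags (`--tail` and `-f`), with the label selector from the commands above:

```shell
# Stream the last 200 lines of the compute pod logs and follow new output
# while the job runs; stop with Ctrl-C.
kubectl logs -n <platform-namespace> -l app.kubernetes.io/name=compute --tail=200 -f
```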