As of version 4.6, Domino can provision an on-demand Dask cluster within the same Kubernetes infrastructure that runs Domino. Below are some tips for gathering logs and beginning to troubleshoot issues with on-demand Dask in Domino.
The platform components below are responsible for running on-demand Dask workloads. Each is listed with commands for checking its status and retrieving its logs. Domino support will request the status of these components and their log output when troubleshooting issues with Dask. Checking these components requires access to the Kubernetes cluster. Values in angle brackets are placeholders; replace them with the actual values from your cluster.
- Domino must be version 4.6 or later
- The compute environment for the workspace and the compute environment for the Dask cluster must be compatible, with matching versions of Dask.
#workspace compute environment image:
#dask cluster environment image:
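One way to confirm the matching-version requirement is to print the Dask version inside each environment and compare the two results. The sketch below assumes that `python`, `dask`, and `distributed` are available in both images. `dask_version` and `compare_dask_versions` are hypothetical helper names used for illustration, not Domino tooling.

```shell
# Hypothetical helper: print the dask and distributed versions of the
# current environment. Run it once in the workspace and once in a Dask
# cluster pod. Assumes "python" resolves to the environment's interpreter.
dask_version() {
    python -c 'import dask, distributed; print(dask.__version__, distributed.__version__)'
}

# Hypothetical helper: compare the two strings gathered above.
# $1 = versions from the workspace, $2 = versions from the Dask cluster
compare_dask_versions() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: workspace=$1 cluster=$2"
        return 1
    fi
}
```

Call `compare_dask_versions "<workspace versions>" "<cluster versions>"` with the two strings printed by `dask_version`. A non-zero exit status means the environments report different versions.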
DaskCluster Custom Resource Definition:
- The DaskCluster CRD (daskclusters.distributed-compute.dominodatalab.com) must be deployed
- Useful Command:
#check if daskcluster crd exists
kubectl get crd daskclusters.distributed-compute.dominodatalab.com
- An on-demand Dask cluster is spun up within the existing Kubernetes cluster that runs Domino.
- Useful Commands:
#check if dask cluster exists
kubectl -n <domino-compute> get dask
#check if dask pods are running
kubectl -n <domino-compute> get pod -l app.kubernetes.io/name=dask
#check recent events of dask cluster
kubectl -n <domino-compute> describe dask <cluster name>
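When the output is destined for a support ticket, it can help to save the results of the Dask cluster commands above to files. The following is a minimal sketch, not an official Domino script. `collect_dask_status` and the output file names are hypothetical, and the `-o wide` flag is added here only to include node and IP columns.

```shell
# Hypothetical helper: save Dask cluster status to files.
# $1 = compute namespace, $2 = Dask cluster name, $3 = output directory
collect_dask_status() {
    local ns="$1" cluster="$2" dir="${3:-dask-diagnostics}"
    mkdir -p "$dir" || return 1
    # List DaskCluster objects in the namespace
    kubectl -n "$ns" get dask -o wide > "$dir/dask-clusters.txt" 2>&1
    # List the Dask pods and their current state
    kubectl -n "$ns" get pod -l app.kubernetes.io/name=dask -o wide > "$dir/dask-pods.txt" 2>&1
    # Capture the description and recent events of one cluster
    kubectl -n "$ns" describe dask "$cluster" > "$dir/dask-describe.txt" 2>&1
    echo "wrote Dask cluster status to $dir"
}
```

For example, `collect_dask_status <domino-compute> <cluster name> ./dask-diagnostics` writes three text files that can be attached to a ticket. Errors from `kubectl` are captured in the same files rather than stopping the function.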
Distributed Compute Operator:
- The Distributed Compute Operator is responsible for managing the lifecycle of all Kubernetes resources of the Dask cluster on behalf of the Domino workload.
- Useful Commands:
#check if DCO is running:
kubectl -n <domino-compute> get pod -l app.kubernetes.io/instance=distributed-compute-operator
#check DCO pod for recent events:
kubectl -n <domino-compute> describe pod -l app.kubernetes.io/instance=distributed-compute-operator
#check DCO logging:
kubectl -n <domino-compute> logs --timestamps -l app.kubernetes.io/instance=distributed-compute-operator
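The three Distributed Compute Operator checks above can be gathered in one pass. One detail to be aware of: when `kubectl logs` is given a label selector, it returns only the last 10 lines per pod unless `--tail` is set, so the sketch below passes an explicit value. `collect_dco_logs`, its default of 5000 lines, and the file names are assumptions made for illustration.

```shell
# Hypothetical helper: save DCO pod status, events, and logs.
# $1 = compute namespace, $2 = output directory, $3 = number of log lines
collect_dco_logs() {
    local ns="$1" dir="${2:-dco-diagnostics}" lines="${3:-5000}"
    local selector="app.kubernetes.io/instance=distributed-compute-operator"
    mkdir -p "$dir" || return 1
    kubectl -n "$ns" get pod -l "$selector" -o wide > "$dir/dco-pods.txt" 2>&1
    kubectl -n "$ns" describe pod -l "$selector" > "$dir/dco-describe.txt" 2>&1
    # With a label selector, kubectl logs shows only the last 10 lines
    # per pod by default, so request a larger window explicitly.
    kubectl -n "$ns" logs --timestamps --tail="$lines" -l "$selector" > "$dir/dco-logs.txt" 2>&1
    echo "wrote DCO diagnostics to $dir"
}
```

For example, `collect_dco_logs <domino-compute> ./dco-diagnostics 10000` keeps the last 10000 log lines from each operator pod.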
Nucleus Dispatcher is responsible for managing the lifecycle of the Dask cluster through the creation and deletion of the DaskCluster custom resource object.
Nucleus Frontend is responsible for the Workspace/Job Launch UI and the Dask Web UI.
- Useful Commands:
#get nucleus pod names (dispatcher and frontend) and check if they are running
kubectl -n <domino-platform> get pods | grep nucleus
#check nucleus dispatcher logs
kubectl -n <domino-platform> logs <nucleus-dispatcher-pod-name> --timestamps --tail 5000
#check nucleus frontend logs
kubectl logs -n <domino-platform> <nucleus-frontend-pod-name> -c nucleus-frontend --timestamps --tail 5000
- If the workspace and Dask UI have started, some logs can be accessed through the Dask Web UI
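Because the Nucleus log commands need pod names that change between deployments, a small loop can look the names up and save one log file per pod. This is a sketch under stated assumptions: `collect_nucleus_logs` is a hypothetical helper, and it assumes the pod names contain the strings `nucleus-dispatcher` or `nucleus-frontend`, as the placeholders in the commands above suggest.

```shell
# Hypothetical helper: save logs for every pod whose name matches a pattern.
# $1 = platform namespace, $2 = name pattern, $3 = output directory,
# $4 = optional container name
collect_nucleus_logs() {
    local ns="$1" pattern="$2" dir="${3:-nucleus-diagnostics}" container="${4:-}"
    local pod file
    mkdir -p "$dir" || return 1
    # "-o name" prints entries such as pod/<pod-name>
    for pod in $(kubectl -n "$ns" get pods -o name | grep "$pattern"); do
        file="$dir/$(basename "$pod").log"
        if [ -n "$container" ]; then
            kubectl -n "$ns" logs "$pod" -c "$container" --timestamps --tail 5000 > "$file" 2>&1
        else
            kubectl -n "$ns" logs "$pod" --timestamps --tail 5000 > "$file" 2>&1
        fi
        echo "wrote $file"
    done
}
```

For example, `collect_nucleus_logs <domino-platform> nucleus-dispatcher ./nucleus-logs` collects the dispatcher logs, and `collect_nucleus_logs <domino-platform> nucleus-frontend ./nucleus-logs nucleus-frontend` collects the frontend logs from the `nucleus-frontend` container.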
This article is meant to give some insight into which components to check and a head start on gathering the logs that Domino support will request when troubleshooting issues with on-demand Dask in Domino. For further assistance, please open a support ticket and include any findings from the steps above.