Applies to versions: 4.x and 5.x deployments that still use Image Builder V2 instead of V3 (Hephaestus)
Problem:
A build can appear to be stuck in the queued status. This can occur if the Nucleus services lose their consumers on our message queue system (RabbitMQ).
Background:
When a Compute Environment build is first triggered, it is queued for scheduling while Kubernetes tries to find a node to house the forge container that will build the image. The queued status is depicted by the + symbol.
RabbitMQ contains a few queues that Nucleus-dispatcher consumes. The listing below shows each queue's name, its number of waiting messages, and its number of consumers:
containerimagebuilds.export-status-update 0 2
containerimagebuilds.status-update 0 3
If the consumers are lost because of an infrastructure problem or a defect, your build might actually complete without the UI ever receiving the message that tells it to display the checkmark "completed" symbol. A future version of the product will prevent this scenario.
Resolution:
1. Determine whether your build completed:
Search for your build pod and determine whether it is still active ("Running" or "Init") or has reached a finished state ("Completed" or "Error").
kubectl get po -A | grep build
(Click the "Build logs" link within your revision to find the build id, which is reflected in the pod's name. The build id is the last id found in the URL, like domino.tech/environments/622235b49d56014aa6fc22ff/revisions/622235d59d56014aa6fc2302/build/622235d59d56014aa6fc2304/logs )
2. If it is still "Init", describe the pod to look for meaningful messages, and consider contacting technical support if it never progresses to "Running". If it is "Running", wait until it progresses to "Completed" or "Error".
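As a rough sketch of step 1, the build id can be pulled out of the "Build logs" URL and used to narrow the pod search. The function names build_id_from_url and find_build_pod are hypothetical helpers, not part of the product, and the sketch assumes the URL contains a /build/<id> segment as in the example above.

```shell
# Hypothetical helper: print the path segment that follows "/build/" in a
# "Build logs" URL. Prints nothing if the URL has no "/build/" segment.
build_id_from_url() {
  printf '%s\n' "$1" | sed -n 's#.*/build/\([^/]*\).*#\1#p'
}

# Hypothetical helper: list pods in all namespaces whose name contains the
# build id taken from the URL passed as $1.
find_build_pod() {
  kubectl get po -A | grep "$(build_id_from_url "$1")"
}
```

Calling find_build_pod with the full "Build logs" URL should then show only the matching build pod, if one exists, along with its namespace and status.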
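The checks in step 2 might look like the following sketch. The helper names are hypothetical, and the namespace and pod name are assumed to come from the "kubectl get po -A" output in step 1.

```shell
# Hypothetical helper: show events and container states for a build pod.
# $1 = namespace, $2 = pod name.
describe_build_pod() {
  kubectl describe pod "$2" -n "$1"
}

# Hypothetical helper: print only the STATUS column (third column of the
# default "kubectl get pod" output) for a single pod.
pod_status() {
  kubectl get pod "$2" -n "$1" --no-headers | awk '{print $3}'
}
```

The Events section at the end of the describe output is usually where scheduling or image pull messages appear for a pod that stays in "Init".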
3. If it is "Completed" or "Error" and your UI still isn't reflecting the proper status, check your RabbitMQ queues:
Exec into a rabbitmq pod and run "rabbitmqctl list_queues name messages consumers | grep container" to determine whether any queue has zero consumers. The status-update queue is the one applicable to this KB, and its consumer count, shown in the far right column, should be at least '2'. Healthy output shows no messages waiting for consumption and the consumers present in the right-hand column.
4. If the right-hand column is 0, you are witnessing this problem and should restart the Dispatcher server.
Go to Admin -> Advanced -> Restart services. The restart can cause unusual "state" representations for workspaces for between thirty seconds and two minutes, but otherwise has no noticeable impact on end users.
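The consumer check in step 3 can be sketched as a small filter over the rabbitmqctl output. The function name queues_without_consumers is a hypothetical helper, and the namespace and pod name in the usage comment are placeholders for your own deployment's values.

```shell
# Hypothetical helper: read "rabbitmqctl list_queues name messages consumers"
# output on stdin and print the containerimagebuilds queues whose consumer
# count (third column) is 0.
queues_without_consumers() {
  awk '$1 ~ /^containerimagebuilds\./ && $3 == 0 {print $1}'
}

# Example usage (namespace and pod name are placeholders):
#   kubectl exec -n <namespace> <rabbitmq-pod> -- \
#     rabbitmqctl list_queues name messages consumers | queues_without_consumers
```

If the filter prints nothing, every containerimagebuilds queue has at least one consumer. If it prints the status-update queue, the consumers have been lost.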
Test a new build.