Versions: 5.2.0 and higher using image builder v3
Issue: When building an environment, the build logs are initially present but disappear after some time.
This issue also applies to Model API builds.
Root cause: This problem usually indicates an issue with the long-term storage for logs. In the Domino platform, build logs are initially stored in a service called "Redis". Because Redis is an in-memory buffer store, it allows much faster access to the logs. Eventually, however, the logs are cleared from Redis and only remain on the disk filesystem for long-term storage. The "vector" container is responsible for writing these log files to the filesystem.
Troubleshooting: To identify potential issues with "vector" we need to inspect two things: first the logs of the vector container, and then its configuration.
The "vector" container runs as a sidecar within the hephaestus-manager pod. To obtain logs from this container you can run:
kubectl logs -n <DOMINO_PLATFORM_NAMESPACE> hephaestus-manager-<ID> -c vector
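For example, assuming the platform namespace is domino-platform (adjust to your deployment; the pod name below is illustrative), you can locate the manager pod, confirm the vector sidecar is present, and then tail its logs:
kubectl get pods -n domino-platform | grep hephaestus-manager
kubectl get pod -n domino-platform hephaestus-manager-75476cc9f8-kjkqk -o jsonpath='{.spec.containers[*].name}'
kubectl logs -n domino-platform hephaestus-manager-75476cc9f8-kjkqk -c vector --tail=100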
The vector log is very detailed; if logs are being received by the container you will see each individual entry along with its various parameters. For example:
{"event":{"file":"/var/log/hephaestus/output.json","host":"hephaestus-manager-75476cc9f8-kjkqk",
"message":"{\"level\":\"info\",\"ts\":\"2023-02-09T09:30:28.701692482Z\",\"logger\":\
"controller.imagebuild.component.build-dispatcher.buildkit\",\"msg\":\"#10 DONE 5.9s\\n\",\
"imagebuild\":\"domino-compute/domino-build-63e4bd037a2347203c5844a5\",\
"addr\":\"tcp://hephaestus-buildkit-0.hephaestus-buildkit.domino-compute:1234\",\
"logKey\":\"5a44-build-63e4bd037a2347203c5844a5\"}","source_type":"file",
"timestamp":"2023-02-09T09:30:29.043601751Z"},"log":"#10 DONE 5.9s\n",
"logKey":"5a44-build-63e4bd037a2347203c5844a5","stream":"stdout","time":"2023-02-09T09:30:28.701692482Z",
"time_nano":"1675935028701692482"}
As we can see here, vector has received a message from hephaestus with a timestamp and the text "#10 DONE 5.9s". This is the message it will try to write to the filesystem.
Another important element of this log entry is the "logKey", as this value is used to construct the path of the file the log will be flushed into.
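As a quick check, and assuming jq is installed locally and the vector log lines are JSON as in the example above (non-JSON lines will produce parse warnings), you can list the logKey values vector is currently handling; namespace and pod name are illustrative:
kubectl logs -n domino-platform hephaestus-manager-75476cc9f8-kjkqk -c vector --tail=200 | jq -r '.logKey' | sort -u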
If there are issues with vector you can expect to see a message like this:
2023-02-09T09:30:29.045818Z ERROR sink{component_kind="sink" component_id=outfile
component_type=file component_name=outfile}: vector::internal_events::file:
Unable to open the file. error=Permission denied (os error 13)
error_code=failed_opening_file error_type="io_failed" stage="sending"
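To search for such errors directly, you can filter the vector sidecar logs; namespace and pod name are again illustrative:
kubectl logs -n domino-platform hephaestus-manager-75476cc9f8-kjkqk -c vector | grep -i error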
As we can see here, it is returning a "Permission denied" error. This log, however, does not show the exact path on which the permission problem occurs. To identify it you have to inspect the vector configuration. The configuration is part of the hephaestus-config secret, under a key called vector.yaml, and it is base64 encoded.
To get the configuration you can run:
kubectl get secret -n <DOMINO_PLATFORM_NAMESPACE> hephaestus-config -o yaml
Locate the line starting with vector.yaml and decode the value by piping it to "base64 -d".
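Alternatively, assuming the key is named vector.yaml as above, you can extract and decode the value in one step (note the escaped dot in the jsonpath expression):
kubectl get secret -n <DOMINO_PLATFORM_NAMESPACE> hephaestus-config -o jsonpath='{.data.vector\.yaml}' | base64 -d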
The result will be a YAML-format configuration for vector. Under "sinks:" you will find the "outfile" sink (the same component_id that appears in the error above), which takes "domino_logs" as its input. From there we want the "path:" value, which instructs vector where the logs are written.
Example:
...
sinks:
  outfile:
    compression: gzip
    encoding:
      codec: ndjson
    inputs:
      - domino_logs
    path: /dominoblobs/logjam/{{ logKey }}/log
    type: file
...
As we can see here, the {{ logKey }} placeholder is filled in with the logKey value from the log, so the path hitting the permission denied error will be constructed as /dominoblobs/logjam/(logKey_value_from_the_log)/log. Check that path and make sure it is writable and accessible.
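As a sketch of that check, and assuming the path from the sink configuration is mounted into the vector sidecar and the container image includes basic shell utilities (namespace, pod name and test file name are illustrative), you can inspect the directory and try a test write from inside the container:
kubectl exec -n domino-platform hephaestus-manager-75476cc9f8-kjkqk -c vector -- ls -ld /dominoblobs/logjam
kubectl exec -n domino-platform hephaestus-manager-75476cc9f8-kjkqk -c vector -- touch /dominoblobs/logjam/.write-test
kubectl exec -n domino-platform hephaestus-manager-75476cc9f8-kjkqk -c vector -- rm /dominoblobs/logjam/.write-test
If the touch fails with a permission error, compare the ownership and permissions of the underlying volume backing /dominoblobs with the user the vector container runs as.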