If a Model API uses multithreading, you can follow the steps below to manually enable thread support (the --enable-threads option) by updating the container spec of the deployment object. This is a temporary fix: if the model is stopped completely, the change must be reapplied. As long as the app stays alive, the change remains in effect even if the model app is scaled up or down.
How to update the spec?
Run kubectl edit deployment against the deployment for the running model version (e.g. kubectl edit deployment/myapp-deployment).
Add the following sed command under the spec/containers dict:
- /bin/sed -i 's/workers\ 1/workers\ 1 --enable-threads/' /domino/model-manager/harness;
In a fresh Model API deployment, the env: line begins with a dash '-'. Remove that dash, because the stanza now starts at '- args:'.
To run more than one worker, change the number after the second occurrence of 'workers' in the sed line (the replacement text) to the desired number of workers, 2 in this example:
- /bin/sed -i 's/workers\ 1/workers\ 2 --enable-threads/' /domino/model-manager/harness;
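Before editing the deployment, it can help to see what the sed expression does to a line of text. The sketch below is a dry run on a scratch file, not on the real harness. The sample line (exec server --workers 1 --port 8888) is invented for illustration, since the actual contents of /domino/model-manager/harness are not shown here and may differ. The -i flag is left out so that sed only prints the result.

```shell
#!/bin/sh
# Dry run of the sed rewrite on a scratch file, never on the real harness.
# ASSUMPTION: the sample line is hypothetical; the real harness file
# at /domino/model-manager/harness may look different.
scratch=$(mktemp)
printf '%s\n' 'exec server --workers 1 --port 8888' > "$scratch"

# Same expression as in the steps above, with the worker count set to 2.
# Without -i, sed writes the result to stdout and leaves the file unchanged.
rewritten=$(sed 's/workers\ 1/workers\ 2 --enable-threads/' "$scratch")
printf '%s\n' "$rewritten"

# Check that the flag made it into the rewritten text.
case "$rewritten" in
  *--enable-threads*) flag_status="enabled" ;;
  *) flag_status="missing" ;;
esac
printf 'threads: %s\n' "$flag_status"

# Clean up the scratch file.
rm -f "$scratch"
```

If the expression matches, the first line printed shows the new worker count followed by --enable-threads, and the second line reads threads: enabled. If the real harness file does not contain the text "workers 1", the sed command in the deployment will leave it unchanged, so this kind of check is only as good as the sample line it is given.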
Note: Enabling more workers lets the Model handle more concurrent HTTP/HTTPS queries. However, the added resource usage might not suit the instance type, and you may need to upgrade to an instance with more CPUs and/or more RAM.