Model deployment configuration
Domino model APIs are scalable, highly available REST services. The Deployment tab of the model settings page lets you configure three aspects of your model:
- The compute resources available to your model hosts
- The number of model hosts serving your model
- The number of routes (or versions) you want to expose
Scaling your model
There are two dimensions on which to scale your model.
- Horizontal scale: You can select the number of model hosts that you want running at any given time. Domino automatically load-balances requests to the model endpoint across these hosts. Running at least 2 instances gives you a high-availability model, and 2 is the default selection. Domino supports up to 32 instances per model.
- Vertical scale: You can choose a hardware tier, which determines the amount of RAM and CPU resources available to each model host.
When you change either of these selections, your model will be restarted with the new settings.
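To make the horizontal-scaling rules concrete, here is a minimal local sketch, not a Domino API. `DeploymentSettings`, `round_robin`, and the `"default"` hardware tier name are hypothetical. The 32-instance ceiling and the 2-instance high-availability default come from the text above; the lower bound of 1 is an assumption (a model needs at least one host to serve anything). Round-robin is a stand-in to show what spreading requests across hosts means; the docs do not say which balancing strategy Domino actually uses.

```python
from dataclasses import dataclass
from itertools import cycle

MAX_INSTANCES = 32      # per-model limit stated in the docs
HA_MIN_INSTANCES = 2    # smallest count that gives high availability (the default)


@dataclass
class DeploymentSettings:
    """Hypothetical local model of a model's deployment settings (not a Domino API)."""
    instance_count: int = HA_MIN_INSTANCES
    hardware_tier: str = "default"  # hypothetical name; real tiers are defined in your Domino deployment

    def __post_init__(self) -> None:
        # Assumption: at least one host is required to serve requests.
        if not 1 <= self.instance_count <= MAX_INSTANCES:
            raise ValueError(
                f"instance_count must be between 1 and {MAX_INSTANCES}, "
                f"got {self.instance_count}"
            )

    @property
    def highly_available(self) -> bool:
        # With a single host, any restart or failure makes the endpoint unavailable.
        return self.instance_count >= HA_MIN_INSTANCES


def round_robin(hosts, n_requests):
    """Assign requests to hosts in turn (a stand-in for Domino's own load balancer)."""
    pool = cycle(hosts)
    return [next(pool) for _ in range(n_requests)]


settings = DeploymentSettings()
hosts = [f"host-{i}" for i in range(settings.instance_count)]
print(settings.highly_available)   # True
print(round_robin(hosts, 4))       # ['host-0', 'host-1', 'host-0', 'host-1']
```

Raising the instance count adds hosts to the pool, which is horizontal scaling; changing `hardware_tier` would change the resources of each host without changing how many there are, which is vertical scaling.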
Routing your model
Domino supports two routing modes.
In the first mode, only one route is exposed, and it always points to the latest successfully deployed model version. When you deploy a new version, the old version is shut down and replaced with the new one while availability is maintained. The route has the following signature:
In the second mode, you can have two running versions: a promoted version and a latest version. This supports a workflow where your clients always point to the promoted version while you test with the latest one. When the latest version is ready for production, you can switch it to be the promoted version with no downtime. The routes have the following signatures:
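The difference between the two routing modes described above can be modeled as a small state machine. The sketch below is a hypothetical toy, not Domino's implementation or API: `ModelRoutes`, `deploy`, `promote`, and the `two_routes` flag are invented for illustration. The route names `promoted` and `latest` come from the text. It shows which version each route points to as versions are deployed and promoted.

```python
from typing import Dict, Optional


class ModelRoutes:
    """Toy model of the two routing modes (hypothetical, not Domino's implementation).

    two_routes=False: a single route that always serves the latest successful version.
    two_routes=True:  a 'promoted' route for clients plus a 'latest' route for testing.
    """

    def __init__(self, two_routes: bool = False) -> None:
        self.two_routes = two_routes
        self.latest: Optional[int] = None
        self.promoted: Optional[int] = None

    def deploy(self, version: int, succeeded: bool = True) -> None:
        # Only a successful deployment moves the 'latest' route; a failed one leaves it alone.
        if succeeded:
            self.latest = version

    def promote(self) -> None:
        if not self.two_routes:
            raise RuntimeError("promotion only applies when two routes are exposed")
        if self.latest is None:
            raise RuntimeError("no version has been deployed yet")
        # In this toy, promotion just repoints the 'promoted' route at a version
        # that is already running.
        self.promoted = self.latest

    def routes(self) -> Dict[str, Optional[int]]:
        if self.two_routes:
            return {"promoted": self.promoted, "latest": self.latest}
        return {"latest": self.latest}


single = ModelRoutes()
single.deploy(1)
single.deploy(2)
single.deploy(3, succeeded=False)   # failed rollout: the route keeps pointing at v2
print(single.routes())              # {'latest': 2}

dual = ModelRoutes(two_routes=True)
dual.deploy(1)
dual.promote()                      # clients now use v1
dual.deploy(2)                      # test v2 on the latest route
print(dual.routes())                # {'promoted': 1, 'latest': 2}
dual.promote()                      # v2 is ready for production
print(dual.routes())                # {'promoted': 2, 'latest': 2}
```

The design point the toy captures is that clients in the second mode never follow `latest` directly, so a new deployment cannot reach them until someone explicitly promotes it.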