Manage Routes of the C3 AI Model Inference Service to Change or Upgrade LLMs

A C3 AI cluster with applications that use large language models (LLMs), vision language models (VLMs), embedding models, or other large models may require a Model Inference Service to host and serve those models.

The C3 AI Model Inference Service (MIS) is a C3 Agentic AI Platform Microservice for low latency serving of machine learning (ML) models, including LLMs. With C3 AI MIS, you can host any MlAtomicPipe from the C3 AI Model Registry for a "warm" deployment and manage routing of all inference requests.

This topic describes how to change or upgrade LLMs by deploying a new VllmPipe, changing routes, and terminating outdated deployments.

Overview of managing routes of the C3 AI MIS

It may become necessary to change or upgrade an LLM you are using. To do so, deploy a new VllmPipe (if one is not already deployed) and then change the route to use the new deployment instead of the old one. This section demonstrates an example.

NOTE: Routes can be managed only from the C3 AI Model Inference Service application.

Change a route and upgrade/change LLMs

Let's assume that we are replacing the Falcon-40B model we deployed above with a newer Falcon-40B model. Let's also assume that we have registered a VllmPipe for that model under the same URI, "falcon40b", and created a new App.NodePool named "4xa100falcon80gNew" for the deployment. We can run the following code to retrieve the second version of the registered pipe for that URI from the C3 AI Model Registry Service:

Python
# Python
vers = c3.ModelRegistry.listVersions(None, filter="contains(uri, 'falcon40b/2')").objs
entry = vers[0]

or

JavaScript
// JavaScript
vers = ModelRegistry.listVersions(null, {filter: "contains(uri, 'falcon40b/2')"}).objs
entry = vers[0]

We deploy the new entry to the App.NodePool we created.

Python
# Python
c3.ModelInference.deploy(entry, node_pools=["4xa100falcon80gNew"])

or

JavaScript
// JavaScript
ModelInference.deploy(entry, {nodePools: ["4xa100falcon80gNew"]})

Now that we have retrieved and deployed the latest entry, we can simply point the route we were using before ("qa-model-f40b") at this new entry instead of the old one, thereby upgrading the route to use our newer Falcon-40B LLM.

Python
# Python
c3.ModelInference.setRoute(pipeEntry=entry, route="qa-model-f40b", overwrite=True)

or

JavaScript
// JavaScript
ModelInference.setRoute(entry, "qa-model-f40b", true)

Note that the C3 AI MIS does not check whether the new deployment is ready to serve requests. We recommend checking the deployment state before modifying the route.
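One way to perform that check is a small polling helper that waits for the deployment to report readiness before the route is switched. The sketch below shows the generic pattern only; the status call wired into it (for example, something like `c3.ModelInference.deploymentStatus(entry)`) is a hypothetical placeholder, so consult your MIS API reference for the actual call and status values.

```python
import time

def wait_until_ready(get_status, timeout_s=600, poll_s=10):
    """Poll get_status() until it returns "ready", or raise on failure/timeout.

    get_status is any zero-argument callable returning a status string.
    In a C3 console, this might wrap a (hypothetical) MIS status call:
        wait_until_ready(lambda: c3.ModelInference.deploymentStatus(entry))
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status == "ready":
            return  # safe to change the route now
        if status == "failed":
            raise RuntimeError("deployment failed; do not switch the route")
        time.sleep(poll_s)
    raise TimeoutError(f"deployment not ready after {timeout_s}s")
```

Once the helper returns without raising, it is safe to call setRoute as shown above.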

Terminate a deployment

If we no longer need the old Falcon-40B deployment, we can terminate it:

Python
# Python
# Terminate the engine
c3.ModelInference.terminate(entry, confirm=True)

# Terminate the App.NodePool
c3.app().nodePool('4xa100falcon80g').terminate()
c3.App.NodePool.Config.forName('4xa100falcon80g').clearConfigAndSecretAllOverrides()

or

JavaScript
// JavaScript
// Terminate the engine
ModelInference.terminate(entry, {confirm: true})

// Terminate the App.NodePool
C3.app().nodePool('4xa100falcon80g').terminate()
App.NodePool.Config.forName('4xa100falcon80g').clearConfigAndSecretAllOverrides()
