Troubleshoot and Debug the C3 AI Model Inference Service and VllmPipe
A C3 AI cluster with applications that use large language models (LLMs), vision language models (VLMs), embedding models, or other large models may require a Model Inference Service to host and serve those models.
The C3 AI Model Inference Service (MIS) is a C3 Agentic AI Platform Microservice for low latency serving of machine learning (ML) models, including LLMs. With C3 AI MIS, you can host any MlAtomicPipe from the C3 AI Model Registry for a "warm" deployment and manage routing of all inference requests.
This topic addresses how to troubleshoot issues and debug errors when using the C3 AI MIS, including preliminary sanity checks to verify before undertaking more extensive debugging.
Overview of sanity check steps
As an initial troubleshooting step, verify that the C3 AI MIS is set up correctly by reviewing the sanity check items below, and confirm that the model has warmed up enough to be fully functional. The following sections detail how to verify that each stage of the setup process was completed correctly:
Setup and configuration of the C3 AI MIS - These sanity check steps include verifying the connection to the client application, the routes for the C3 AI MIS, and the configuration of the C3 AI Model Registry Service.
Model serving setup and pipe warmup - These sanity check steps include verifying the model is warmed up and ready to serve the LLMs.
Model files downloaded correctly - These sanity check steps include verifying the model files are downloaded correctly and completely during the chunk and upload process.
If the sanity check items are verified, see the "Troubleshoot error messages" section below.
Verify correct setup and configuration of the C3 AI MIS
As initial sanity checks, verify that the C3 AI MIS is set up and configured correctly.
Verify connection to client application and routes for C3 AI MIS
To verify that the client application is correctly configured to use the C3 AI MIS instance, confirm the following:
- Verify that ModelInference.config() from the client application points to the expected C3 AI MIS instance. If this does not work as expected, see the "Connect your client application(s) to C3 AI MIS" section in the Create and Configure the C3 AI MIS topic.
- Verify that ModelInference.listRoutes() lists the expected routes and does not output an error message. If unexpected routes are listed or an error message results, see Manage Routes of the C3 AI MIS to Change or Upgrade LLMs.
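As an illustration of the route check above, the comparison logic can be written as a small helper. This is a hedged sketch: the assumption that ModelInference.listRoutes() output can be reduced to an array of route-name strings is illustrative, not a documented contract.

```javascript
// Hypothetical sketch: compare the routes you expect against the names
// extracted from ModelInference.listRoutes() output. Plain JavaScript;
// the array-of-names shape is an assumption for illustration only.
function missingRoutes(expectedNames, listedNames) {
  const listed = new Set(listedNames);
  // Any expected route that is absent points to a setup problem.
  return expectedNames.filter((name) => !listed.has(name));
}

// Example with mock data standing in for listRoutes() output:
const expectedRoutes = ['llm-route', 'embedding-route'];
const listedRoutes = ['llm-route'];
console.log(missingRoutes(expectedRoutes, listedRoutes)); // -> [ 'embedding-route' ]
```

An empty result suggests the routes are configured as expected; any names in the result warrant a look at the route management topic above.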
Verify C3 AI Model Registry Service configuration
To verify that the C3 AI Model Registry Service is configured correctly for use with the C3 AI MIS, confirm the following:
- Verify that ModelRegistry.config() points to the correct C3 AI Model Registry Service application. If this does not work as expected, see the Create and Configure the C3 AI MIS topic.
- Verify that ModelRegistry.list() lists the expected results and does not output an error message. If this does not list the expected results, see the Create and Deploy a VllmPipe topic.

See also the C3 AI Model Registry - Tutorial.
Verify model is warmed up and ready to serve LLM
The model must warm up before you can use the C3 AI MIS for LLM text generation or other actions. After deploying a model in the C3 AI MIS, you can monitor the status by checking that an action for warmupModel is running. This action downloads the model files to the C3 node and loads the model into GPU memory.
To check the status, run the following code snippet.
c3Grid(Action.dump())

If the status indicates the warmupModel action is not complete, expand the output for more information.
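To illustrate the status check, the filtering step can be sketched in plain JavaScript. The row fields (action, status) and the status values below are assumptions about the dump's shape, used only to show the idea; inspect your actual Action.dump() output for the real field names.

```javascript
// Hypothetical sketch: scan rows from an action dump for warmupModel
// actions that have not completed. Field and status names are assumed
// for illustration, not a documented schema.
function incompleteWarmups(actionRows) {
  return actionRows.filter(
    (row) => row.action === 'warmupModel' && row.status !== 'completed'
  );
}

// Example with mock rows standing in for c3Grid(Action.dump()) output:
const mockRows = [
  { action: 'warmupModel', status: 'running' },
  { action: 'warmupModel', status: 'completed' },
  { action: 'otherAction', status: 'failed' },
];
console.log(incompleteWarmups(mockRows).length); // -> 1
```

A non-empty result means warmup is still in progress (or stalled), so inference calls should be expected to wait or fail.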
If errors occur during warmup, see the "Debug pipe warmup errors" section below.
Verify the model files are downloaded
During the chunk and upload process, two file transfer actions occur:
- Copying the files to the leader node (c3fs)
- Uploading the chunked files to cloud storage
To monitor the progress of these actions, use FileSystem.listFiles. See the example code snippets below.
// Files in C3FS
c3Grid(C3FileSystem.listFiles('c3--datasets/genai/models/'))
// Chunked Files in GCS
c3Grid(FileSystem.listFiles('gcs://c3--datasets/genai/models/code_narwhal_20231207_chunked'))

Overview of troubleshooting error messages
This section provides additional debugging paths for error messages that are observed during the pipe warmup stage of the C3 AI MIS setup and configuration processes, as well as errors received during the inference request stage.
Debug pipe warmup errors
The following section provides details for errors received during the pipe warmup stage, and potential troubleshooting steps to resolve them.
Error message - The number of required GPUs exceeds the total number of available GPUs in the cluster
This error is caused by the VllmPipe requesting more GPUs than are available on the HardwareProfile for the App.NodePool on which it is deployed.
To resolve, set tensorParallelSize to the number of GPUs available on the App.NodePool.
See also Monitor and Scale the C3 AI Model Inference Service.
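The fix above can be sketched as a pipe configuration. Only tensorParallelSize comes from this topic; the other field names and values in this object are illustrative placeholders, not the documented VllmPipe schema.

```javascript
// Hypothetical sketch: match tensorParallelSize to the GPUs actually
// available on the node pool. The GPU count and modelUrl here are
// placeholder values for illustration.
const availableGpus = 4; // GPUs on the App.NodePool's hardware profile

const pipeSpec = {
  modelUrl: 'gcs://c3--datasets/genai/models/example_model_chunked', // placeholder path
  tensorParallelSize: availableGpus, // must not exceed available GPUs
};

// Requesting more GPUs than the pool provides triggers the error above.
console.log(pipeSpec.tensorParallelSize <= availableGpus); // -> true
```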
Error message - <Path> does not appear to have a file named config.json
This error message indicates an issue with the model files and is generally the result of one of the following:
- Incorrect path given
- Nested directories
- Model files not chunked
- Files not downloaded properly to the GPU node
See the following sections for more information on how to identify and resolve these issues.
Incorrect path given
Verify that the model files exist at the path specified in the error message, and that the proper prefix is used (for example, gcs:// rather than gs://).
To resolve, create a new pipe with the proper modelUrl.
See also Create and Deploy a VllmPipe.
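The prefix mix-up above is easy to catch before creating a pipe. This helper is plain string checking, not a C3 API; the gcs:// versus gs:// distinction comes from this topic, and which other prefixes are valid depends on your cloud provider.

```javascript
// Hypothetical helper: flag the common gs:// vs gcs:// prefix mix-up in a
// modelUrl before creating a pipe with it.
function checkModelUrlPrefix(modelUrl) {
  if (modelUrl.startsWith('gs://')) {
    return 'Use the gcs:// prefix rather than gs://';
  }
  if (!modelUrl.startsWith('gcs://')) {
    return 'Expected a gcs:// URL';
  }
  return 'ok';
}

console.log(checkModelUrlPrefix('gs://c3--datasets/genai/models/m'));  // warns about gs://
console.log(checkModelUrlPrefix('gcs://c3--datasets/genai/models/m')); // -> 'ok'
```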
Nested directories
vLLM expects the path to be the lowest-level directory that contains the model files. If the path contains additional layers of nesting, loading fails with this error.
To resolve, set modelUrl to the lowest-level directory that contains the config.json and other model files (for example, gcs://c3--datasets/path_to_narwal/nested/directory).
See also Create and Deploy a VllmPipe.
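To illustrate the nesting issue, the correct modelUrl can be derived from a file listing by locating config.json. This is plain string handling, not a C3 API, and the example paths are placeholders.

```javascript
// Hypothetical helper: given a flat listing of object paths, find the
// directory that directly contains config.json, which is the directory
// modelUrl should point at.
function directoryOfConfig(paths) {
  const hit = paths.find((p) => p.endsWith('/config.json'));
  return hit ? hit.slice(0, hit.lastIndexOf('/')) : null;
}

// Example: the model files sit one level deeper than the pipe's modelUrl,
// so pointing modelUrl at ".../outer" would fail.
const listing = [
  'gcs://c3--datasets/models/outer/nested/config.json',
  'gcs://c3--datasets/models/outer/nested/model.bin',
];
console.log(directoryOfConfig(listing));
// -> 'gcs://c3--datasets/models/outer/nested'
```

A null result means config.json is absent from the listing entirely, which points back to the other causes listed above.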
Files not downloaded properly to the GPU node
This occurs if an interruption to the download causes some, but not all, of the model files to be copied to the node. Run the following in the pod to confirm whether all the expected files are present:

kubectl exec -it <pod-name> -- bash

Or call the following command on the correct C3 node:

Os.commandWithArgs('ls', ['<path>'])

You can use the Server.callJson method to call this command on a C3 node. For example:

var server = Server.forId('<id>')
server.callJson('Os', 'commandWithArgs', null, ['ls', ['<path>']])

To resolve, remove the local directory and all previously downloaded files specified in the error message using kubectl exec or Os.commandWithArgs. Then, run the warmupModel action again.
See also Create and Deploy a VllmPipe.
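Checking the listing by eye is error-prone, so the comparison can be sketched as a helper over the raw ls output. Only config.json is named as required in this topic; any additional file names you pass in are your own expectations for the model in question.

```javascript
// Hypothetical helper: given the newline-separated output of `ls` on the
// node (via kubectl exec or Os.commandWithArgs), report which required
// model files are missing. Plain string handling, not a C3 API.
function missingModelFiles(lsOutput, requiredFiles) {
  const present = new Set(
    lsOutput.split('\n').map((s) => s.trim()).filter(Boolean)
  );
  return requiredFiles.filter((name) => !present.has(name));
}

// Example with mock ls output from an interrupted download:
const mockLsOutput = 'config.json\ntokenizer.json\n';
console.log(missingModelFiles(mockLsOutput, ['config.json', 'model.bin']));
// -> [ 'model.bin' ]
```

A non-empty result indicates an incomplete download, which is the cue to remove the directory and rerun warmupModel as described above.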
Error message - CUDAError: Out of Memory
This error might occur during pipe warmup if you try to deploy multiple GPU-based models to the same nodepool.
To resolve, do the following:
- Terminate one of the deployments using ModelInference.terminate(). See Manage Routes of the C3 AI MIS to Change or Upgrade LLMs for details.
- Create a new App.NodePool with the required resources (such as GPU and memory). See Create and Deploy a VllmPipe for more information.
- Deploy the model to the new App.NodePool. See Monitor and Scale the C3 AI MIS for more information.
See also Configure and Manage Nodepools.
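The capacity reasoning behind this error can be sketched with simple arithmetic. The GPU counts below are illustrative values, not cluster data, and the helper is plain JavaScript rather than a C3 API.

```javascript
// Hypothetical sketch: a new deployment fits only if the GPUs already in
// use by deployed models plus the GPUs it requests stay within the pool's
// capacity. Exceeding capacity is what produces the out-of-memory error.
function fitsOnNodePool(deployedGpuCounts, requestedGpus, poolGpuCapacity) {
  const inUse = deployedGpuCounts.reduce((sum, n) => sum + n, 0);
  return inUse + requestedGpus <= poolGpuCapacity;
}

// One 4-GPU model already deployed on a 4-GPU pool: a second model does not
// fit, so terminate a deployment or create a new App.NodePool as described
// in the steps above.
console.log(fitsOnNodePool([4], 2, 4)); // -> false
console.log(fitsOnNodePool([], 2, 4));  // -> true
```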
Debug inference request errors
The following section provides details for errors received during the inference request stage, and potential troubleshooting steps to resolve them.
Call to ModelInference.completion() API hangs
If the ModelInference.completion() API hangs when you call it, verify that the model is warmed up by checking whether the warmupModel action is in progress or complete.
See the "Verify model is warmed up and ready to serve LLM" section above.
See also Use C3 AI MIS for LLM Text Generation for additional details about inputs and configurations for the ModelInference.completion() API.
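One defensive pattern for a client is to bound the wait rather than block indefinitely while the model is still warming up. This is a generic JavaScript sketch; the slowCall below is a stand-in for the actual ModelInference.completion() request, not the real API.

```javascript
// Hypothetical pattern: race a request against a timeout so a call made
// before warmup has finished fails fast with a diagnostic message instead
// of hanging indefinitely.
function withTimeout(promise, ms) {
  const timeout = new Promise((_, reject) =>
    setTimeout(
      () => reject(new Error('timed out; is warmupModel complete?')),
      ms
    )
  );
  return Promise.race([promise, timeout]);
}

// Example: a stand-in request that never resolves, like a completion call
// made before the model has warmed up.
const slowCall = new Promise(() => {});
withTimeout(slowCall, 100).catch((err) => console.log(err.message));
// -> 'timed out; is warmupModel complete?'
```

When the timeout fires, check the warmupModel action status described above before retrying.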
See also
- Microservice
- Overview of C3 AI Model Inference Service Administration
- Create and Configure the C3 AI Model Inference Service
- Create and Deploy a VllmPipe
- Manage Routes of the C3 AI Model Inference Service to Change or Upgrade LLMs
- Monitor and Scale the C3 AI Model Inference Service
- Use C3 AI Model Inference Service for LLM Text Generation and Inference Requests