Troubleshoot and Debug the C3 AI Model Inference Service and VllmPipe
A C3 AI cluster with applications that use large language models (LLMs), vision language models (VLMs), embedding models, or other large models may require a Model Inference Service to host and serve those models.
The C3 AI Model Inference Service (MIS) is a C3 Agentic AI Platform Microservice for low latency serving of machine learning (ML) models, including LLMs. With C3 AI MIS, you can host any MlAtomicPipe from the C3 AI Model Registry for a "warm" deployment and manage routing of all inference requests.
This topic addresses how to troubleshoot issues and debug errors when using the C3 AI MIS, including preliminary sanity checks to verify before undertaking more extensive debugging.
Overview of sanity check steps
As an initial troubleshooting step, verify that the C3 AI MIS is set up correctly by reviewing the sanity check items below, and confirm that the model has warmed up enough to be fully functional. The following sections detail how to verify that each stage of the setup process was completed correctly:
Setup and configuration of the C3 AI MIS - These sanity check steps include verifying the connection to the client application, the routes for the C3 AI MIS, and the configuration of the C3 AI Model Registry Service.
Model serving setup and pipe warmup - These sanity check steps include verifying the model is warmed up and ready to serve the LLMs.
Model files downloaded correctly - These sanity check steps include verifying the model files are downloaded correctly and completely during the chunk and upload process.
If the sanity check items are verified, see the "Troubleshoot error messages" section below.
Verify correct setup and configuration of the C3 AI MIS
As initial sanity checks, verify that the C3 AI MIS is set up and configured correctly.
Verify connection to client application and routes for C3 AI MIS
To verify that the client application is correctly configured to use the C3 AI MIS instance, confirm the following:
- Verify that ModelInference.config() from the client application points to the expected C3 AI MIS instance. If this does not work as expected, see the "Connect your client application(s) to C3 AI MIS" section in the Create and Configure the C3 AI MIS topic.
- Verify that ModelInference.listRoutes() lists the expected routes and does not output an error message. If unexpected routes are listed or an error message results, see Manage Routes of the C3 AI MIS to Change or Upgrade LLMs.
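As an illustration of the route check above, the comparison logic can be written as a small helper. This is a hedged sketch: the assumption that ModelInference.listRoutes() output can be reduced to an array of route-name strings is illustrative, not a documented contract.

```javascript
// Hypothetical sketch: compare the routes you expect against the names
// extracted from ModelInference.listRoutes() output. Plain JavaScript;
// the array-of-names shape is an assumption for illustration only.
function missingRoutes(expectedNames, listedNames) {
  const listed = new Set(listedNames);
  // Any expected route that is absent points to a setup problem.
  return expectedNames.filter((name) => !listed.has(name));
}

// Example with mock data standing in for listRoutes() output:
const expectedRoutes = ['llm-route', 'embedding-route'];
const listedRoutes = ['llm-route'];
console.log(missingRoutes(expectedRoutes, listedRoutes)); // -> [ 'embedding-route' ]
```

An empty result suggests the routes are configured as expected; any names in the result warrant a look at the route management topic above.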
Verify C3 AI Model Registry Service configuration
To verify that the C3 AI Model Registry Service is configured correctly for use with the C3 AI MIS, confirm the following:
- Verify that ModelRegistry.config() points to the correct C3 AI Model Registry Service application. If this does not work as expected, see the Create and Configure the C3 AI MIS topic.
- Verify that ModelRegistry.list() lists the expected results and does not output an error message. If this does not list the expected results, see the Create and Deploy a VllmPipe topic.

See also the C3 AI Model Registry - Tutorial.
Verify model is warmed up and ready to serve LLM
The model must warm up before you can use the C3 AI MIS for LLM text generation or other actions. After deploying a model in the C3 AI MIS, you can monitor the status by checking that an action for warmupModel is running. This action downloads the model files to the C3 node and loads the model into GPU memory.
To check the status, run the following code snippet.
c3Grid(Action.dump())

If the status indicates the warmupModel action is not complete, expand the output for more information.
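To illustrate the status check, the filtering step can be sketched in plain JavaScript. The row fields (action, status) and the status values below are assumptions about the dump's shape, used only to show the idea; inspect your actual Action.dump() output for the real field names.

```javascript
// Hypothetical sketch: scan rows from an action dump for warmupModel
// actions that have not completed. Field and status names are assumed
// for illustration, not a documented schema.
function incompleteWarmups(actionRows) {
  return actionRows.filter(
    (row) => row.action === 'warmupModel' && row.status !== 'completed'
  );
}

// Example with mock rows standing in for c3Grid(Action.dump()) output:
const mockRows = [
  { action: 'warmupModel', status: 'running' },
  { action: 'warmupModel', status: 'completed' },
  { action: 'otherAction', status: 'failed' },
];
console.log(incompleteWarmups(mockRows).length); // -> 1
```

A non-empty result means warmup is still in progress (or stalled), so inference calls should be expected to wait or fail.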
If errors occur during warmup, see the "Debug pipe warmup errors" section below.
Verify the model files are downloaded
During the chunk and upload process, two file transfer actions occur:
- Copying the files to the leader node (c3fs)
- Uploading the chunked files to cloud storage
To monitor the progress of these actions, use FileSystem.listFiles. See the example code snippets below.
// Files in C3FS
c3Grid(C3FileSystem.listFiles('c3--datasets/genai/models/'))
// Chunked Files in GCS
c3Grid(FileSystem.listFiles('gcs://c3--datasets/genai/models/code_narwhal_20231207_chunked'))

Overview of troubleshooting error messages
This section provides additional debugging paths for error messages that are observed during the pipe warmup stage of the C3 AI MIS setup and configuration processes, as well as errors received during the inference request stage.
Debug pipe warmup errors
The following section provides details for errors received during the pipe warmup stage, and potential troubleshooting steps to resolve them.
Error message - The number of required GPUs exceeds the total number of available GPUs in the cluster
This error is caused by the VllmPipe requesting more GPUs than are available on the HardwareProfile for the App.NodePool on which it is deployed.
To resolve, set tensorParallelSize to the number of GPUs available on the App.NodePool.
See also Monitor and Scale the C3 AI Model Inference Service.
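The fix above can be sketched as a pipe configuration. Only tensorParallelSize comes from this topic; the other field names and values in this object are illustrative placeholders, not the documented VllmPipe schema.

```javascript
// Hypothetical sketch: match tensorParallelSize to the GPUs actually
// available on the node pool. The GPU count and modelUrl here are
// placeholder values for illustration.
const availableGpus = 4; // GPUs on the App.NodePool's hardware profile

const pipeSpec = {
  modelUrl: 'gcs://c3--datasets/genai/models/example_model_chunked', // placeholder path
  tensorParallelSize: availableGpus, // must not exceed available GPUs
};

// Requesting more GPUs than the pool provides triggers the error above.
console.log(pipeSpec.tensorParallelSize <= availableGpus); // -> true
```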
Error message - <Path> does not appear to have a file named config.json
This error message indicates an issue with the model files and is generally the result of one of the following:
- Incorrect path given
- Nested directories
- Model files not chunked
- Files not downloaded properly to the GPU node
See the following sections for more information on how to identify and resolve these issues.
Incorrect path given
Verify that the model files exist at the path specified in the error message, and that the proper prefix is used (for example, gcs:// rather than gs://).
To resolve, create a new pipe with the proper modelUrl.
See also Create and Deploy a VllmPipe.
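The prefix mix-up above is easy to catch before creating a pipe. This helper is plain string checking, not a C3 API; the gcs:// versus gs:// distinction comes from this topic, and which other prefixes are valid depends on your cloud provider.

```javascript
// Hypothetical helper: flag the common gs:// vs gcs:// prefix mix-up in a
// modelUrl before creating a pipe with it.
function checkModelUrlPrefix(modelUrl) {
  if (modelUrl.startsWith('gs://')) {
    return 'Use the gcs:// prefix rather than gs://';
  }
  if (!modelUrl.startsWith('gcs://')) {
    return 'Expected a gcs:// URL';
  }
  return 'ok';
}

console.log(checkModelUrlPrefix('gs://c3--datasets/genai/models/m'));  // warns about gs://
console.log(checkModelUrlPrefix('gcs://c3--datasets/genai/models/m')); // -> 'ok'
```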
Nested directories
vLLM expects the path to be the lowest-level directory that contains the model files. If the path contains additional layers of nesting, loading fails with this error.
To resolve, set modelUrl to the lowest-level directory that contains the config.json and other model files (for example, gcs://c3--datasets/path_to_narwal/nested/directory).
See also Create and Deploy a VllmPipe.
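To illustrate the nesting issue, the correct modelUrl can be derived from a file listing by locating config.json. This is plain string handling, not a C3 API, and the example paths are placeholders.

```javascript
// Hypothetical helper: given a flat listing of object paths, find the
// directory that directly contains config.json, which is the directory
// modelUrl should point at.
function directoryOfConfig(paths) {
  const hit = paths.find((p) => p.endsWith('/config.json'));
  return hit ? hit.slice(0, hit.lastIndexOf('/')) : null;
}

// Example: the model files sit one level deeper than the pipe's modelUrl,
// so pointing modelUrl at ".../outer" would fail.
const listing = [
  'gcs://c3--datasets/models/outer/nested/config.json',
  'gcs://c3--datasets/models/outer/nested/model.bin',
];
console.log(directoryOfConfig(listing));
// -> 'gcs://c3--datasets/models/outer/nested'
```

A null result means config.json is absent from the listing entirely, which points back to the other causes listed above.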
Files not downloaded properly to the GPU node
This occurs if an interruption to the download causes some, but not all, of the model files to be copied to the node. Run the following in the pod to confirm whether all the expected files are present:

kubectl exec -it <pod-name> -- bash

Or call the following command on the correct C3 node:

Os.commandWithArgs('ls', ['<path>'])

You can use the Server.callJson method to call this command on a C3 node. For example:

var server = Server.forId('<id>')
server.callJson('Os', 'commandWithArgs', null, ['ls', ['<path>']])

To resolve, remove the local directory and all previously downloaded files specified in the error message using kubectl exec or Os.commandWithArgs. Then, run the warmupModel action again.
See also Create and Deploy a VllmPipe.
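Checking the listing by eye is error-prone, so the comparison can be sketched as a helper over the raw ls output. Only config.json is named as required in this topic; any additional file names you pass in are your own expectations for the model in question.

```javascript
// Hypothetical helper: given the newline-separated output of `ls` on the
// node (via kubectl exec or Os.commandWithArgs), report which required
// model files are missing. Plain string handling, not a C3 API.
function missingModelFiles(lsOutput, requiredFiles) {
  const present = new Set(
    lsOutput.split('\n').map((s) => s.trim()).filter(Boolean)
  );
  return requiredFiles.filter((name) => !present.has(name));
}

// Example with mock ls output from an interrupted download:
const mockLsOutput = 'config.json\ntokenizer.json\n';
console.log(missingModelFiles(mockLsOutput, ['config.json', 'model.bin']));
// -> [ 'model.bin' ]
```

A non-empty result indicates an incomplete download, which is the cue to remove the directory and rerun warmupModel as described above.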
Error message - CUDAError: Out of Memory
This error might occur during pipe warmup if you try to deploy multiple GPU-based models to the same nodepool.
To resolve, do the following:
- Terminate one of the deployments using ModelInference.terminate(). See Manage Routes of the C3 AI MIS to Change or Upgrade LLMs for details.
- Create a new App.NodePool with the required resources (such as GPU and memory). See Create and Deploy a VllmPipe for more information.
- Deploy the model to the new App.NodePool. See Monitor and Scale the C3 AI MIS for more information.
See also Configure and Manage Nodepools.
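The capacity reasoning behind this error can be sketched with simple arithmetic. The GPU counts below are illustrative values, not cluster data, and the helper is plain JavaScript rather than a C3 API.

```javascript
// Hypothetical sketch: a new deployment fits only if the GPUs already in
// use by deployed models plus the GPUs it requests stay within the pool's
// capacity. Exceeding capacity is what produces the out-of-memory error.
function fitsOnNodePool(deployedGpuCounts, requestedGpus, poolGpuCapacity) {
  const inUse = deployedGpuCounts.reduce((sum, n) => sum + n, 0);
  return inUse + requestedGpus <= poolGpuCapacity;
}

// One 4-GPU model already deployed on a 4-GPU pool: a second model does not
// fit, so terminate a deployment or create a new App.NodePool as described
// in the steps above.
console.log(fitsOnNodePool([4], 2, 4)); // -> false
console.log(fitsOnNodePool([], 2, 4));  // -> true
```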
Debug inference request errors
The following section provides details for errors received during the inference request stage, and potential troubleshooting steps to resolve them.
Call to ModelInference.completion() API hangs
If the ModelInference.completion() API hangs when you call it, verify that the model is warmed up by checking whether the warmupModel action is in progress or complete.
See the "Verify model is warmed up and ready to serve LLM" section above.
See also Use C3 AI MIS for LLM Text Generation for additional details about inputs and configurations for the ModelInference.completion() API.
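One defensive pattern for a client is to bound the wait rather than block indefinitely while the model is still warming up. This is a generic JavaScript sketch; the slowCall below is a stand-in for the actual ModelInference.completion() request, not the real API.

```javascript
// Hypothetical pattern: race a request against a timeout so a call made
// before warmup has finished fails fast with a diagnostic message instead
// of hanging indefinitely.
function withTimeout(promise, ms) {
  const timeout = new Promise((_, reject) =>
    setTimeout(
      () => reject(new Error('timed out; is warmupModel complete?')),
      ms
    )
  );
  return Promise.race([promise, timeout]);
}

// Example: a stand-in request that never resolves, like a completion call
// made before the model has warmed up.
const slowCall = new Promise(() => {});
withTimeout(slowCall, 100).catch((err) => console.log(err.message));
// -> 'timed out; is warmupModel complete?'
```

When the timeout fires, check the warmupModel action status described above before retrying.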
See also
- Microservice
- Overview of C3 AI Model Inference Service Administration
- Create and Configure the C3 AI Model Inference Service
- Create and Deploy a VllmPipe
- Manage Routes of the C3 AI Model Inference Service to Change or Upgrade LLMs
- Monitor and Scale the C3 AI Model Inference Service
- Use C3 AI Model Inference Service for LLM Text Generation and Inference Requests