
Environment Sizing

Leader & task node sizing

C3 Generative AI requires at least one task node for asynchronous processing. The application has a significant memory footprint, especially in Python. Each leader and task node designated for indexing should meet the following specifications:

  • Memory: At least 30 GB, with 50% allocated to the JVM (for nodes with larger memory capacities, a smaller JVM allocation, such as 30%, may be sufficient)
  • Disk Space: Over 100 GB, to support Python runtime installation
  • CPU: Minimum of 5 cores
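The JVM allocation guidance above can be sketched as a small helper. This is a minimal sketch using only the figures on this page (30 GB minimum, ~50% heap on smaller nodes, ~30% on larger ones); the 64 GB cutoff for switching to the smaller fraction is an illustrative assumption, not an official threshold:

```python
def recommended_jvm_heap_gb(node_memory_gb: float) -> float:
    """Suggest a JVM heap size for a leader or task node.

    Per the sizing guidance: ~50% of memory on a 30 GB node, and a
    smaller fraction (e.g. 30%) on larger nodes. The 64 GB cutoff
    below is an illustrative assumption, not an official threshold.
    """
    if node_memory_gb < 30:
        raise ValueError("Nodes should have at least 30 GB of memory")
    fraction = 0.5 if node_memory_gb < 64 else 0.3
    return node_memory_gb * fraction

# A 30 GB node gets a 15 GB heap; a 128 GB node gets a 38.4 GB heap.
```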

Run the following commands to configure the recommended application infrastructure for a standard deployment:

JavaScript
Genai.QuickStart.setupLeader();
Genai.QuickStart.setupTask();

Note: The node pools in the script may not be optimal for large-scale data or production. See Multiple Leader Nodes for managing a high number of concurrent users and GPU Nodes for handling large-scale data.

GPU nodes

While not mandatory, configuring a task node with GPU is recommended for improved performance, particularly for datasets exceeding 40,000 passages (~500 MB). Leveraging GPU can accelerate indexing by 10x to 100x, with additional performance benefits when multiple GPUs are configured.

Requirements for indexing using pgvector

GPU node memory (GB) | Max passages
32                   | 2 million
64                   | 14 million
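The table above can be encoded as a small lookup. The values are taken directly from this page; the helper function itself is hypothetical and only interpolates between the two listed node sizes:

```python
# GPU node memory (GB) -> max passages indexable with pgvector,
# per the sizing table on this page.
PGVECTOR_MAX_PASSAGES = {
    32: 2_000_000,
    64: 14_000_000,
}

def min_gpu_memory_gb(passage_count: int) -> int:
    """Return the smallest listed GPU node size that can index the
    given number of passages, based only on the two data points above."""
    for memory_gb in sorted(PGVECTOR_MAX_PASSAGES):
        if passage_count <= PGVECTOR_MAX_PASSAGES[memory_gb]:
            return memory_gb
    raise ValueError("Passage count exceeds the node sizes listed here")
```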

Requirements for additional features

GPU nodes are also highly beneficial for features that perform inference on models loaded into local memory, including LLM guardrails, corroboration, and multi-modal processing. For memory-intensive features such as multi-modal processing, a 64 GB GPU node is recommended to accommodate the higher memory demands for large documents.

Multiple leader nodes

C3 Generative AI supports multi-leader setups to handle higher concurrent usage.

Number of leader nodes | Max number of supported concurrent users
1                      | 10
5                      | 150

Scaling Nodes

Use a Jupyter notebook to scale leader and task nodes. To open a Jupyter notebook in C3 AI Studio, hover over the application card and select Jupyter.

Autoscaling Nodes

Autoscaling for leader or task nodes can be enabled using App.NodePool#setAutoScaleSpec. Several autoscaling strategies are available; they are documented as subtypes of App.NodePool.AutoScaleStrategy.

To define autoscaling with a custom implementation, use App.NodePool.AutoScaleStrategy.Lambda.

Below is an example of adding a custom lambda for auto scaling of leader nodes.

Python
def auto_scale_strategy(strategy=None):
    # Current number of leader nodes
    node_count = len(c3.app().nodePool('leader').nodes())
    # Implement logic here to return the desired number of leader nodes.
    # For example, assuming queue_size and running_threads are computed above:
    #
    # if queue_size >= 2 * running_threads and running_threads > 0:
    #     return node_count + 1
    # elif queue_size < running_threads // 2 and running_threads > node_count:
    #     return node_count - 1
    # else:
    #     return node_count
    return node_count

scaling_lambda = c3.Lambda.fromPyFunc(auto_scale_strategy, actionRequirement='py-jep')
strategy = (c3.App.NodePool.AutoScaleStrategy.Lambda.builder()
            .runIntervalMins(1) # how frequently (in minutes) this lambda runs
            .nodePoolName('leader')
            .scalingLambda(scaling_lambda)
            .maxPctChangeForScaleUp(100)
            .maxPctChangeForScaleDown(100)
            .build())
c3.app().nodePool('leader').setAutoScaleSpec(strategy=strategy).update()

Manually scaling nodes

Step 1: Inspect current configuration

Check the current configuration for the node pools:

Python
c3.app().nodePool('task').config()
c3.app().nodePool('leader').config()

These commands return current values for minNodeCount, maxNodeCount, and targetNodeCount.

Step 2: Modify node counts

Set the minimum, maximum, and target number of nodes:

Python
c3.app().nodePool('task').setNodeCount(2, 2, 2)
c3.app().nodePool('leader').setNodeCount(2, 2, 2)

The format is: setNodeCount(minCount, maxCount, targetCount).

Step 3: Apply the changes

Python
c3.app().nodePool('task').update()
c3.app().nodePool('leader').update()

Step 4: Verify node status

Use the following command to verify the current node status:

Python
task_nodes = c3.app().nodePool('task').nodes()
len(task_nodes)

leader_nodes = c3.app().nodePool('leader').nodes()
len(leader_nodes)

Concurrent query handling

The number of concurrent queries that can be handled in parallel is determined by the number of leader nodes. This is defined below, where Q is the number of concurrent queries and L is the number of leader nodes.

Q = 5 × L
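Using only the formula above, deployment sizing can be sketched with a hypothetical helper (the function names are illustrative, not part of the C3 API):

```python
import math

def max_concurrent_queries(leader_nodes: int) -> int:
    """Q = 5 * L: concurrent queries supported by L leader nodes."""
    return 5 * leader_nodes

def leaders_needed(concurrent_queries: int) -> int:
    """Minimum number of leader nodes L such that 5 * L covers the load."""
    return math.ceil(concurrent_queries / 5)
```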

PostgreSQL sizing

For most use cases, C3 Generative AI imposes minimal database load, with the exception of document indexing tasks, which utilize the pgvector PostgreSQL extension. Standard platform PostgreSQL is sufficient unless indexing more than 1 million passages. For further pgvector sizing information, refer to the vector store documentation.

The application also executes database queries to retrieve structured data, which could increase the database load when handling large-scale data.

Cassandra sizing

Cassandra is not required unless C3 Generative AI is integrated with an application that uses time-series data.
