Environment Sizing
Leader & task node sizing
C3 Generative AI requires at least one task node for asynchronous processing. The application has a significant memory footprint, especially for its Python processes. Each leader and task node designated for indexing should meet the following specifications:
- Memory: At least 30 GB, 50% allocated to JVM (for nodes with larger memory capacities, a smaller JVM allocation, such as 30%, may be sufficient)
- Disk Space: Over 100 GB, to support Python runtime installation
- CPU: Minimum of 5 cores
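As a rough sketch, the JVM allocation guidance above can be expressed as a simple helper. Note that the 64 GB cutoff for "larger memory capacities" is an assumption for illustration, not a value from this guide:

```python
def recommended_jvm_heap_gb(node_memory_gb: float) -> float:
    """Apply the sizing guidance above: allocate 50% of node memory to the
    JVM, dropping to 30% on larger nodes.
    The 64 GB threshold is an assumed cutoff, not a documented value."""
    if node_memory_gb <= 64:
        return node_memory_gb * 0.5
    return node_memory_gb * 0.3

print(recommended_jvm_heap_gb(30))   # 15.0 (minimum spec node)
print(recommended_jvm_heap_gb(128))  # ~38.4
```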
Run the following commands to configure the recommended application infrastructure for a standard deployment:

```
Genai.QuickStart.setupLeader();
Genai.QuickStart.setupTask();
```

Note: The node pools created by these commands may not be optimal for large-scale data or production. See Multiple Leader Nodes for managing a high number of concurrent users and GPU Nodes for handling large-scale data.
GPU nodes
While not mandatory, configuring a task node with GPU is recommended for improved performance, particularly for datasets exceeding 40,000 passages (~500 MB). Leveraging GPU can accelerate indexing by 10x to 100x, with additional performance benefits when multiple GPUs are configured.
Requirements for indexing using pgvector
| GPU node memory (GB) | Max passages |
|---|---|
| 32 | 2 million |
| 64 | 14 million |
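The table above maps directly to a lookup helper; a minimal sketch (function and variable names are illustrative, not part of the C3 API):

```python
# Minimum GPU node memory (GB) for pgvector indexing, per the table above.
GPU_MEMORY_TIERS = [(32, 2_000_000), (64, 14_000_000)]

def min_gpu_memory_gb(num_passages: int):
    """Return the smallest documented GPU memory tier that covers
    num_passages, or None if the count exceeds every listed tier."""
    for memory_gb, max_passages in GPU_MEMORY_TIERS:
        if num_passages <= max_passages:
            return memory_gb
    return None

print(min_gpu_memory_gb(500_000))    # 32
print(min_gpu_memory_gb(5_000_000))  # 64
```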
Requirements for additional features
GPU nodes are also highly beneficial for features that perform inference on models loaded into local memory, including LLM guardrails, corroboration, and multi-modal processing. For memory-intensive features such as multi-modal processing, a 64 GB GPU node is recommended to accommodate the higher memory demands for large documents.
Multiple leader nodes
C3 Generative AI supports multi-leader setups to handle higher concurrent usage.
| Number of leader nodes | Max number of supported concurrent users |
|---|---|
| 1 | 10 |
| 5 | 150 |
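To pick a leader-node count for an expected load, the table above can be treated as a lookup (a sketch with illustrative names; only the two documented configurations are included, since intermediate counts are not specified here):

```python
# Supported concurrent users per leader-node count, per the table above.
LEADER_CAPACITY = {1: 10, 5: 150}

def leader_nodes_for_users(concurrent_users: int):
    """Return the smallest documented leader-node count that supports the
    given concurrent-user load, or None if it exceeds the table."""
    for nodes in sorted(LEADER_CAPACITY):
        if concurrent_users <= LEADER_CAPACITY[nodes]:
            return nodes
    return None

print(leader_nodes_for_users(8))    # 1
print(leader_nodes_for_users(100))  # 5
```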
Scaling Nodes
Use a Jupyter notebook to scale leader and task nodes. To open a Jupyter notebook in C3 AI Studio, hover over the application card and select Jupyter.
Autoscaling Nodes
Autoscaling for leader or task nodes can be enabled using App.NodePool#setAutoScaleSpec. Various strategies can be employed for auto scaling. These are documented in subtypes of App.NodePool.AutoScaleStrategy.
To define autoscaling with a custom implementation, use App.NodePool.AutoScaleStrategy.Lambda.
Below is an example of adding a custom lambda for autoscaling of leader nodes:

```python
def auto_scale_strategy(strategy=None):
    # Current number of leader nodes
    node_count = len(c3.app().nodePool('leader').nodes())
    # Implement logic here to return the desired count of leader nodes.
    # For example, given queue-depth and running-thread metrics:
    #
    # if queue_size >= 2 * running_threads and running_threads > 0:
    #     return node_count + 1
    # elif queue_size < running_threads // 2 and running_threads > node_count:
    #     return node_count - 1
    # else:
    #     return node_count
    return node_count

scaling_lambda = c3.Lambda.fromPyFunc(auto_scale_strategy, actionRequirement='py-jep')
strategy = (c3.App.NodePool.AutoScaleStrategy.Lambda.builder()
    .runIntervalMins(1)  # how frequently this lambda runs
    .nodePoolName('leader')
    .scalingLambda(scaling_lambda)
    .maxPctChangeForScaleUp(100)
    .maxPctChangeForScaleDown(100)
    .build())
c3.app().nodePool('leader').setAutoScaleSpec(strategy=strategy).update()
```

Manually scaling nodes
Step 1: Inspect current configuration
Check the current configuration for the node pools:
```python
c3.app().nodePool('task').config()
c3.app().nodePool('leader').config()
```

These commands return current values for minNodeCount, maxNodeCount, and targetNodeCount.
Step 2: Modify node counts
Set the minimum, maximum, and target number of nodes:
```python
c3.app().nodePool('task').setNodeCount(2, 2, 2)
c3.app().nodePool('leader').setNodeCount(2, 2, 2)
```

The format is: setNodeCount(minCount, maxCount, targetCount).
Ensure that minNodeCount, maxNodeCount, and targetNodeCount are all set to the same value.
Step 3: Apply the changes
```python
c3.app().nodePool('task').update()
c3.app().nodePool('leader').update()
```

Step 4: Verify node status
Use the following commands to verify the current node status:

```python
task_nodes = c3.app().nodePool('task').nodes()
len(task_nodes)
leader_nodes = c3.app().nodePool('leader').nodes()
len(leader_nodes)
```

Concurrent query handling
The number of concurrent queries that can be handled in parallel is determined by the number of leader nodes; see the Multiple Leader Nodes table above for the supported concurrent users at each leader-node count.
PostgreSQL sizing
For most use cases, C3 Generative AI imposes minimal database load, with the exception of document indexing tasks, which use the pgvector PostgreSQL extension. Standard platform PostgreSQL is sufficient unless indexing more than 1 million passages. For further pgvector sizing information, refer to the vector store documentation.
The application also executes database queries to retrieve structured data, which could increase the database load when handling large-scale data.
Cassandra sizing
Cassandra is not required unless you are integrating C3 Generative AI with an application that uses time-series data.