Environment Sizing

Leader and task node sizing

C3 Generative AI requires at least one task node for asynchronous processing. The application has a significant memory footprint, especially in Python. Each leader and task node designated for indexing should meet the following specifications:

Memory: At least 30 GB, with 50% allocated to the JVM (on nodes with more memory, a smaller JVM allocation, such as 30%, may be sufficient)
Disk Space: Over 100 GB, to support Python runtime installation
CPU: Minimum of 5 cores

See Multiple Leader Nodes for managing concurrent users and GPU Nodes for large-scale data.
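
As a quick illustration of the minimums listed above, the following sketch checks a candidate node specification against them. `NodeSpec`, `check_indexing_node`, and `jvm_memory_gb` are hypothetical names introduced for this example and are not part of the C3 AI API; only the thresholds come from this section.

```python
from dataclasses import dataclass
from typing import List

# Minimums taken from the sizing list in this section.
MIN_MEMORY_GB = 30
MIN_DISK_GB = 100  # the requirement is "over 100 GB", so this bound is exclusive
MIN_CPU_CORES = 5


@dataclass
class NodeSpec:
    """Hypothetical description of a leader or task node used for indexing."""
    memory_gb: float
    disk_gb: float
    cpu_cores: int


def check_indexing_node(spec: NodeSpec) -> List[str]:
    """Return the unmet requirements; an empty list means the spec passes."""
    problems = []
    if spec.memory_gb < MIN_MEMORY_GB:
        problems.append(f"memory {spec.memory_gb} GB is below {MIN_MEMORY_GB} GB")
    if spec.disk_gb <= MIN_DISK_GB:
        problems.append(f"disk {spec.disk_gb} GB is not over {MIN_DISK_GB} GB")
    if spec.cpu_cores < MIN_CPU_CORES:
        problems.append(f"{spec.cpu_cores} CPU cores is below {MIN_CPU_CORES}")
    return problems


def jvm_memory_gb(memory_gb: float, jvm_fraction: float = 0.5) -> float:
    """Memory given to the JVM for a node, using the 50% default from this section."""
    return memory_gb * jvm_fraction
```

For example, `check_indexing_node(NodeSpec(memory_gb=32, disk_gb=120, cpu_cores=8))` returns an empty list, and `jvm_memory_gb(30)` gives 15 GB for the JVM on a minimally sized node.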

GPU nodes

While not mandatory, a task node with a GPU improves performance for datasets exceeding 40,000 passages (~500 MB). A GPU can accelerate indexing by 10x to 100x, with additional gains when multiple GPUs are available.

Indexing requirements (pgvector)

GPU node memory (GB)	Max passages
32	2 million
64	14 million
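
The table and the 40,000-passage threshold can be read as a simple lookup. The sketch below encodes them that way; `recommend_gpu_memory_gb` is a hypothetical helper, not a C3 AI function, and it treats passage counts beyond the largest documented tier as outside what this section covers.

```python
from typing import Optional

# Below this passage count, this section does not call for a GPU.
GPU_RECOMMENDED_ABOVE_PASSAGES = 40_000

# (GPU node memory in GB, maximum passages) from the pgvector indexing table.
GPU_MEMORY_TIERS = [(32, 2_000_000), (64, 14_000_000)]


def recommend_gpu_memory_gb(passages: int) -> Optional[int]:
    """Return the smallest documented GPU node memory for a passage count.

    Returns None when the dataset is small enough that a GPU is optional.
    Raises ValueError when the count exceeds the largest documented tier.
    """
    if passages < 0:
        raise ValueError("passages must be non-negative")
    if passages <= GPU_RECOMMENDED_ABOVE_PASSAGES:
        return None
    for memory_gb, max_passages in GPU_MEMORY_TIERS:
        if passages <= max_passages:
            return memory_gb
    raise ValueError("passage count exceeds the largest documented tier")
```

With these tiers, `recommend_gpu_memory_gb(1_000_000)` returns 32 and `recommend_gpu_memory_gb(5_000_000)` returns 64.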

Additional feature requirements

GPU nodes also improve features that perform inference on models loaded into local memory, including LLM guardrails, corroboration, and multimodal processing. For multimodal processing, including Genai.SourceFile.Chunker.Mew3 and Genai.SourceFile.Chunker.MsftX, a 64 GB GPU node is recommended to handle the higher memory demands of large document parsing.

Multiple leader nodes

C3 Generative AI supports multi-leader setups to handle higher concurrent usage.

Number of leader nodes	Max number of supported concurrent users
1	10
5	150

Scale nodes

Use a Jupyter notebook to scale leader and task nodes. To open a Jupyter notebook in C3 AI Studio, hover over the application card and select Jupyter.

Autoscale nodes

Autoscaling for leader or task nodes can be enabled using App.NodePool#setAutoScaleSpec. Various strategies are documented in subtypes of App.NodePool.AutoScaleStrategy.

To define a custom autoscaling implementation, use App.NodePool.AutoScaleStrategy.Lambda.

The following example adds a custom lambda for autoscaling leader nodes.

Python
```
def auto_scale_strategy(strategy=None):
    # nodes() returns the nodes in the pool, so take its length to get a count
    node_count = len(c3.app().nodePool('leader').nodes())
    # Implement logic to return the desired number of leader nodes, for example
    # (queue_size and running_threads are placeholders that you must compute):
    #
    # if queue_size >= 2 * running_threads and running_threads > 0:
    #     return node_count + 1
    # elif queue_size < running_threads // 2 and running_threads > node_count:
    #     return node_count - 1
    # else:
    #     return node_count
    return node_count

# Wrap the Python function as a lambda that the node pool can run
scaling_lambda = c3.Lambda.fromPyFunc(auto_scale_strategy, actionRequirement='py-jep')
strategy = (c3.App.NodePool.AutoScaleStrategy.Lambda.builder()
            .runIntervalMins(1)  # how often the lambda runs, in minutes
            .nodePoolName('leader')
            .scalingLambda(scaling_lambda)
            .maxPctChangeForScaleUp(100)
            .maxPctChangeForScaleDown(100)
            .build())
c3.app().nodePool('leader').setAutoScaleSpec(strategy=strategy).update()
```
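
The scaling rule sketched inside the example above can be tried out as a plain function before it is wired into a lambda. The version below is a minimal sketch under stated assumptions: `desired_node_count` is a hypothetical name, `queue_size` and `running_threads` are inputs you would have to measure yourself (this section does not say how), and the `min_nodes` and `max_nodes` clamp is an addition for safety rather than part of the original rule.

```python
def desired_node_count(node_count: int,
                       queue_size: int,
                       running_threads: int,
                       min_nodes: int = 1,
                       max_nodes: int = 10) -> int:
    """Return the next leader node count for the given load.

    Scale up by one node when the queue is at least twice the number of
    running threads, scale down by one when the queue is under half the
    running threads, and otherwise keep the current count.
    """
    if queue_size >= 2 * running_threads and running_threads > 0:
        target = node_count + 1
    elif queue_size < running_threads // 2 and running_threads > node_count:
        target = node_count - 1
    else:
        target = node_count
    # Assumption: keep the result within bounds chosen by the operator.
    return max(min_nodes, min(max_nodes, target))
```

Keeping the decision separate from the platform calls makes it possible to test the thresholds in isolation and then call the function from the body of `auto_scale_strategy`.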

Manually scale nodes

Manually scale leader or task nodes using App.NodePool#setNodeCount. For more information, see Configure and Manage Node Pools.

Check the current configuration for the node pools:
Python
```
c3.app().nodePool('task').config()
c3.app().nodePool('leader').config()
```
These commands return the current values of minNodeCount, maxNodeCount, and targetNodeCount.

Set the minimum, maximum, and target number of nodes:
Python
```
c3.app().nodePool('task').setNodeCount(2, 2, 2)
c3.app().nodePool('leader').setNodeCount(2, 2, 2)
```
The format is: setNodeCount(minCount, maxCount, targetCount).
Set minNodeCount, maxNodeCount, and targetNodeCount to the same value.

Apply the changes:

Python
```
c3.app().nodePool('task').update()
c3.app().nodePool('leader').update()
```

Verify node status:

Python
```
task_nodes = c3.app().nodePool('task').nodes()
len(task_nodes)

leader_nodes = c3.app().nodePool('leader').nodes()
len(leader_nodes)
```
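
Node counts may take some time to reach the target after an update, so the verification step can be wrapped in a polling loop. The helper below is a generic sketch: `wait_for_node_count` and its parameters are hypothetical, and it takes any zero-argument callable that returns the current count, so it does not depend on the C3 AI API.

```python
import time
from typing import Callable


def wait_for_node_count(get_count: Callable[[], int],
                        target: int,
                        timeout_s: float = 600.0,
                        poll_interval_s: float = 10.0,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_count() until it equals target or the timeout elapses.

    Returns True if the target was reached, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_count() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_s)
```

In a notebook, this could be called as `wait_for_node_count(lambda: len(c3.app().nodePool('task').nodes()), target=2)`, on the assumption that the length of `nodes()` reflects the nodes currently in the pool.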

Concurrent query handling

The number of concurrent queries is determined by the number of leader nodes, where Q is the number of concurrent queries and L is the number of leader nodes:

Q = 5 \times L
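
The formula can also be inverted to estimate how many leader nodes a target query load needs. The helper names below are hypothetical; the factor of 5 is the one given in the formula above.

```python
import math

QUERIES_PER_LEADER = 5  # from Q = 5 x L


def max_concurrent_queries(leader_nodes: int) -> int:
    """Q for a given number of leader nodes L."""
    if leader_nodes < 0:
        raise ValueError("leader_nodes must be non-negative")
    return QUERIES_PER_LEADER * leader_nodes


def leaders_needed(concurrent_queries: int) -> int:
    """Smallest L such that 5 x L is at least the requested Q."""
    if concurrent_queries < 0:
        raise ValueError("concurrent_queries must be non-negative")
    return math.ceil(concurrent_queries / QUERIES_PER_LEADER)
```

For instance, 12 concurrent queries round up to 3 leader nodes, which together support 15 concurrent queries.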

PostgreSQL sizing

For most use cases, C3 Generative AI imposes minimal database load, with the exception of document indexing tasks, which use the pgvector PostgreSQL extension. Standard platform PostgreSQL is sufficient unless indexing more than 1 million passages. For further pgvector sizing information, see vector store documentation. For information on how indexing is triggered, see Configure the Document Processing Pipeline.

The application also executes database queries to retrieve structured data, which could increase the database load when handling large-scale data.

Cassandra sizing

Cassandra is required only when integrating C3 Generative AI with an application that uses time series data.
