Environment Sizing

Leader and task node sizing

C3 Generative AI requires at least one task node for asynchronous processing. The application has a significant memory footprint, especially in Python. Each leader and task node designated for indexing should meet the following specifications:

  • Memory: At least 30 GB, with 50% allocated to the JVM (on nodes with more memory, a smaller JVM share, such as 30%, may be sufficient)
  • Disk Space: Over 100 GB, to support Python runtime installation
  • CPU: Minimum of 5 cores
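
The minimums above can be encoded as a quick pre-deployment check. The following is a minimal sketch: the function name check_node_spec and its parameters are illustrative and are not part of the C3 AI API, and the input values are assumed to come from your own infrastructure tooling.

```python
# Minimums taken from the sizing list above.
MIN_MEMORY_GB = 30
MIN_DISK_GB = 100   # the requirement is strictly more than 100 GB
MIN_CPU_CORES = 5


def check_node_spec(memory_gb, disk_gb, cpu_cores):
    """Return a list of problems; an empty list means the node meets the minimums."""
    problems = []
    if memory_gb < MIN_MEMORY_GB:
        problems.append(f"memory: {memory_gb} GB is below the {MIN_MEMORY_GB} GB minimum")
    if disk_gb <= MIN_DISK_GB:
        problems.append(f"disk: {disk_gb} GB does not exceed {MIN_DISK_GB} GB")
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"cpu: {cpu_cores} cores is below the {MIN_CPU_CORES}-core minimum")
    return problems


# Hypothetical node sizes, for illustration only.
print(check_node_spec(memory_gb=32, disk_gb=120, cpu_cores=8))
print(check_node_spec(memory_gb=16, disk_gb=100, cpu_cores=4))
```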

GPU nodes

While not mandatory, a task node with a GPU improves performance for datasets exceeding 40,000 passages (~500 MB). A GPU can accelerate indexing by 10x to 100x, with additional gains when multiple GPUs are available.

Indexing requirements (pgvector)

GPU node memory (GB)    Max passages
32                      2 million
64                      14 million
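
The GPU guidance can be expressed as a small lookup. This sketch reads the table above as 32 GB for up to 2 million passages and 64 GB for up to 14 million passages, and uses the 40,000-passage mark as the point where a GPU starts to help. The function name recommend_gpu_memory_gb is hypothetical and not part of the C3 AI API.

```python
# Thresholds taken from the GPU sizing guidance above.
GPU_BENEFIT_THRESHOLD = 40_000  # passages above which a GPU improves performance
GPU_TIERS = [(32, 2_000_000), (64, 14_000_000)]  # (GPU node memory in GB, max passages)


def recommend_gpu_memory_gb(passages):
    """Return the smallest listed GPU node memory (GB) that covers `passages`.

    Returns 0 when the dataset is small enough that a GPU is unlikely to
    matter, and None when the count exceeds every listed tier.
    """
    if passages <= GPU_BENEFIT_THRESHOLD:
        return 0
    for memory_gb, max_passages in GPU_TIERS:
        if passages <= max_passages:
            return memory_gb
    return None


for count in (10_000, 500_000, 5_000_000, 20_000_000):
    print(count, recommend_gpu_memory_gb(count))
```

A None result means the dataset is larger than anything the table covers, so the sizing needs to be worked out separately.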

Additional feature requirements

GPU nodes also improve features that perform inference on models loaded into local memory, including LLM guardrails, corroboration, and multimodal processing. For multimodal processing (including Genai.SourceFile.Chunker.Mew3 and Genai.SourceFile.Chunker.MsftX), a 64 GB GPU node is recommended to handle the higher memory demands of large document parsing.

Multiple leader nodes

C3 Generative AI supports multi-leader setups to handle higher concurrent usage.

Number of leader nodes    Max number of supported concurrent users
1                         10
5                         150

Scale nodes

Use a Jupyter notebook to scale leader and task nodes. To open a Jupyter notebook in C3 AI Studio, hover over the application card and select Jupyter.

Autoscale nodes

Autoscaling for leader or task nodes can be enabled using App.NodePool#setAutoScaleSpec. Various strategies are documented in subtypes of App.NodePool.AutoScaleStrategy.

To define a custom autoscaling implementation, use App.NodePool.AutoScaleStrategy.Lambda.

The following example adds a custom lambda for autoscaling leader nodes.

Python
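def auto_scale_strategy(strategy=None):
    # nodes() returns the nodes in the pool, so take its length to get a count
    node_count = len(c3.app().nodePool('leader').nodes())
    # Implement logic here to return the desired number of leader nodes.
    # Example rule (queue_size and running_threads are placeholders that
    # you must compute yourself):
    #
    # if queue_size >= 2 * running_threads and running_threads > 0:
    #     return node_count + 1
    # elif queue_size < running_threads // 2 and running_threads > node_count:
    #     return node_count - 1
    # else:
    #     return node_count
    return node_count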

scaling_lambda = c3.Lambda.fromPyFunc(auto_scale_strategy, actionRequirement='py-jep')
strategy = (c3.App.NodePool.AutoScaleStrategy.Lambda.builder()
            .runIntervalMins(1)  # how often this lambda runs, in minutes
            .nodePoolName('leader')
            .scalingLambda(scaling_lambda)
            .maxPctChangeForScaleUp(100)
            .maxPctChangeForScaleDown(100)
            .build())
c3.app().nodePool('leader').setAutoScaleSpec(strategy=strategy).update()
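
The scaling rule that the example above leaves disabled can be checked on its own before it is wired into a lambda. The sketch below lifts that rule into a pure function so it can be exercised without a running environment. The name desired_leader_count is hypothetical, and queue_size and running_threads are assumed inputs: this page does not say how to obtain them.

```python
def desired_leader_count(node_count, queue_size, running_threads):
    """Return the leader node count suggested by the example rule."""
    if queue_size >= 2 * running_threads and running_threads > 0:
        # The backlog is at least twice the work in progress: add a node.
        return node_count + 1
    if queue_size < running_threads // 2 and running_threads > node_count:
        # The backlog is under half the work in progress: remove a node.
        return node_count - 1
    return node_count


# Illustrative inputs only.
print(desired_leader_count(node_count=2, queue_size=10, running_threads=4))
print(desired_leader_count(node_count=2, queue_size=1, running_threads=6))
print(desired_leader_count(node_count=2, queue_size=3, running_threads=4))
```

The rule changes the count by at most one node per run, so with a one-minute interval the pool grows or shrinks gradually rather than jumping to a new size.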

Manually scale nodes

Manually scale leader or task nodes using App.NodePool#setNodeCount. For more information, see Configure and Manage Node Pools.

  1. Check the current configuration for the node pools:

    Python
    c3.app().nodePool('task').config()
    c3.app().nodePool('leader').config()

    These commands return current values for minNodeCount, maxNodeCount, and targetNodeCount.

  2. Set the minimum, maximum, and target number of nodes:

    Python
    c3.app().nodePool('task').setNodeCount(2, 2, 2)
    c3.app().nodePool('leader').setNodeCount(2, 2, 2)

    The format is: setNodeCount(minCount, maxCount, targetCount).

  3. Apply the changes:

    Python
    c3.app().nodePool('task').update()
    c3.app().nodePool('leader').update()

  4. Verify node status:

    Python
    task_nodes = c3.app().nodePool('task').nodes()
    len(task_nodes)
    
    leader_nodes = c3.app().nodePool('leader').nodes()
    len(leader_nodes)
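
Steps 2 through 4 can be wrapped in one helper. The following is a minimal sketch, not a platform utility: scale_node_pool is a hypothetical name, the app argument is assumed to be the object returned by c3.app(), and the ordering check on the three counts is a local sanity check added for this sketch rather than a documented platform rule.

```python
def scale_node_pool(app, pool_name, min_count, max_count, target_count):
    """Set and apply node counts for one pool, then return the current node total."""
    if not (0 <= min_count <= target_count <= max_count):
        raise ValueError("expected 0 <= min_count <= target_count <= max_count")
    # Steps 2 and 3: set the counts, then apply them.
    app.nodePool(pool_name).setNodeCount(min_count, max_count, target_count)
    app.nodePool(pool_name).update()
    # Step 4: the pool may still be provisioning, so this can lag the target.
    return len(app.nodePool(pool_name).nodes())
```

In a notebook, a call such as scale_node_pool(c3.app(), 'task', 2, 2, 2) would mirror the manual steps above for the task pool.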

Concurrent query handling

The number of concurrent queries is determined by the number of leader nodes, where Q is the number of concurrent queries and L is the number of leader nodes:

$$Q = 5 \times L$$
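
The relation between Q and L can be applied in both directions when planning capacity. This sketch uses hypothetical helper names. The floor of one leader node is an assumption made here, not something this page states.

```python
import math

QUERIES_PER_LEADER = 5  # from the relation Q = 5 x L


def max_concurrent_queries(leader_nodes):
    """Concurrent queries supported by the given number of leader nodes."""
    return QUERIES_PER_LEADER * leader_nodes


def leaders_needed(concurrent_queries):
    """Smallest leader count whose capacity covers the requested queries."""
    # Assumes at least one leader node is always present.
    return max(1, math.ceil(concurrent_queries / QUERIES_PER_LEADER))


print(max_concurrent_queries(3))
print(leaders_needed(12))
```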

PostgreSQL sizing

For most use cases, C3 Generative AI imposes minimal database load. The exception is document indexing, which uses the pgvector PostgreSQL extension. Standard platform PostgreSQL is sufficient unless you index more than 1 million passages. For further pgvector sizing information, see the vector store documentation. For information on how indexing is triggered, see Configure the Document Processing Pipeline.

The application also executes database queries to retrieve structured data, which could increase the database load when handling large-scale data.

Cassandra sizing

Cassandra is required only when integrating C3 Generative AI with an application that uses time series data.
