Environment Sizing
Leader & task node sizing
C3 Generative AI requires at least one task node for asynchronous processing. The application has a significant memory footprint, especially for its Python processes. Each leader and task node designated for indexing should meet the following specifications:
- Memory: At least 30 GB, 50% allocated to JVM (for nodes with larger memory capacities, a smaller JVM allocation, such as 30%, may be sufficient)
- Disk Space: Over 100 GB, to support Python runtime installation
- CPU: Minimum of 5 cores
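As a rough sketch, the JVM allocation guidance above can be expressed as a simple helper. Note that the 64 GB cutoff for "larger memory capacities" is an assumption for illustration, not a value from this guide:

```python
def recommended_jvm_heap_gb(node_memory_gb: float) -> float:
    """Apply the sizing guidance above: allocate 50% of node memory to the
    JVM, dropping to 30% on larger nodes.
    The 64 GB threshold is an assumed cutoff, not a documented value."""
    if node_memory_gb <= 64:
        return node_memory_gb * 0.5
    return node_memory_gb * 0.3

print(recommended_jvm_heap_gb(30))   # 15.0 (minimum spec node)
print(recommended_jvm_heap_gb(128))  # ~38.4
```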
Run the following commands to configure the recommended application infrastructure for a standard deployment:

```
Genai.QuickStart.setupLeader();
Genai.QuickStart.setupTask();
```

Note: The node pools created by these commands may not be optimal for large-scale data or production. See Multiple Leader Nodes for managing a high number of concurrent users and GPU Nodes for handling large-scale data.
GPU nodes
While not mandatory, configuring a task node with GPU is recommended for improved performance, particularly for datasets exceeding 40,000 passages (~500 MB). Leveraging GPU can accelerate indexing by 10x to 100x, with additional performance benefits when multiple GPUs are configured.
Requirements for indexing using pgvector
| GPU node memory (GB) | Max passages |
|---|---|
| 32 | 2 million |
| 64 | 14 million |
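The table above maps directly to a lookup helper; a minimal sketch (function and variable names are illustrative, not part of the C3 API):

```python
# Minimum GPU node memory (GB) for pgvector indexing, per the table above.
GPU_MEMORY_TIERS = [(32, 2_000_000), (64, 14_000_000)]

def min_gpu_memory_gb(num_passages: int):
    """Return the smallest documented GPU memory tier that covers
    num_passages, or None if the count exceeds every listed tier."""
    for memory_gb, max_passages in GPU_MEMORY_TIERS:
        if num_passages <= max_passages:
            return memory_gb
    return None

print(min_gpu_memory_gb(500_000))    # 32
print(min_gpu_memory_gb(5_000_000))  # 64
```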
Requirements for additional features
GPU nodes are also highly beneficial for features that perform inference on models loaded into local memory, including LLM guardrails, corroboration, and multi-modal processing. For memory-intensive features such as multi-modal processing, a 64 GB GPU node is recommended to accommodate the higher memory demands for large documents.
Multiple leader nodes
C3 Generative AI supports multi-leader setups to handle higher concurrent usage.
| Number of leader nodes | Max number of supported concurrent users |
|---|---|
| 1 | 10 |
| 5 | 150 |
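To pick a leader-node count for an expected load, the table above can be treated as a lookup (a sketch with illustrative names; only the two documented configurations are included, since intermediate counts are not specified here):

```python
# Supported concurrent users per leader-node count, per the table above.
LEADER_CAPACITY = {1: 10, 5: 150}

def leader_nodes_for_users(concurrent_users: int):
    """Return the smallest documented leader-node count that supports the
    given concurrent-user load, or None if it exceeds the table."""
    for nodes in sorted(LEADER_CAPACITY):
        if concurrent_users <= LEADER_CAPACITY[nodes]:
            return nodes
    return None

print(leader_nodes_for_users(8))    # 1
print(leader_nodes_for_users(100))  # 5
```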
Scaling Nodes
Use a Jupyter notebook to scale leader and task nodes. To open a Jupyter notebook in C3 AI Studio, hover over the application card and select Jupyter.
Autoscaling Nodes
Autoscaling for leader or task nodes can be enabled using App.NodePool#setAutoScaleSpec. Various strategies can be employed for auto scaling. These are documented in subtypes of App.NodePool.AutoScaleStrategy.
To define autoscaling with a custom implementation, use App.NodePool.AutoScaleStrategy.Lambda.
Below is an example of adding a custom lambda for autoscaling of leader nodes:

```python
def auto_scale_strategy(strategy=None):
    # Current number of leader nodes
    node_count = len(c3.app().nodePool('leader').nodes())
    # Implement logic here to return the desired count of leader nodes.
    # For example, given queue-depth and running-thread metrics:
    #
    # if queue_size >= 2 * running_threads and running_threads > 0:
    #     return node_count + 1
    # elif queue_size < running_threads // 2 and running_threads > node_count:
    #     return node_count - 1
    # else:
    #     return node_count
    return node_count

scaling_lambda = c3.Lambda.fromPyFunc(auto_scale_strategy, actionRequirement='py-jep')
strategy = (c3.App.NodePool.AutoScaleStrategy.Lambda.builder()
    .runIntervalMins(1)  # how frequently this lambda runs
    .nodePoolName('leader')
    .scalingLambda(scaling_lambda)
    .maxPctChangeForScaleUp(100)
    .maxPctChangeForScaleDown(100)
    .build())
c3.app().nodePool('leader').setAutoScaleSpec(strategy=strategy).update()
```

Manually scaling nodes
Step 1: Inspect current configuration
Check the current configuration for the node pools:
```python
c3.app().nodePool('task').config()
c3.app().nodePool('leader').config()
```

These commands return current values for minNodeCount, maxNodeCount, and targetNodeCount.
Step 2: Modify node counts
Set the minimum, maximum, and target number of nodes:
```python
c3.app().nodePool('task').setNodeCount(2, 2, 2)
c3.app().nodePool('leader').setNodeCount(2, 2, 2)
```

The format is: setNodeCount(minCount, maxCount, targetCount).
Ensure that minNodeCount, maxNodeCount, and targetNodeCount are all set to the same value.
Step 3: Apply the changes
```python
c3.app().nodePool('task').update()
c3.app().nodePool('leader').update()
```

Step 4: Verify node status
Use the following commands to verify the current node status:

```python
task_nodes = c3.app().nodePool('task').nodes()
len(task_nodes)
leader_nodes = c3.app().nodePool('leader').nodes()
len(leader_nodes)
```

Concurrent query handling
The number of concurrent queries that can be handled in parallel is determined by the number of leader nodes; see the Multiple Leader Nodes table above for the supported concurrent users at each leader-node count.
PostgreSQL sizing
For most use cases, C3 Generative AI imposes minimal database load, with the exception of document indexing tasks, which use the pgvector PostgreSQL extension. Standard platform PostgreSQL is sufficient unless indexing more than 1 million passages. For further pgvector sizing information, refer to the vector store documentation.
The application also executes database queries to retrieve structured data, which could increase the database load when handling large-scale data.
Cassandra sizing
Cassandra is not required unless you are integrating C3 Generative AI with an application that uses time-series data.