Cassandra Database
The C3 Agentic AI Platform offers various database solutions:
- Cassandra: A BASE-compliant database optimized for high availability and performance. Supports large data loads with low latency reads and writes, eventual consistency, and flexible schema design.
- PostgreSQL: An ACID-compliant relational database that suits structured data and SQL operations.
- H2: An ACID-compliant database primarily for testing and single node environments. Extends PostgreSQL functionality with simplified setup and reduced resource requirements.
Additionally, the platform supports other KV store alternatives and file system storage. To learn more about alternative data storage solutions and cost and performance considerations, see Specify a KV Store.
The following sections provide additional information about Cassandra database features.
Database characteristics
Cassandra is the default KV store solution for the C3 Agentic AI Platform. The following characteristics summarize Cassandra as a supported database:
- Heavy volume: Supports large data loads (greater than 100 million but less than 100 billion data points)
- Time series: Supports time-series data
- High availability: Maintains continuous operation with automatic failover mechanisms
- Low latency reads and writes: Serves data with minimal response times across read and write operations
- Eventual data consistency: Prioritizes availability over immediate consistency; replicas converge as data synchronizes over time
- Flexible schema: Supports dynamic schema changes without downtime
For more information about Cassandra and its limitations, see Specify a KV Store.
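The time-series and flexible-schema characteristics above map directly onto Cassandra's table model. The following sketch uses hypothetical keyspace, table, and column names (`telemetry.readings`, `series_id`, `unit`) to illustrate a typical time-series table and an online schema change; it is not the platform's actual schema.

```python
# Hypothetical names throughout; a sketch of CQL for a time-series table.
# The clustering column orders points within a partition by timestamp,
# and ALTER TABLE adds a column online, without downtime.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS telemetry.readings (
    series_id text,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((series_id), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
"""

# Flexible schema: new columns can be added while the table serves traffic
ADD_COLUMN = "ALTER TABLE telemetry.readings ADD unit text;"
```

`(series_id)` is the partition key, so all points for one series land in one partition and reads by series and time range stay on a small set of replicas.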
Normalization process
The following diagram describes the data normalization process for Cassandra in the C3 Agentic AI Platform:
Cassandra optimization in the C3 Agentic AI Platform
The C3 Agentic AI Platform provides the following optimization capabilities for Cassandra.
C3 AI configurations
The C3 Agentic AI Platform offers the following configurations to optimize Cassandra data processing:
| Configuration | Description | Default value |
|---|---|---|
| TimedDataFields#bucketInterval | Time interval for bucketing normalized time series data to optimize reads | N/A |
| NormalizationConfig#NormalizationPartitionStrategy | Bucket strategy that defines when to move data from hot storage to cold storage | Six months in hot storage, then move to cold storage |
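The effect of `TimedDataFields#bucketInterval` can be sketched as simple bucket arithmetic: points whose timestamps fall in the same interval share a bucket start, so a range read fetches a few wide rows instead of many scattered ones. This is an illustrative Python sketch of the idea, not the platform's implementation.

```python
from datetime import datetime, timedelta, timezone

def bucket_start(ts: datetime, interval: timedelta) -> datetime:
    """Return the start of the time bucket that contains `ts`.

    Grouping normalized time-series points by bucket start lets a read
    fetch one row group per interval rather than one row per point.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    n = int((ts - epoch) / interval)  # whole buckets elapsed since the epoch
    return epoch + n * interval
```

For example, with a one-hour interval every point stamped between 10:00 and 11:00 maps to the 10:00 bucket.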
Strategies for high-volume, time-series data
If you work with high-volume, time-series data, you can use the following capabilities to optimize Cassandra performance.
Composite key
A composite key suits high-frequency data that would otherwise accumulate in a single partition, such as more than 25 million data points for a single partition key. A composite key provides the following value:
- Relieves strain on Cassandra processing
- Reduces KV row sizes, which leads to the following:
- Smaller C3 AI compactions
- Less deletion of temporary columns
  - Fewer hot spots
- Higher throughput
- Better memory efficiency
- Reduces the number of tombstones created in Cassandra
To learn more about how to use a composite key, see "Composite keys and external Types" in the Virtualization topic.
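The idea behind a composite key can be sketched as pairing the original key with a coarse time bucket, so no single Cassandra partition grows without bound. The function and names below (`composite_partition_key`, `bucket_days`) are illustrative assumptions, not platform APIs.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def composite_partition_key(series_id: str, ts: datetime,
                            bucket_days: int = 7) -> tuple:
    """Build a (series id, coarse time bucket) partition key.

    The bucket component caps how many points land in any one partition,
    avoiding the oversized rows and hot spots that a single-component
    key produces for high-frequency series.
    """
    bucket = (ts - EPOCH).days // bucket_days
    return (series_id, bucket)
```

Points for the same series within one bucket window share a partition; points far apart in time, or from different series, spread across distinct partitions.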
Hot/cold storage
Hot/cold storage helps balance performance and cost with Cassandra:
- Separates frequently accessed (hot) data from infrequently accessed (cold) data
- Reduces storage demand by moving cold data to lower-cost cold storage
- Leads to improved query performance, reduced memory pressure, and better cluster health
To learn more about hot/cold storage with time series data, see "Hot/cold normalized storage" in the Time Series topic.
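At its core, hot/cold tiering is an age-based routing decision. The sketch below assumes a cutoff mirroring the six-month default shown in the configurations table above; the names (`HOT_WINDOW`, `storage_tier`) and the 183-day approximation of six months are illustrative, not platform settings.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cutoff approximating the six-month hot-storage default
HOT_WINDOW = timedelta(days=183)

def storage_tier(point_ts: datetime, now: datetime) -> str:
    """Route a time-series point to hot or cold storage by age.

    Recent data stays in Cassandra (hot) for low-latency reads; older
    data moves to cheaper cold storage, shrinking the hot cluster's
    footprint, memory pressure, and compaction load.
    """
    return "hot" if now - point_ts <= HOT_WINDOW else "cold"
```

A background job applying this rule on each bucket boundary is one simple way such a partition strategy can be realized.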