Cassandra Database

The C3 Agentic AI Platform offers various database solutions:

Cassandra: A BASE-compliant database optimized for high availability and performance. Supports large data loads with low-latency reads and writes, eventual consistency, and flexible schema design.
PostgreSQL: An ACID-compliant relational database that suits structured data and SQL operations.
H2: An ACID-compliant database primarily for testing and single-node environments. Provides PostgreSQL-compatible functionality with simplified setup and reduced resource requirements.

Additionally, the platform supports alternative KV stores and file system storage. To learn more about alternative data storage solutions and their cost and performance considerations, see Specify a KV Store.

The following sections provide additional information about Cassandra database features.

Database characteristics

Cassandra is the default KV store for the C3 Agentic AI Platform. The following characteristics summarize Cassandra as a supported database:

Heavy volume: Supports large data loads (greater than 100 million but less than 100 billion data points)
Time series: Supports time-series data
High availability: Maintains continuous operation through automatic failover mechanisms
Low-latency reads and writes: Provides fast data access with minimal response times across operations
Eventual data consistency: Prioritizes availability over immediate consistency; replicas converge to the same data over time
Flexible schema: Supports dynamic schema changes without downtime

For more information about Cassandra and its limitations, see Specify a KV Store.

Normalization process

The following diagram describes the data normalization process for Cassandra in the C3 Agentic AI Platform:

flowchart LR
    A[Read raw time series] --> B[Initial clean]
    B --> C[De-duplicate]
    C --> D[Sort by field/timestamp]
    D --> E[Detect interval]
    E --> F[Interpolate]
    F --> G[Align interval]
    G --> H[Store normalized time series]
    B --> B1["Rejects points > 50 years wide, 50 years ago, or 50 years in the future"]
    C --> C1["Handles points that are exact duplicates, controlled by 'duplicateHandling' field on TimeseriesHeader"]
    D --> D1["Handles overlapping measurements: defaults to 'average'. Also handles 'estimated'."]
    E --> E1["Detect from first 100 raw data points. Can override with 'interval' field on TimeseriesHeader."]
    F --> F1["Treats missing data points: zero, linear, or custom"]
    G --> G1["Controlled by 'treatment' on TimeseriesHeader or TimeseriesDatapoint: see the Treatment type"]
    style A fill:#2E86AB
    style B fill:#2E86AB
    style C fill:#2E86AB
    style D fill:#2E86AB
    style E fill:#2E86AB
    style F fill:#2E86AB
    style G fill:#2E86AB
    style H fill:#2E86AB
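The following Python sketch walks a small series through the same stages as the diagram to make the order of operations concrete. It is a simplified model under stated assumptions, not the platform's normalization engine: `Point`, `normalize`, and the other names are hypothetical; exact duplicates are dropped and overlapping values are averaged (the 'average' default); the interval is taken as the median gap among the first 100 points; and the interpolate and align stages are combined by averaging points into interval buckets and then filling empty buckets with zero or linear values. The "50 years wide" check is omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from statistics import median

# Approximation of 50 years; leap days are ignored in this sketch.
FIFTY_YEARS = timedelta(days=365 * 50)


@dataclass(frozen=True)
class Point:
    """Hypothetical raw data point (not a platform type)."""
    ts: datetime
    value: float


def initial_clean(points, now):
    """Drop points more than ~50 years before or after `now`."""
    return [p for p in points if now - FIFTY_YEARS <= p.ts <= now + FIFTY_YEARS]


def dedupe_and_sort(points):
    """Collapse exact duplicates, average overlapping values, sort by time."""
    by_ts = {}
    for p in set(points):  # set() removes exact (ts, value) duplicates
        by_ts.setdefault(p.ts, []).append(p.value)
    return [Point(ts, sum(vs) / len(vs)) for ts, vs in sorted(by_ts.items())]


def detect_interval(points, sample=100):
    """Median gap among the first `sample` points (needs at least 2 points)."""
    head = points[:sample]
    return median(b.ts - a.ts for a, b in zip(head, head[1:]))


def fill_gaps(values, fill):
    """Replace None entries with 0.0 or a linear blend of the nearest known values."""
    known = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        if fill == "zero":
            out[i] = 0.0
        else:  # "linear"
            left = max(k for k in known if k < i)
            right = min(k for k in known if k > i)
            frac = (i - left) / (right - left)
            out[i] = values[left] + frac * (values[right] - values[left])
    return out


def normalize(points, now, fill="linear"):
    """Clean, de-duplicate, sort, detect interval, then bucket and fill gaps."""
    pts = dedupe_and_sort(initial_clean(points, now))
    step = detect_interval(pts)
    start = pts[0].ts
    n = (pts[-1].ts - start) // step + 1
    buckets = {}
    for p in pts:  # align: average raw points that fall in [t, t + step)
        buckets.setdefault((p.ts - start) // step, []).append(p.value)
    values = [sum(buckets[i]) / len(buckets[i]) if i in buckets else None
              for i in range(n)]
    return [(start + i * step, v) for i, v in enumerate(fill_gaps(values, fill))]
```

For example, an hourly series with a duplicate, an overlapping value at the same hour, a missing hour, and one point from 60 years ago comes out as five evenly spaced values, with the missing hour filled linearly (or with zero when `fill="zero"`).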

Cassandra optimization in the C3 Agentic AI Platform

The C3 Agentic AI Platform provides the following optimization capabilities for Cassandra.

C3 AI configurations

The C3 Agentic AI Platform offers the following configurations to optimize Cassandra data processing:

Configuration	Description	Default value
TimedDataFields#bucketInterval	Time interval for bucketing normalized time series data to optimize reads	N/A
NormalizationConfig#NormalizationPartitionStrategy	Bucket strategy that defines when to move data from hot storage to cold storage	Six months in hot storage, then move to cold storage
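To illustrate what bucketing by `bucketInterval` buys on reads, the following Python sketch groups normalized points into fixed-width buckets and shows that a range read only has to touch the buckets that overlap the range. The bucket scheme (fixed widths counted from the Unix epoch) and every function name here are hypothetical assumptions for illustration; they are not the platform's storage layout or API.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical bucket origin; the platform's actual bucket alignment may differ.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_key(ts, bucket_interval):
    """Index of the fixed-width bucket that contains ts."""
    return (ts - EPOCH) // bucket_interval


def write_buckets(points, bucket_interval):
    """Group (timestamp, value) pairs by bucket key, as a write path might."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[bucket_key(ts, bucket_interval)].append((ts, value))
    return buckets


def buckets_for_range(start, end, bucket_interval):
    """Bucket keys that overlap the half-open range [start, end); assumes end > start."""
    first = bucket_key(start, bucket_interval)
    last = bucket_key(end - timedelta(microseconds=1), bucket_interval)
    return list(range(first, last + 1))


def read_range(buckets, start, end, bucket_interval):
    """Read only the buckets that overlap [start, end), then filter exact bounds."""
    out = []
    for key in buckets_for_range(start, end, bucket_interval):
        out.extend(p for p in buckets.get(key, []) if start <= p[0] < end)
    return sorted(out)
```

With a one-day interval, three days of hourly points land in three buckets of 24 points each, and a 12-hour read inside the second day touches only one of them. The design trade-off is the usual one: wider buckets mean fewer lookups per long read but larger buckets to scan for short reads.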

Strategies for high-volume, time-series data

If you work with high-volume, time-series data, you can use the following capabilities to optimize Cassandra performance.

Composite key

Suits high-frequency data concentrated in a single partition, such as more than 25 million data points for a single partition key. A composite key provides the following value:

Relieves strain on Cassandra processing
Reduces KV row sizes, which leads to the following:
- Smaller C3 AI compactions
- Less deletion of temporary columns
- Fewer hot spots
- Higher throughput
- Better memory efficiency
Reduces the number of tombstones created in Cassandra
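A common way to build a composite key in Cassandra time-series modeling is to add a time bucket to the partition key, so that one very active series is spread across many smaller partitions instead of one large one. The Python sketch below assumes that pattern; the candidate bucket widths, the function names, and the use of the 25 million threshold above as a per-partition row limit are illustrative assumptions, not how the platform builds composite keys.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical candidate bucket widths, widest first.
CANDIDATE_WIDTHS = [
    timedelta(days=30),
    timedelta(days=7),
    timedelta(days=1),
    timedelta(hours=1),
]


def choose_bucket_width(points_per_second, max_rows_per_partition):
    """Pick the widest bucket whose expected rows per partition fit the limit.

    Falls back to the narrowest candidate, which may still exceed the limit
    for extremely high rates.
    """
    for width in CANDIDATE_WIDTHS:
        if points_per_second * width.total_seconds() <= max_rows_per_partition:
            return width
    return CANDIDATE_WIDTHS[-1]


def partition_key(series_id, ts, width):
    """Composite partition key: the series id plus the index of its time bucket."""
    return (series_id, (ts - EPOCH) // width)
```

For a series writing 100 points per second, a 30-day or 7-day bucket would exceed 25 million rows per partition, so the sketch settles on one-day buckets (8.64 million rows each); points from the same series on different days then land in different partitions.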

To learn more about how to use a composite key, see "Composite keys and external Types" in the Virtualization topic.

Hot/cold storage

Hot/cold storage helps balance performance and cost with Cassandra:

Separates frequently accessed (hot) data from infrequently accessed (cold) data
Reduces demand on hot storage by moving cold data to cold storage
Leads to improved query performance, reduced memory pressure, and better cluster health
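As a rough illustration of age-based routing, the following Python sketch splits points into hot and cold sets using a calendar-month cutoff, mirroring the six-month default listed in the configuration table. The function names and the routing rule (points at or after the cutoff stay hot) are hypothetical assumptions; they do not represent how `NormalizationPartitionStrategy` is implemented.

```python
import calendar
from datetime import datetime, timezone


def months_before(now, months):
    """Same wall-clock time `months` calendar months earlier, clamping the day."""
    y, m0 = divmod(now.year * 12 + (now.month - 1) - months, 12)
    day = min(now.day, calendar.monthrange(y, m0 + 1)[1])
    return now.replace(year=y, month=m0 + 1, day=day)


def split_hot_cold(points, now, hot_months=6):
    """Route (timestamp, value) pairs to hot or cold storage by age.

    Points at or after the cutoff stay hot; older points move to cold.
    """
    cutoff = months_before(now, hot_months)
    hot = [p for p in points if p[0] >= cutoff]
    cold = [p for p in points if p[0] < cutoff]
    return hot, cold
```

Clamping the day keeps the cutoff valid for month ends: six months before August 31, 2024 is February 29, 2024.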

To learn more about hot/cold storage with time series data, see "Hot/cold normalized storage" in the Time Series topic.
