Cassandra Database

The C3 Agentic AI Platform offers various database solutions:

  • Cassandra: A BASE-compliant database optimized for high availability and performance. Supports large data loads with low latency reads and writes, eventual consistency, and flexible schema design.
  • PostgreSQL: An ACID-compliant relational database that suits structured data and SQL operations.
  • H2: An ACID-compliant database primarily for testing and single-node environments. Extends PostgreSQL functionality with simplified setup and reduced resource requirements.

Additionally, the platform supports other KV store alternatives and file system storage. To learn more about alternative data storage solutions and cost and performance considerations, see Specify a KV Store.

The following sections provide additional information about Cassandra database features.

Database characteristics

Cassandra is a default KV store solution for the C3 Agentic AI Platform. The following characteristics summarize Cassandra as a supported database:

  • Heavy volume: Supports large data loads (greater than 100 million but less than 100 billion data points)
  • Time series: Supports time series data
  • High availability: Maintains continuous operation through automatic failover mechanisms
  • Low latency reads and writes: Provides fast data access with minimal response times across operations
  • Eventual data consistency: Prioritizes availability over immediate consistency, with replicas synchronizing over time
  • Flexible schema: Supports dynamic schema changes without downtime

For more information about Cassandra and its limitations, see Specify a KV Store.

Normalization process

The following diagram describes the data normalization process for Cassandra in the C3 Agentic AI Platform:

flowchart LR
    A[Read raw time series] --> B[Initial clean]
    B --> C[De-duplicate]
    C --> D[Sort by field/timestamp]
    D --> E[Detect interval]
    E --> F[Interpolate]
    F --> G[Align interval]
    G --> H[Store normalized time series]
    B --> B1["Rejects points > 50<br/>years wide, 50 years<br/>ago, or 50 years in<br/>the future"]
    C --> C1["Handles points that are exact<br/>duplicates, controlled by<br/>'duplicateHandling' field on<br/>TimeseriesHeader"]
    D --> D1["Handles overlapping<br/>measurements: defaults to<br/>'average'. Also handles<br/>'estimated'."]
    E --> E1["Detect from first 100 raw<br/>data points. Can override<br/>with 'interval' field on<br/>TimeseriesHeader."]
    F --> F1["Treats missing<br/>data points:<br/>zero, linear, or<br/>custom"]
    G --> G1["Controlled by 'treatment' on<br/>TimeseriesHeader or<br/>TimeseriesDatapoint:<br/>c3ShowType(Treatment)"]
    style A fill:#2E86AB
    style B fill:#2E86AB
    style C fill:#2E86AB
    style D fill:#2E86AB
    style E fill:#2E86AB
    style F fill:#2E86AB
    style G fill:#2E86AB
    style H fill:#2E86AB
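The stages in the diagram can be approximated in plain Python to show how they fit together. This is a minimal sketch, not the platform's implementation: the function name `normalize`, its parameters, and the `FIFTY_YEARS` constant are hypothetical. It assumes instantaneous `(timestamp, value)` points, so the "50 years wide" check is omitted. It always averages overlapping measurements, always uses linear interpolation, and folds interval alignment into interpolation by anchoring the grid at the first timestamp.

```python
from collections import Counter
from datetime import datetime, timedelta

FIFTY_YEARS = timedelta(days=365 * 50)  # assumption: approximates the 50-year window


def normalize(points, now, interval=None):
    """Normalize (datetime, value) pairs onto a regular grid (illustrative only)."""
    # Initial clean: reject points more than ~50 years before or after `now`.
    cleaned = [(t, v) for t, v in points if abs(t - now) <= FIFTY_YEARS]

    # De-duplicate: drop exact duplicates.
    unique = set(cleaned)

    # Sort by timestamp; average overlapping measurements at the same timestamp.
    by_time = {}
    for t, v in unique:
        by_time.setdefault(t, []).append(v)
    series = [(t, sum(vs) / len(vs)) for t, vs in sorted(by_time.items())]
    if len(series) < 2:
        return series

    # Detect interval: most common gap among the first 100 points,
    # unless the caller overrides it.
    if interval is None:
        head = [t for t, _ in series[:100]]
        gaps = Counter(b - a for a, b in zip(head, head[1:]))
        interval = gaps.most_common(1)[0][0]

    # Interpolate linearly onto a grid anchored at the first timestamp.
    out, i, t = [], 0, series[0][0]
    while t <= series[-1][0]:
        while series[i + 1][0] < t:
            i += 1
        (t0, v0), (t1, v1) = series[i], series[i + 1]
        frac = (t - t0) / (t1 - t0)
        out.append((t, v0 + frac * (v1 - v0)))
        t += interval
    return out
```

For example, hourly readings with one missing hour come back with that hour filled by linear interpolation, while an exact duplicate and a reading dated more than 50 years ago are dropped.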

Cassandra optimization in the C3 Agentic AI Platform

The C3 Agentic AI Platform provides the following optimization capabilities for Cassandra.

C3 AI configurations

The C3 Agentic AI Platform offers the following configurations to optimize Cassandra data processing:

| Configuration | Description | Default value |
|---|---|---|
| TimedDataFields#bucketInterval | Time interval for bucketing normalized time series data to optimize reads | N/A |
| NormalizationConfig#NormalizationPartitionStrategy | Bucket strategy that defines when to move data from hot storage to cold storage | Six months in hot storage, then move to cold storage |
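To illustrate what bucketing by a time interval means in general, the following sketch groups data points by the start of the bucket that contains them, so a read for a time range only needs the buckets that overlap it. It does not show how `TimedDataFields#bucketInterval` is set or applied by the platform. The helper names `bucket_start` and `group_by_bucket`, and the choice to align buckets to the Unix epoch, are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: buckets are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, bucket_interval):
    """Floor a timezone-aware timestamp to the start of its bucket."""
    return ts - (ts - EPOCH) % bucket_interval


def group_by_bucket(points, bucket_interval):
    """Group (timestamp, value) pairs by bucket start (illustrative only)."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[bucket_start(ts, bucket_interval)].append((ts, value))
    return dict(buckets)
```

With a one-day `bucket_interval`, two readings taken on the same UTC day land in the same bucket, and a reading at midnight of the next day starts a new one.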

Strategies for high-volume, time-series data

If you work with high-volume, time-series data, you can use the following capabilities to optimize Cassandra performance.

Composite key

A composite key suits high-frequency data concentrated in a single partition, such as more than 25 million data points for a single partition key. A composite key provides the following benefits:

  • Relieves strain on Cassandra processing
  • Reduces KV row sizes, which leads to the following:
    • Smaller C3 AI compactions
    • Fewer deletions of temporary columns
    • Fewer hot spots
    • Higher throughput
    • Better memory efficiency
  • Reduces the number of tombstones created in Cassandra

To learn more about how to use a composite key, see "Composite keys and external Types" in the Virtualization topic.
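The general idea behind a composite key can be sketched without any platform API: adding a second component to the partition key splits one very large partition into several smaller ones. The example below is an assumption-laden illustration, not the C3 AI mechanism. The functions `composite_key` and `partition_sizes` and the choice of calendar month as the second component are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def composite_key(series_id, ts):
    """Hypothetical composite key: the series id plus the calendar month."""
    return (series_id, ts.strftime("%Y-%m"))


def partition_sizes(series_id, timestamps):
    """Count how many data points fall under each composite key."""
    return Counter(composite_key(series_id, ts) for ts in timestamps)
```

With the series id alone as the key, every data point for that series shares one partition. With the month added, 60 days of hourly readings starting on 2024-01-01 split into two partitions of 744 and 696 points, which keeps each row smaller.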

Hot/cold storage

Hot/cold storage provides the following benefits with Cassandra:

  • Balances performance and cost by separating frequently accessed (hot) data from infrequently accessed (cold) data
  • Reduces storage demand by moving cold data to cold storage
  • Improves query performance, reduces memory pressure, and supports better cluster health

To learn more about hot/cold storage with time series data, see "Hot/cold normalized storage" in the Time Series topic.
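As a rough illustration of an age-based hot/cold split, the sketch below partitions data points around a cutoff. It is not the platform's partition strategy: the function `split_hot_cold` and the `HOT_RETENTION` constant are hypothetical, and 183 days is only an approximation of the six-month default listed in the configuration table.

```python
from datetime import datetime, timedelta

HOT_RETENTION = timedelta(days=183)  # assumption: approximates six months


def split_hot_cold(points, now, hot_retention=HOT_RETENTION):
    """Split (timestamp, value) pairs into hot and cold lists by age."""
    cutoff = now - hot_retention
    hot = [(ts, v) for ts, v in points if ts >= cutoff]
    cold = [(ts, v) for ts, v in points if ts < cutoff]
    return hot, cold
```

Points newer than the cutoff stay in the hot list, which is the set that frequent reads would touch. Older points fall into the cold list.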
