Understanding Change Data Capture (CDC) in Data Fusion

Change Data Capture (CDC) in C3 AI Studio enables incremental ingestion of changes from SQL source tables into a staging file system. In this release, the Data Fusion UI supports CDC only for SQL-based data sources and only for tables containing a monotonically increasing timestamp column, which is used to detect newly inserted or updated records. CDC uses a strictly time-based trigger mechanism, meaning changes are captured only when the scheduled job runs.

To enable CDC, you must configure:

  • A change detection column (a timestamp that increases with each update)
  • An integration schedule that determines when incremental sync occurs

Only INSERT and UPDATE operations are captured in this release; DELETE operations are not propagated.
CDC writes all output exclusively to a staging file system, and full sync and on-demand sync modes are not supported.
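Conceptually, each scheduled CDC run pulls only the rows whose change detection timestamp is newer than the previous run's high-water mark. The sketch below illustrates the shape of that incremental query; the table name, column name, and string-built SQL are assumptions for illustration (real connectors would use bind parameters), not the actual query C3 AI Studio issues.

```python
from datetime import datetime, timezone

def build_incremental_query(table: str, ts_column: str, last_sync: datetime) -> str:
    """Build the SELECT that a CDC run conceptually issues against the source.

    Only rows whose change detection timestamp is newer than the previous
    sync's high-water mark are pulled; everything else is skipped.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE {ts_column} > '{last_sync.isoformat()}' "
        f"ORDER BY {ts_column}"
    )

query = build_incremental_query(
    "CUSTOMERS",                                # hypothetical source table
    "LAST_UPDATED_TS",                          # change detection column
    datetime(2024, 1, 1, tzinfo=timezone.utc),  # last successful sync time
)
print(query)
```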

Supported Capabilities in 9.0

Source Types

  • SQL connectors only (for example, Snowflake, Postgres, SQL Server, MySQL)

Change Operations

  • Inserts (appends)
  • Updates
  • Deletes: not supported

Targets

  • Staging file system only
  • No ingestion into DataStores or Iceberg tables (datalake sync) in this release

Sync Modes

  • Incremental sync only
  • Full sync and on-demand sync are not supported

UI Support

  • CDC is configured through the Data Fusion interface
  • CDC can be enabled only if the selected table includes a valid timestamp column

Prerequisite: Source Table Must Contain a Timestamp Column

Before you can enable Change Data Capture (CDC) for a source table, the table must include a column with timestamp or datetime values.
This column is used as the change tracking column, which the system relies on to detect and sync new or updated rows from the source into the platform.

CDC compares the timestamp values in this column against the last ingestion time to determine which records have changed.
If no such column exists, the system cannot perform incremental updates, and CDC cannot be enabled for that table.

Example of a Valid Schema for CDC

Column Name       Data Type   Description
ID                INTEGER     Primary key for the row
CUSTOMER_NAME     STRING      Name of the customer
LAST_UPDATED_TS   TIMESTAMP   When the record was created or last modified; used for CDC

Note:
If the selected table does not include a timestamp column, CDC cannot be configured.
In such cases, only full reload integration is supported.
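The eligibility rule above can be sketched as a simple schema check: a table qualifies for CDC only if at least one column has a timestamp or datetime type. The function name and the dict-based schema representation are illustrative assumptions, not a C3 AI API.

```python
# Types that qualify as a change tracking column (illustrative set).
TIMESTAMP_TYPES = {"TIMESTAMP", "DATETIME"}

def find_change_tracking_column(schema: dict) -> str:
    """Return the first timestamp-typed column name, or None if CDC
    cannot be enabled for this table."""
    for name, data_type in schema.items():
        if data_type.upper() in TIMESTAMP_TYPES:
            return name
    return None

schema = {"ID": "INTEGER", "CUSTOMER_NAME": "STRING", "LAST_UPDATED_TS": "TIMESTAMP"}
print(find_change_tracking_column(schema))  # LAST_UPDATED_TS
```

A table with no such column (for example, only INTEGER and STRING columns) would return None, matching the note above that only full reload integration is supported in that case.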

Delete Operations Not Propagated Through CDC

Current CDC behavior in C3 AI Studio supports tracking inserts and updates from the source system.
Delete operations are not propagated to the target entity.

This means that if a record is deleted in the source table, the corresponding record in the target entity remains unchanged. The platform does not perform a hard delete or soft delete based on source-side deletions.

This is expected behavior for the current CDC implementation, which focuses on incremental ingestion rather than full state synchronization.

If your use case requires downstream deletion handling, consider one of the following approaches:

  • Use full-table reload integration instead of CDC for that dataset.
  • Use a downstream cleansing or reconciliation job to manage deletions periodically.
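The second option above can be sketched as a periodic key reconciliation: compare the primary keys still present in the source against those previously ingested into the target, and delete the difference. The plain Python sets here stand in for the actual source and target key lookups, which are assumptions for illustration.

```python
def keys_to_delete(source_keys: set, target_keys: set) -> set:
    """Keys present in the target entity but no longer in the source table.

    A downstream job would delete (or soft-delete) these target records.
    """
    return target_keys - source_keys

source_keys = {1, 2, 4}       # primary keys still in the SQL source
target_keys = {1, 2, 3, 4}    # primary keys previously ingested into the target
print(sorted(keys_to_delete(source_keys, target_keys)))  # [3]
```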

Behavior Note: Transformations Are Applied Only When CDC Is Enabled

In Data Fusion, transformations are executed only when a Source Collection is part of a CDC-enabled pipeline.
Enabling CDC activates the transformation stage, allowing you to:

  • Rename fields
  • Add derived columns
  • Reshape or normalize the ingested data

These transformations are applied before the data is written to the staging file system.
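As a rough sketch of what such a transformation stage does, the function below renames a field and adds a derived column on a single record before it would be written to staging. All field names are hypothetical, and Data Fusion's own transformations are configured declaratively in the UI, not written as Python.

```python
def transform(record: dict) -> dict:
    """Rename a field and add a derived column before the staging write."""
    out = dict(record)
    # Rename: CUSTOMER_NAME -> customerName (illustrative field names)
    out["customerName"] = out.pop("CUSTOMER_NAME")
    # Derived column: normalized upper-case key for downstream matching
    out["customerKey"] = out["customerName"].strip().upper()
    return out

row = {"ID": 7, "CUSTOMER_NAME": "  Acme Corp "}
print(transform(row))
```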

If CDC is not enabled for a Source Collection:

  • The platform performs a direct extract from the source.
  • No transformations or schema modifications are applied.
  • The output reflects the raw structure and raw column names from the source table.
  • Any schema adjustments must be made in the source system, not in Data Fusion.
  • You cannot select a target entity; instead, Data Fusion automatically persists the ingested data into a default, system-generated external entity that mirrors the raw Source Collection schema.

In short:
CDC unlocks Data Fusion’s transformation capabilities.
Without CDC, the pipeline writes raw, unmodified data to a default target entity in the staging layer, with no option to customize the schema or target type.

How Incremental Ingestion Works in CDC

Once CDC is enabled on a SQL table and mapped to a target Entity (stored in the C3 database), the platform begins capturing incremental changes based on the configured timestamp column.

Example

  • Your source SQL table contains 10,000 rows.
  • You configure CDC using a monotonically increasing timestamp.
  • At the next scheduled sync, 15 rows have new or updated timestamps.

Result with CDC

Only those 15 changed rows are ingested and written into the target Entity in the C3 database.

This greatly reduces unnecessary processing and avoids re-reading the entire dataset during each run.
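The example above can be sketched as a filter plus a high-water mark update: each run keeps only rows whose timestamp is newer than the last sync time, then advances the mark so the next run starts where this one ended. The row structure and field name are illustrative assumptions.

```python
from datetime import datetime, timezone

def incremental_batch(rows, last_sync):
    """Return (changed_rows, new_high_water_mark) for one CDC run."""
    changed = [r for r in rows if r["LAST_UPDATED_TS"] > last_sync]
    # Advance the mark to the newest timestamp seen; keep it unchanged
    # if nothing changed this run.
    new_mark = max((r["LAST_UPDATED_TS"] for r in changed), default=last_sync)
    return changed, new_mark

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)  # last successful sync
rows = [
    {"ID": 1, "LAST_UPDATED_TS": datetime(2023, 12, 30, tzinfo=timezone.utc)},
    {"ID": 2, "LAST_UPDATED_TS": datetime(2024, 1, 2, tzinfo=timezone.utc)},
]
changed, mark = incremental_batch(rows, t0)
print(len(changed))  # 1 (only the row updated after the last sync)
```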
