Configure Change Data Capture (CDC) for a SQL Source Collection
The Change Data Capture (CDC) feature in Data Fusion enables incremental data integration by automatically detecting and propagating changes from the source system to the Lakehouse.
Follow these steps to configure CDC on a SQL Source Collection in the Data Integration canvas.
Open Source Collection Properties
- In the Data Integration canvas, locate your SQL Source Collection node.
- Click the more options (⋯) menu on the node.
- Select Edit properties.
Enable Data Ingestion
- In the Edit Properties panel, select Load Data. This enables ingestion through a Data Integration pipeline.
- Note: The Virtual Table option provides read-only access and does not support CDC.
Configure Change Data Capture (CDC)
Navigate to the Change Data Capture Configuration section.
Under Order By Fields, specify one or more monotonically increasing fields:
- Select a column (typically a timestamp such as updated_at or last_modified)
- Choose the sort order (usually Ascending)
- Optionally, click + Add Field to add a further ordering column (such as an ID) so that rows with equal values in the first column are still ordered deterministically
The selected fields must increase over time, so that every new or updated row sorts after the rows already ingested. If no suitable column is configured, CDC may miss updates, produce duplicates, or ingest rows in an inconsistent order.
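To illustrate why a second ordering field can matter, here is a minimal Python sketch. It assumes the checkpoint is compared with a strict greater-than test, as described in the Result section. The function `rows_after`, the sample rows, and the composite comparison are hypothetical and only model the idea; they are not Data Fusion's actual implementation.

```python
def rows_after(rows, checkpoint, order_by):
    """Return rows whose ordering key is strictly greater than the checkpoint.

    rows:       list of dicts (hypothetical source rows)
    checkpoint: tuple of the last ingested values, one per ordering field
    order_by:   list of field names, in priority order
    """
    def key(row):
        return tuple(row[field] for field in order_by)

    return sorted((r for r in rows if key(r) > checkpoint), key=key)


# Row 1 was ingested by an earlier run. Row 2 was written afterwards,
# but within the same second, so it carries the same timestamp.
source = [
    {"id": 1, "updated_at": "2024-01-01T10:00:00"},
    {"id": 2, "updated_at": "2024-01-01T10:00:00"},
    {"id": 3, "updated_at": "2024-01-01T10:00:05"},
]

# Timestamp only: row 2 ties with the checkpoint and is skipped.
ts_only = rows_after(source, ("2024-01-01T10:00:00",), ["updated_at"])

# Timestamp plus id: the tie is broken by the id, so row 2 is picked up.
ts_and_id = rows_after(source, ("2024-01-01T10:00:00", 1), ["updated_at", "id"])

print([r["id"] for r in ts_only])    # [3]
print([r["id"] for r in ts_and_id])  # [2, 3]
```

The tiebreaker only helps if it also increases as rows are written, which is typically true for auto-incrementing IDs.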
Configure Integration Schedule (Optional)
In the Integration schedule section, choose how the pipeline should run:
- Manual: Run the pipeline only when triggered by the user
- On Schedule: Configure a recurring schedule to automatically process new data
Save Configuration
Click Save to apply the CDC configuration to the Source Collection.
Run and Monitor the Pipeline
After configuring CDC:
- Trigger the pipeline using the Run (▶) button on the Source Collection node
- Use the options menu (⋯) to monitor execution:
  - View run status
  - View data integration status
  - View run history
Result
Once configured, each pipeline run retrieves only the rows whose ordering column values are greater than the checkpoint from the previous run, enabling incremental ingestion.
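The run-to-run behavior can be sketched with Python's built-in `sqlite3` module. This is a simplified model under stated assumptions, not the product's internals: the `orders` table, its columns, and the `fetch_increment` helper are hypothetical, and the checkpoint is assumed to be the largest `updated_at` value seen so far.

```python
import sqlite3


def fetch_increment(conn, checkpoint):
    """Fetch rows changed since the checkpoint; return (rows, new_checkpoint)."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM orders "
        "WHERE updated_at > ? "
        "ORDER BY updated_at ASC, id ASC",
        (checkpoint,),
    ).fetchall()
    # Advance the checkpoint only if something was retrieved.
    new_checkpoint = rows[-1][2] if rows else checkpoint
    return rows, new_checkpoint


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "a", "2024-01-01T09:00:00"),
        (2, "b", "2024-01-01T09:30:00"),
    ],
)

# First run: an empty checkpoint sorts before every timestamp, so all rows load.
first_run, checkpoint = fetch_increment(conn, "")

# Between runs, one row is updated and one is inserted in the source.
conn.execute(
    "UPDATE orders SET name = 'a2', updated_at = '2024-01-01T10:00:00' WHERE id = 1"
)
conn.execute("INSERT INTO orders VALUES (3, 'c', '2024-01-01T10:15:00')")

# Second run: only the changed and new rows are retrieved.
second_run, checkpoint = fetch_increment(conn, checkpoint)

print([row[0] for row in first_run])   # [1, 2]
print([row[0] for row in second_run])  # [1, 3]
```

Row 2 is not retrieved on the second run because its `updated_at` value did not move past the checkpoint, which is also why an update that leaves the ordering column unchanged would go undetected.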