Configure Kafka as a Streaming Data Source
Streaming platforms such as Kafka provide real-time event data as it is generated. Kafka is commonly used for messaging, telemetry ingestion, activity tracking, and stream processing.
In Data Fusion, Kafka enables continuous ingestion of event-driven data for use cases such as:
- Real-time ingestion — Process telemetry or event streams from applications and devices
- AI-driven workflows — Support near real-time inference and decision-making
- Operational monitoring — Power dashboards and alerting systems with up-to-date data
Before you begin
Streaming data is typically append-only and time-series in nature. Consider how this data will be stored and accessed:
- For high-throughput or time-series workloads, map data to entities backed by a key-value store
- Ensure the target canonical maps to an entity configured with the Key-Value tag in the Object Model
Generate Kafka API credentials
Before configuring the connector, obtain access credentials for your Kafka cluster.
- Request an API key and API secret from your Kafka administrator
- For managed services such as Confluent Cloud, follow provider-specific steps to generate credentials
- Ensure the credentials have READ permission on the Kafka topic you plan to ingest, and DESCRIBE permission if you want Data Fusion to list topics and partitions
Add a Kafka connector
- Open your application in C3 AI Studio
- Navigate to Data Fusion
- In the Data Sources panel, select Add data source
- Select Kafka from the streaming connectors list
- Select Next
Configure connector details
Provide the following:
- Name — Identifier for the connector
- Description — Optional description
Configure authentication
Provide connection details for your Kafka cluster:
- Endpoint — Kafka bootstrap server (for example, host:port)
- API Key — Kafka API key
- API Secret — Kafka API secret
Select Next to proceed.
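As a rough illustration of how these three fields relate to a standard Kafka client, the sketch below maps them to librdkafka-style client properties. The function name `build_client_config` is hypothetical, and the `SASL_SSL` and `PLAIN` settings are assumptions that match a typical API key and secret setup on a managed service; your cluster may use a different security protocol, and this is not a description of how Data Fusion stores the values internally.

```python
def build_client_config(endpoint: str, api_key: str, api_secret: str) -> dict:
    """Map the connector fields to librdkafka-style client properties."""
    # The Endpoint field is expected in host:port form.
    host, sep, port = endpoint.strip().rpartition(":")
    port_ok = port.isascii() and port.isdigit() and 0 < int(port) < 65536
    if not sep or not host or not port_ok:
        raise ValueError(f"endpoint must look like host:port, got {endpoint!r}")
    if not api_key or not api_secret:
        raise ValueError("API key and API secret must both be set")
    return {
        "bootstrap.servers": f"{host}:{int(port)}",
        "security.protocol": "SASL_SSL",  # assumption: TLS with SASL authentication
        "sasl.mechanisms": "PLAIN",       # assumption: key and secret sent via SASL/PLAIN
        "sasl.username": api_key,
        "sasl.password": api_secret,
    }
```

Running a check like this before entering the values can catch a malformed endpoint or an empty secret early, before the connector validation step.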
Select a Kafka data stream
After configuring the connector, define the source collection by selecting the Kafka topic.
Configure source collection
Provide the following:
- Name — Identifier for the source collection
- Description — Optional description
Select data stream
- Broker name — Enter the Kafka topic name
The Broker name field identifies the Kafka topic from which data is consumed.
If Data Fusion cannot list available streams automatically, enter the topic name manually.
Preview data (optional)
If access permissions allow, Data Fusion displays a preview of messages from the selected topic after you select a partition.
If no preview appears:
- Verify that the topic name is entered exactly as it appears in the Kafka cluster (topic names are case-sensitive)
- Confirm that the API key has read access to the topic
Select Save and Test, then proceed to schema configuration.
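Because a mistyped topic name is a common reason for an empty preview, a small check like the following can help before you enter the name. It is a minimal sketch: `topic_name_problems` and the optional `known_topics` list (for example, names supplied by your Kafka administrator) are hypothetical, and the rules applied are the usual Kafka naming constraints (letters, digits, `.`, `_`, `-`, at most 249 characters).

```python
import re

_LEGAL_TOPIC = re.compile(r"[A-Za-z0-9._-]+")
_MAX_TOPIC_LENGTH = 249


def topic_name_problems(name: str, known_topics=()) -> list:
    """Return a list of human-readable problems found in a topic name."""
    problems = []
    if name != name.strip():
        problems.append("leading or trailing whitespace")
    candidate = name.strip()
    if not candidate:
        problems.append("empty name")
    elif candidate in (".", ".."):
        problems.append("name cannot be '.' or '..'")
    elif not _LEGAL_TOPIC.fullmatch(candidate):
        problems.append("illegal characters")
    if len(candidate) > _MAX_TOPIC_LENGTH:
        problems.append("name longer than 249 characters")
    # Topic names are case-sensitive, so flag a case-only mismatch.
    if candidate not in known_topics:
        for topic in known_topics:
            if topic.lower() == candidate.lower():
                problems.append(f"case mismatch: did you mean {topic!r}?")
                break
    return problems
```

An empty list means the name passes these basic checks; it does not confirm that the topic exists or that the credentials can read it.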
Continue the data integration workflow
After saving the Kafka source, continue configuring the data pipeline:
- Define the schema — Review and adjust inferred fields from the streaming data
- Map to a canonical — Connect the source schema to a target data model
- Configure transformations — Apply projections, transformations, or filters as needed
These steps complete the data integration workflow and enable ingestion into your application.
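To make the schema step more concrete, the sketch below shows one simple way field types could be inferred from a few sample JSON messages. It is an illustration only, not Data Fusion's inference logic; the function name `infer_fields` and the sample payloads are hypothetical, and it assumes each Kafka message value is a JSON object.

```python
import json

# JSON type names keyed by the Python types that json.loads can produce.
_JSON_TYPE_NAMES = {
    str: "string",
    bool: "boolean",
    int: "integer",
    float: "number",
    dict: "object",
    list: "array",
    type(None): "null",
}


def infer_fields(raw_messages):
    """Return {field name: sorted JSON type names seen} across sample messages."""
    seen = {}
    for raw in raw_messages:
        record = json.loads(raw)
        if not isinstance(record, dict):
            continue  # skip messages that are not JSON objects
        for field, value in record.items():
            seen.setdefault(field, set()).add(_JSON_TYPE_NAMES[type(value)])
    return {field: sorted(types) for field, types in seen.items()}
```

A field that reports more than one type (for example, both `integer` and `number`) is a good candidate for manual review when you adjust the inferred schema.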
Troubleshoot Kafka connections
Unable to list available streams
If the system displays “Unable to list available streams”:
- Enter the Kafka topic manually in the Broker name field
- Verify that the Kafka credentials are valid and correctly entered
- Confirm that the Kafka credentials have DESCRIBE permission on the topic
No partitions available
If the Partition (preview only) dropdown is empty:
- Confirm that the Kafka credentials have DESCRIBE permission
- Verify that the topic exists in the Kafka cluster
No preview data displayed
Data is retrieved only after selecting a partition.
If no data appears after selecting a partition:
- Confirm that the Kafka credentials have READ permission
- Verify that the selected partition contains data
- Ensure the topic name and partition are entered correctly
If the request succeeds but returns no data, the UI displays No data available.
If the request fails, an error is displayed, which may indicate:
- Invalid credentials
- Incorrect topic or partition
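The permission checks in the troubleshooting steps above can be summarized in a small lookup. This is a sketch under stated assumptions: `blocked_features` and the feature labels are hypothetical names, the DESCRIBE and READ requirements come from this page, and the rule that READ, WRITE, DELETE, or ALTER implies DESCRIBE reflects standard Kafka ACL behavior, which your cluster's authorizer may or may not follow.

```python
# Assumption: standard Kafka ACL semantics, where these operations imply DESCRIBE.
_IMPLIES_DESCRIBE = {"READ", "WRITE", "DELETE", "ALTER"}

# Topic-level permission each feature needs, as described on this page.
_REQUIRED = {
    "list streams": "DESCRIBE",
    "list partitions": "DESCRIBE",
    "preview data": "READ",
}


def blocked_features(granted):
    """Return the features that the granted topic operations do not cover."""
    ops = {op.upper() for op in granted}
    if ops & _IMPLIES_DESCRIBE:
        ops.add("DESCRIBE")
    return sorted(f for f, need in _REQUIRED.items() if need not in ops)
```

For example, credentials with only DESCRIBE can populate the stream and partition lists but cannot show preview data.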
Connection fails during setup
If the connector fails to validate:
- Verify the endpoint (bootstrap server) is correct
- Confirm that the API key and API secret are valid and active
- Ensure network access to the Kafka cluster is available
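To rule out a network problem, you can test whether the bootstrap server accepts TCP connections from a machine on the same network path. The following is a minimal sketch using only the Python standard library; `can_reach` is a hypothetical helper, and a successful connection shows only that the port is reachable, not that the credentials or TLS settings are correct.

```python
import socket


def can_reach(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    host, _, port = endpoint.strip().rpartition(":")
    if not host:
        return False  # endpoint was not in host:port form
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError covers refused, unreachable, DNS, and timeout failures;
        # ValueError covers a non-numeric port.
        return False
```

If this returns False from the environment where the connector runs, check firewall rules, private networking, or allowlists before re-entering credentials.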
Key considerations
- Kafka topics must exist in the external Kafka system; Data Fusion does not create topics
- Topic discovery may be unavailable if the credentials lack DESCRIBE permission; in that case, enter the topic name manually
- Data preview requires selecting a partition and depends on both access permissions and data availability
See also
- Understand the Source System
- Configure the Source System and Source Collection
- Configure the Source Schema
- Understand Change Data Capture (CDC) in Data Fusion
- Configure Change Data Capture (CDC) for SQL Source Collection
- Add and Configure a Transform for a DI Pipeline
- Map Source Fields to Target Fields
- Configure Runtime Parameters and Trigger a DI Pipeline
- Confirm Data Fusion Pipeline Run Completion
- Connect C3 AI Application to Apache Kafka
- Manipulate ERD Views with Data Model