
Data Fusion Overview

Data Fusion provides a canvas-based interface in C3 AI Studio to design, configure, and manage data pipelines. It enables you to connect external data sources, define how data is transformed, and integrate it into your application—all from a unified workspace.

Using Data Fusion, you can:

  • Connect to external systems using built-in connectors
  • Configure ingestion pipelines visually
  • Map and transform data into application models
  • Monitor and manage pipeline execution

This guide introduces Data Fusion concepts and provides example workflows for connecting and integrating data sources. While not all connectors are covered, representative examples are included across key categories.

Supported Data Source Categories

Data Fusion supports a wide range of data source categories, including:

  • Databases & Data Warehouses (for example, Snowflake)
  • File Systems and Cloud Storage (for example, Amazon S3)
  • Streaming and Message Brokers (for example, Amazon Kinesis)

Additional supported categories include:

  • Big Data & NoSQL
  • ERP & CRM
  • Accounting
  • Marketing & Analytics
  • Collaboration
  • File & API Integration
  • E-Commerce
  • Relational Databases

How Data Fusion works

Data Fusion enables you to configure data pipelines using a visual graph on the Data Integration canvas, while leveraging underlying C3 AI Types.

Data Fusion lets you update data pipelines built on the following C3 AI Types without writing any code:

  • SourceSystem: Any external system, streaming service, data warehouse, or database that you use to import data into the C3 Agentic AI Platform
  • SourceCollection: A logical grouping of data objects from a specific data source
  • Source: A model for data objects that are imported to your application on the platform
  • Canonical: An inbound interface that specifies the schema of data loaded into the platform
  • Transform: A function or method that transforms data to match a Source, Canonical, or Entity schema

These components form a pipeline where data flows from external systems into your application model.

graph TD
  subgraph Implementation-specific
    sys[SourceSystem] -->|SourceCollection belongs to a SourceSystem| coll[SourceCollection]
    coll -->|Source models the schema of contents in a SourceCollection| Source(Source)
  end
  subgraph pa [Provided by an Application]
    Canonical -->|Transforms from Canonical| Entity
  end
  Source -->|Transforms from Source| Canonical
  %% Style for outer boxes
  style Implementation-specific fill:#e0e0e0,stroke-dasharray: 5 5
  style pa fill:#e0e0e0,stroke-dasharray: 5 5
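
To make the flow from Source to Canonical to Entity concrete, the following is a minimal sketch in plain Python. It is not the C3 AI Type system or any C3 API: the classes `SourceSystem`, `SourceCollection`, `Schema`, and `Transform`, the `run_pipeline` helper, and the meter-reading field names are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class SourceSystem:
    """Hypothetical stand-in for an external system that data is imported from."""
    name: str


@dataclass
class SourceCollection:
    """Hypothetical logical grouping of data objects; belongs to one SourceSystem."""
    name: str
    system: SourceSystem


@dataclass
class Schema:
    """Stands in for a Source, Canonical, or Entity schema: a name plus field names."""
    name: str
    fields: List[str]


@dataclass
class Transform:
    """Maps a record from one schema to another and checks the target fields."""
    source: Schema
    target: Schema
    fn: Callable[[Record], Record]

    def apply(self, record: Record) -> Record:
        out = self.fn(record)
        missing = [f for f in self.target.fields if f not in out]
        if missing:
            raise ValueError(f"{self.target.name} is missing fields: {missing}")
        return out


def run_pipeline(record: Record, transforms: List[Transform]) -> Record:
    """Apply each transform in order, as in Source -> Canonical -> Entity."""
    for transform in transforms:
        record = transform.apply(record)
    return record


# Illustrative data only: a made-up warehouse with meter readings.
system = SourceSystem("warehouse")
collection = SourceCollection("meter_readings", system)

source = Schema("SourceMeterReading", ["METER_ID", "KWH"])
canonical = Schema("CanonicalMeterReading", ["meterId", "kwh"])
entity = Schema("MeterReading", ["id", "energyKwh"])

to_canonical = Transform(
    source, canonical,
    lambda r: {"meterId": r["METER_ID"], "kwh": float(r["KWH"])},
)
to_entity = Transform(
    canonical, entity,
    lambda r: {"id": r["meterId"], "energyKwh": r["kwh"]},
)

result = run_pipeline({"METER_ID": "m-1", "KWH": "12.5"}, [to_canonical, to_entity])
print(result)
```

The sketch keeps the two transform steps separate, mirroring the diagram: the first step adapts implementation-specific source data to the inbound interface, and the second maps that interface onto the application's entity.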

You can build and update data pipelines using high-code tools and Data Fusion interchangeably. For more information about the data pipeline, see Data Pipeline Architecture.

Accessing Data Fusion in C3 AI Studio

To access Data Fusion in C3 AI Studio:

  1. Open your application in C3 AI Studio
  2. Navigate to Data → Data Fusion from the left navigation panel

The Data Fusion workspace opens with a canvas-based interface where you define and manage pipelines.

Data Fusion Workspace (Updated UI)

The Data Fusion workspace is organized around a central canvas, with supporting controls.

Canvas Area

  • Displays pipeline nodes and their relationships
  • Used to build and modify data flows

Top Navigation Tabs

  • Data Integration – Primary workspace for pipeline design
  • Object Model – View and manage entities and relationships
  • Data Validation – Define and monitor validation rules

Primary Actions (Top Right)

  • Add Data Source – Create new source connections
  • Settings (gear icon) – Access pipeline-level actions, such as publishing configurations
  • Configure GitHub – Connect to version control and publish changes

Search and Filter Controls

  • Quickly locate nodes within the canvas

This layout reflects a shift from tab-specific workflows to a canvas-first pipeline experience.

Data Sources (via Canvas)

Data sources are configured directly from the Data Fusion workspace using Add Data Source.

You can:

  • Create new connections to external systems
  • Configure authentication and connection parameters
  • Validate connections before use
  • Preview source data

Once added, data sources appear as nodes on the canvas and can be used in pipeline configurations.

Data Integration (Canvas-Based Pipelines)

The Data Integration canvas is the primary workspace for building pipelines.

Here, you can:

  • Define ingestion flows from source systems
  • Map data into application schemas
  • Apply transformations (filtering, aggregation, reshaping)
  • Connect pipeline components visually

Together, these steps ensure that data is correctly structured and ready for downstream use.
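
As a rough illustration of the three transformation kinds named above (filtering, aggregation, reshaping), here is a small plain-Python sketch. It does not use Data Fusion or any C3 API; the function names and the `meterId` and `kwh` fields are assumptions made up for the example.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, object]


def filter_valid(records: List[Record]) -> List[Record]:
    """Filtering: drop records that have no kwh value."""
    return [r for r in records if r.get("kwh") is not None]


def aggregate_by_meter(records: List[Record]) -> Dict[str, float]:
    """Aggregation: sum kwh per meter."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        totals[str(r["meterId"])] += float(r["kwh"])
    return dict(totals)


def reshape_to_entities(totals: Dict[str, float]) -> List[Record]:
    """Reshaping: turn the per-meter totals into one record per meter."""
    return [
        {"id": meter_id, "totalKwh": total}
        for meter_id, total in sorted(totals.items())
    ]


# Illustrative input only.
readings = [
    {"meterId": "m-1", "kwh": 1.5},
    {"meterId": "m-2", "kwh": None},
    {"meterId": "m-1", "kwh": 2.0},
    {"meterId": "m-2", "kwh": 4.0},
]

entities = reshape_to_entities(aggregate_by_meter(filter_valid(readings)))
print(entities)
```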

Object Model

The Object Model tab in C3 AI Studio provides a graph-based, interactive view of your application’s data structure, displayed as an Entity Relationship Diagram (ERD). It serves as a central workspace for visualizing, creating, and managing entities (data objects) and their relationships within your C3 AI application. This visual representation helps you understand how data is structured, related, and used across the platform, forming the foundation for data integration, validation, and downstream analytics.

The following list describes actions you can take in the Object Model tab.

  • Visualize Entity Relationships:

    • Explore how different entities are connected, understand dependencies, and navigate across packages using the color-coded legend.
    • You can also switch the node coloring to visualize entities by different categorizations, helping you interpret the model from different functional perspectives. Entity categories include:
      • Relational vs Key-Value
      • Integration Status vs App Population
      • Data Presence vs Absence
  • Create and Manage Entities:

    • Create Entity from File Upload: Upload structured data (CSV) to automatically generate entities and their schema.
    • Define in Code (Advanced): Programmatically create entities for greater flexibility.
    • Edit Schemas: Refine entity schemas to ensure the data structure meets your application requirements.
  • Identify Entity Attributes: Use intuitive icons to identify key characteristics such as Primary Key, Foreign Key, Indexed Field, and Stored Calculation for each entity.

  • Customize and Manage Diagram Views:

    • Save and Manage Views: Save your current diagram layout as a View for easy reuse. Rename, delete, or manage saved views directly from the interface.
    • Publish Views to Git: Export and version-control your saved views as metadata to share across environments.
  • Adjust Diagram Settings:

    • Toggle between Integrated View (includes entities from dependency packages) and Application View (shows only entities defined in your app). Use zoom, layout, and alignment controls to adjust the visualization.
  • Export the Diagram:

    • Download a snapshot of the current model layout for documentation or offline reference.
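
The Create Entity from File Upload action generates a schema from a CSV file. How Data Fusion infers that schema is not described here, so the following is only a sketch of one plausible approach, written in plain Python: try progressively looser types for each column. The function names and the type labels `int`, `double`, and `string` are assumptions for illustration, not the platform's behavior.

```python
import csv
import io
from typing import Dict, List


def infer_type(values: List[str]) -> str:
    """Return the narrowest label that every non-empty value parses as."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for label, cast in (("int", int), ("double", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text: str) -> Dict[str, str]:
    """Map each CSV header to an inferred type label (hypothetical labels)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    # Treat cells missing from short rows as empty strings.
    return {
        col: infer_type([(row.get(col) or "") for row in rows])
        for col in columns
    }


# Illustrative CSV content only.
sample = "meterId,kwh,reads\nm-1,12.5,3\nm-2,7,4\n"
schema = infer_schema(sample)
print(schema)
```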

The Object Model tab gives you an end-to-end understanding of your data architecture—empowering you to design, maintain, and share consistent, well-structured data models across your application ecosystem.

Data Validation

The Data Validation tab allows you to create, manage, and monitor validation rules that check the quality of ingested data. These rules help ensure that the data coming from external sources is consistent, accurate, and ready for downstream use in the C3 AI Object Model and applications.

Key elements in this tab include:

  • Deployed Rules: Displays all active validation rules that have been deployed. Each rule evaluates ingested data against specific quality checks (for example, ensuring no missing values in critical columns, verifying ranges for numeric fields, or enforcing referential integrity).
  • Drafts: Allows you to create and test validation rules before deploying them. Drafts help you experiment with rules and refine them to fit your data requirements.
  • Type Health: Provides insights into the overall health of the data types in your application model, based on the results of deployed validation rules.
  • Filters: Use filters such as Status and Last Run (start and end dates) to locate specific rule results. This makes it easy to track validation checks over time.
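
The kinds of quality checks mentioned under Deployed Rules (missing values, numeric ranges, referential integrity) can be sketched as simple predicates. This is plain Python for illustration only and does not reflect how validation rules are authored in Data Fusion; the helper names `not_missing`, `in_range`, `references`, and `evaluate`, and the sample fields, are hypothetical.

```python
from typing import Callable, Dict, List, Set

Record = Dict[str, object]
Rule = Callable[[Record], bool]


def not_missing(column: str) -> Rule:
    """Passes when the column is present and not empty."""
    return lambda r: r.get(column) not in (None, "")


def in_range(column: str, low: float, high: float) -> Rule:
    """Passes when the column holds a number within [low, high]."""
    def check(r: Record) -> bool:
        value = r.get(column)
        return isinstance(value, (int, float)) and low <= value <= high
    return check


def references(column: str, known_ids: Set[str]) -> Rule:
    """Passes when the column refers to an id that exists (referential integrity)."""
    return lambda r: r.get(column) in known_ids


def evaluate(records: List[Record], rules: Dict[str, Rule]) -> Dict[str, int]:
    """Count the records that fail each rule."""
    return {
        name: sum(1 for r in records if not rule(r))
        for name, rule in rules.items()
    }


# Illustrative data only.
known_meters = {"m-1", "m-2"}
rules = {
    "meterId present": not_missing("meterId"),
    "kwh in range": in_range("kwh", 0, 1000),
    "meter exists": references("meterId", known_meters),
}
records = [
    {"meterId": "m-1", "kwh": 12.5},
    {"meterId": "", "kwh": -3},
    {"meterId": "m-9", "kwh": 40},
]

failures = evaluate(records, rules)
print(failures)
```

Counting failures per rule gives a simple summary comparable in spirit to a rule status; a real deployment would also record when each rule last ran.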
