Understanding Source Systems

A Source System represents an external system from which Data Fusion ingests data. Source Systems are configured at the start of a pipeline and determine the access pattern (SQL, API, File/Object, or Cloud Messaging) and the authentication method used.

Supported Connectors in Data Fusion

Data Fusion supports a broad range of connectors that enable integration with cloud storage systems, SaaS platforms, streaming services, and application APIs. Each connector supports one or more authentication mechanisms to align with enterprise security models and cloud-native identity patterns.

The following sections summarize the primary connectors and their supported authentication approaches.

Cloud Storage Connectors

These connectors enable ingestion from cloud‑based file and object storage systems.

Azure Blob Storage / ADLS Gen2

Supported Authentication:

  • SAS Token
  • Managed Identity
  • Service Principal

Supports both token‑based (SAS) and identity‑based (Managed Identity, Service Principal) authentication methods. Managed Identity is recommended for Azure‑hosted environments, while Service Principal enables Azure AD–based access.

Amazon S3

Supported Authentication:

  • IAM Role
  • EKS Pod Identity (IRSA)

Provides role‑based access via AWS IAM. IRSA (IAM Roles for Service Accounts) provides workload identity for Kubernetes workloads running on Amazon EKS.

Google Cloud Storage (GCS)

Supported Authentication:

  • Workload Identity
  • Service Account Key
  • Access Token

Supports identity‑based (Workload Identity), service account key, and access token authentication for GCP workloads.

SaaS & Collaboration Connectors

These connectors integrate with cloud‑hosted business tools and collaboration platforms.

OneDrive

Supported Authentication:

  • User OAuth
  • Service Principal

Supports both user‑delegated (OAuth) and application‑level access.

Google Drive

Supported Authentication:

  • Service Principal

Supports application‑level authentication for Drive API access.

Streaming & Messaging Connectors

These connectors support real‑time or event‑driven ingestion.

Azure Event Hubs

Supported Authentication:

  • Service Principal
  • SAS Token

Supports Azure AD–based access and shared access signatures for event streaming.

Apache Kafka

Supported Authentication:

  • SASL/SSL
  • Username/password
  • Kerberos (environment‑dependent)

Supports the authentication mechanisms commonly used in enterprise Kafka deployments; Kerberos availability depends on the environment.

Amazon Kinesis

Supported Authentication:

  • IAM Role (recommended)
  • Access Key / Secret Key
  • Identity‑based authentication (IRSA/EKS Pod Identity), environment‑dependent

Supports real‑time ingestion from Kinesis Data Streams. Authentication is handled through AWS IAM, enabling secure, role‑based access to stream shards and sequence data.

SQL / Database Connectors

These connectors integrate with databases and SQL‑based warehouses.

Snowflake

Snowflake is connected as a SQL Source System, using JDBC to access Snowflake's compute layer.

Supported Authentication:

  • Key Pair Authentication
  • Username/Password

    Use only for accounts that still permit single‑factor authentication (SFA). To improve customer security posture, Snowflake is rolling out changes that require multi‑factor authentication (MFA) for all users who sign in with a password and that disallow passwords for service users. Service users must switch to a stronger authentication method that does not require human interaction, such as Key Pair Authentication.

  • Token‑based authentication (PAT)

    Data Fusion supports token‑based authentication for Snowflake connections using a Programmatic Access Token (PAT) in place of a password. When configuring the connector, enter the username as usual and enter the PAT in the password field. Although the UI labels this field as a password, Snowflake accepts the PAT there directly.

    A PAT is recommended in environments where Multi‑Factor Authentication (MFA) is enabled, because direct password‑based authentication may not be supported there. PATs provide a secure, programmatic alternative that avoids interactive authentication flows while maintaining strong security.
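To make the difference between the three Snowflake methods concrete, the Python sketch below builds the settings each one would carry. It is a minimal illustration under stated assumptions: the function `snowflake_settings` and every key in the returned dictionary (`auth_method`, `private_key`, `passphrase`, and so on) are hypothetical and do not describe Data Fusion's actual configuration schema. What it mirrors from the text is that a PAT travels in the password field next to the username, so a PAT-based connection looks like a Username/Password connection from the connector's point of view.

```python
from typing import Optional


def snowflake_settings(
    account: str,
    database: str,
    username: str,
    method: str,
    password: Optional[str] = None,
    pat: Optional[str] = None,
    private_key: Optional[str] = None,
    passphrase: Optional[str] = None,
) -> dict:
    """Build hypothetical Snowflake connection settings for one auth method."""
    settings = {"account": account, "database": database, "username": username}
    if method == "key_pair":
        if not private_key:
            raise ValueError("Key Pair authentication requires a private key")
        settings.update(auth_method="key_pair", private_key=private_key)
        if passphrase:
            # Only needed when the private key itself is encrypted.
            settings["passphrase"] = passphrase
    elif method == "pat":
        if not pat:
            raise ValueError("Token-based authentication requires a PAT")
        # No separate token field: the PAT is entered in the password field.
        settings.update(auth_method="username_password", password=pat)
    elif method == "username_password":
        if not password:
            raise ValueError("Username/Password authentication requires a password")
        settings.update(auth_method="username_password", password=password)
    else:
        raise ValueError(f"Unknown authentication method: {method!r}")
    return settings
```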

Authentication Model Considerations

Authentication depends on both the connector and the deployment environment. Common models include:

Identity‑based (cloud‑native)

  • AWS: IAM Role, IRSA (EKS Pod Identity)
  • Azure: Managed Identity (for Azure‑hosted workloads), Service Principal (App Registration)
  • GCP: Workload Identity, Service Account

Key / Token‑based

  • SAS Tokens (Azure)
  • Access Keys (Amazon S3)
  • API Keys
  • Bearer Tokens

User‑delegated OAuth

  • Used for SaaS integrations such as OneDrive

Basic (Username–Password)

  • Traditional application or JDBC connections

Custom Headers

  • Used for bespoke APIs requiring proprietary header‑based authentication

Selecting the Appropriate Source System

When configuring a data pipeline, you must first select the appropriate Source System. The correct choice depends on how the external system exposes data (for example, through a SQL interface, application APIs, cloud storage, or a messaging system) and on the authentication methods it supports.

The following table maps common external system types to their corresponding Source System category in Data Fusion. Rather than enumerating all connectors, it focuses on the underlying integration pattern (SQL, API, file/object storage, or streaming) to guide correct Source System selection.

Source System and Example Systems

| Source Category | Example System | C3 Source System to Select | Supported Authentication Methods (Typical) | Access Pattern / Interface Type |
|---|---|---|---|---|
| Databases & Data Warehouses | Snowflake | SQL Source System | Username/password, key pair auth, OAuth (env‑dependent) | SQL endpoint via JDBC/ODBC |
| Databases & Data Warehouses | Databricks (Delta Lake via SQL) | SQL Source System | Personal access token, OAuth, Service Principal (env‑dependent) | SQL endpoint (Databricks SQL) |
| Databases & Data Warehouses | BigQuery (Google) | SQL Source System | Service account, OAuth | SQL warehouse interface |
| Databases & Data Warehouses | Oracle | SQL Source System | Username/password, enterprise auth (env‑dependent) | Standard RDBMS via JDBC |
| Databases & Data Warehouses | PostgreSQL | SQL Source System | Username/password, SSL certs (optional) | Standard RDBMS via JDBC |
| Databases & Data Warehouses | SAP HANA | SQL Source System | Username/password, enterprise auth (env‑dependent) | SQL interface via JDBC |
| Databases & Data Warehouses | Amazon Redshift | SQL Source System | Username/password, IAM‑based auth (env‑dependent) | Cloud data warehouse via JDBC |
| Databases & Data Warehouses | Apache Hive / Impala | SQL Source System | Username/password, Kerberos (env‑dependent) | SQL engine over Hadoop |
| Databases & Data Warehouses | IBM Db2 | SQL Source System | Username/password, enterprise auth | SQL interface via JDBC |
| Business Applications | ServiceNow | Application / API Source System | OAuth, basic auth (env‑dependent) | REST/SOAP APIs |
| Business Applications | Workday | Application / API Source System | OAuth, WS‑Security (env‑dependent) | SOAP/REST APIs |
| Business Applications | Salesforce | Application / API Source System | OAuth (recommended), token‑based auth | REST/Bulk APIs |
| File Storage (Cloud / Object Storage) | Amazon S3 | File Source System | Access keys, IAM roles, STS (env‑dependent) | Object storage (files/buckets) |
| File Storage (Cloud / Object Storage) | Google Cloud Storage | File Source System | Service account, Workload Identity, Access Token | Object storage (buckets/files) |
| File Storage (Cloud / Object Storage) | Azure Blob / ADLS Gen2 | File Source System | Shared key, SAS key, Service Principal, OAuth (env‑dependent), Managed Identity | Object storage (containers/files) |
| File Storage (Cloud / Object Storage) | Microsoft OneLake | File Source System | Service Principal (Azure AD), OAuth (env‑dependent) | Lakehouse storage over ADLS |
| Streaming / Messaging Systems | Azure Event Hubs | Cloud Message Source System | SAS key, Azure AD (Service Principal) | Event streaming (message broker) |
| Streaming / Messaging Systems | Apache Kafka | Cloud Message Source System | SASL/SSL, username/password, Kerberos (env‑dependent) | Event streaming (topic‑based messaging) |

How to Choose

Use the following guidance when selecting a source system type:

  1. Start with the interface you will use.
    • SQL endpoint? → Select SQL Source System
    • HTTP API (REST/SOAP)? → Select Application / API Source System
    • Cloud/object files? → Select File Source System
    • Events/streams? → Select Cloud Message Source System
  2. Then confirm authentication.
    • Prefer identity‑based methods (IAM Role/IRSA, Managed Identity, Workload Identity) for cloud‑native deployments.
    • Fall back to Service Principal, Service Account, or keys/tokens when identity‑based authentication is not available.
  3. Validate external permissions.
    • Ensure the configured cloud role, service principal, or service account has least‑privilege access to the required objects (buckets, containers, schemas, topics, APIs).
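The first two decision steps can be sketched in Python as follows. This is an illustrative sketch, not Data Fusion logic: the interface keys (`sql`, `http_api`, `files`, `events`), the function names, and the rule that Service Principal or Service Account Key is tried before other keys and tokens are assumptions drawn from the order of the guidance above. Method names are spelled as they appear in the connector sections of this page. Step 3, validating permissions, happens in the external system and is not modeled here.

```python
from typing import Sequence

# Step 1: the interface you connect through determines the Source System type.
SOURCE_SYSTEM_BY_INTERFACE = {
    "sql": "SQL Source System",
    "http_api": "Application / API Source System",
    "files": "File Source System",
    "events": "Cloud Message Source System",
}

# Step 2: identity-based methods are preferred for cloud-native deployments.
IDENTITY_BASED = {
    "IAM Role",
    "EKS Pod Identity (IRSA)",
    "Managed Identity",
    "Workload Identity",
}
# Assumed fallback tier: principals/service accounts before raw keys and tokens.
PRINCIPAL_BASED = {"Service Principal", "Service Account Key"}


def choose_source_system(interface: str) -> str:
    """Return the Source System type for a (hypothetical) interface key."""
    if interface not in SOURCE_SYSTEM_BY_INTERFACE:
        raise ValueError(f"Unknown interface type: {interface!r}")
    return SOURCE_SYSTEM_BY_INTERFACE[interface]


def choose_auth(supported: Sequence[str]) -> str:
    """Pick identity-based first, then principals, then keys/tokens in listed order."""
    for tier in (IDENTITY_BASED, PRINCIPAL_BASED):
        for method in supported:
            if method in tier:
                return method
    if not supported:
        raise ValueError("The connector lists no authentication methods")
    # Remaining options are keys, tokens, or passwords; keep the connector's order.
    return supported[0]
```

For example, for Azure Blob Storage (SAS Token, Managed Identity, Service Principal), `choose_auth` returns Managed Identity, and for Azure Event Hubs (Service Principal, SAS Token) it falls back to Service Principal.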

SQL vs. File Access (Example: Delta Lake)

If you are unsure whether a system should be accessed via SQL or directly as files (for example, Delta Lake), determine whether you are connecting to:

  • A SQL endpoint → Use SQL Source System
  • The underlying storage location (such as S3 or ADLS) → Use File Source System

This distinction ensures you configure your pipeline based on how the data is accessed, not simply how it is stored.
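One way to apply this rule mechanically is to inspect the connection string you were given. The sketch below assumes that a JDBC URL (prefix `jdbc:`) indicates a SQL endpoint and that the listed URI schemes (`s3`, `s3a`, `gs`, `abfs`, `abfss`, `wasbs`) indicate a storage location; the function name and the scheme list are illustrative assumptions, not an exhaustive or official mapping.

```python
from urllib.parse import urlparse

# Assumed (non-exhaustive) URI schemes for object storage locations.
STORAGE_SCHEMES = {"s3", "s3a", "gs", "abfs", "abfss", "wasbs"}


def source_system_for(location: str) -> str:
    """Classify a connection target by how it is accessed, not how it is stored."""
    if location.strip().lower().startswith("jdbc:"):
        # Talking to a query engine (e.g. a Databricks SQL endpoint over Delta Lake).
        return "SQL Source System"
    scheme = urlparse(location.strip()).scheme.lower()
    if scheme in STORAGE_SCHEMES:
        # Reading the underlying files (e.g. the Delta table's folder in S3 or ADLS).
        return "File Source System"
    raise ValueError(f"Cannot infer the access pattern from: {location!r}")
```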

UI Behavior

  • An Authentication Method dropdown allows selection of the desired method.
  • Only fields relevant to the selected method are displayed.
  • Required fields are clearly marked.
  • Switching authentication methods preserves previously entered values.
  • Secret fields (password, private key, passphrase, OAuth client secret, tokens) are masked.
  • Advanced fields (such as Warehouse and Schema) are available under an expandable section.
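The following Python sketch models these behaviors for a simplified Snowflake form. The field names, the two method labels, the required-field marker (`*`), and the mask string are all hypothetical, and the expandable Advanced section is not modeled. The sketch only shows one way to obtain the described behavior: values live in a single store that is independent of the selected method, so switching methods hides fields without discarding what was entered.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical field layout per authentication method (illustrative only).
FIELDS_BY_METHOD: Dict[str, List[str]] = {
    "Username/Password": ["account", "database", "username", "password"],
    "Key Pair": ["account", "database", "username", "private_key", "passphrase"],
}
REQUIRED = {"account", "database", "username", "password", "private_key"}
SECRET_FIELDS = {"password", "private_key", "passphrase", "client_secret", "token"}
MASK = "********"


@dataclass
class ConnectorForm:
    method: str
    # One shared value store: hiding a field does not erase what was typed into it.
    values: Dict[str, str] = field(default_factory=dict)

    def visible_fields(self) -> List[str]:
        """Only fields relevant to the selected method are shown."""
        return FIELDS_BY_METHOD[self.method]

    def switch_method(self, new_method: str) -> None:
        if new_method not in FIELDS_BY_METHOD:
            raise ValueError(f"Unsupported method: {new_method!r}")
        self.method = new_method  # entered values are left untouched

    def render(self) -> Dict[str, str]:
        """Map label -> displayed value; required labels get '*', secrets are masked."""
        shown: Dict[str, str] = {}
        for name in self.visible_fields():
            label = f"{name} *" if name in REQUIRED else name
            value = self.values.get(name, "")
            shown[label] = MASK if (name in SECRET_FIELDS and value) else value
        return shown
```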

Test Connection

The Test Connection action validates the provided credentials.

During validation, the system:

  • Attempts to establish a connection.
  • Executes a lightweight validation query (such as SELECT 1) or lists schemas.
  • Displays success or failure status.

If the test fails:

  • The error state is clearly displayed.
  • Detailed error information is available in an expandable section.
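The validation flow can be sketched against the Python DB-API (PEP 249) connection and cursor interface. In this sketch `sqlite3` stands in for the real database driver so the example runs anywhere; the function name `check_connection` and the shape of the result dictionary are assumptions, and the schema-listing alternative is not shown.

```python
import sqlite3
from typing import Any, Callable, Dict, Optional


def check_connection(connect: Callable[[], Any]) -> Dict[str, Optional[str]]:
    """Connect, run a lightweight SELECT 1, and report a status plus error detail."""
    conn = None
    try:
        conn = connect()
        cur = conn.cursor()
        cur.execute("SELECT 1")
        row = cur.fetchone()
        if row is None or row[0] != 1:
            return {"status": "failure", "detail": f"Unexpected validation result: {row!r}"}
        return {"status": "success", "detail": None}
    except Exception as exc:
        # The driver's error text is what a user would see in the expandable details.
        return {"status": "failure", "detail": f"{type(exc).__name__}: {exc}"}
    finally:
        if conn is not None:
            conn.close()


# sqlite3 stands in for a real driver connection in this sketch.
print(check_connection(lambda: sqlite3.connect(":memory:")))
# prints {'status': 'success', 'detail': None}
```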

Field Validation

The Snowflake connector performs basic client‑side validation before attempting a connection.

  1. Account / Server Endpoint
    • Must not be empty.
    • Must resemble a Snowflake host (for example, a hostname ending in .snowflakecomputing.com).
  2. Database
    • Must not be empty.
  3. Private Key (Key Pair method)
    • Must not be empty.
    • Must contain valid PEM markers (for example, -----BEGIN PRIVATE KEY----- and -----END PRIVATE KEY-----).

Additional validation occurs server‑side.
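A minimal Python version of these client-side checks might look like the following. The function name, the method key `key_pair`, and the error messages are hypothetical, and the host check is a deliberately loose suffix match that approximates "resembles a Snowflake host"; anything stricter is left to server-side validation.

```python
import re
from typing import List

# Accepts PKCS#8 ("PRIVATE KEY", "ENCRYPTED PRIVATE KEY") and PKCS#1 ("RSA PRIVATE KEY") markers.
PEM_BEGIN = re.compile(r"-----BEGIN (?:RSA |ENCRYPTED )?PRIVATE KEY-----")
PEM_END = re.compile(r"-----END (?:RSA |ENCRYPTED )?PRIVATE KEY-----")
SNOWFLAKE_SUFFIX = ".snowflakecomputing.com"


def validate_snowflake_fields(
    account: str, database: str, method: str, private_key: str = ""
) -> List[str]:
    """Return client-side validation errors; an empty list means 'try to connect'."""
    errors: List[str] = []
    host = account.strip().lower()
    if not host:
        errors.append("Account / Server Endpoint must not be empty.")
    elif not host.endswith(SNOWFLAKE_SUFFIX):
        errors.append(f"Account / Server Endpoint should end with {SNOWFLAKE_SUFFIX}.")
    if not database.strip():
        errors.append("Database must not be empty.")
    if method == "key_pair":
        if not private_key.strip():
            errors.append("Private Key must not be empty.")
        elif not (PEM_BEGIN.search(private_key) and PEM_END.search(private_key)):
            errors.append("Private Key must contain valid PEM markers.")
    return errors
```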

Backward Compatibility

Data Fusion maintains backward compatibility when new authentication mechanisms are introduced for a connector. Existing connector configurations continue to work without requiring any user action.

  • Existing Snowflake connectors using Username/Password authentication continue to function without modification.

When editing an existing connector:

  • If a private key is detected, the connector automatically loads in Key Pair mode.
  • Otherwise, the connector loads in Username/Password mode.

No migration steps are required, and older authentication configurations remain fully supported. This ensures that existing pipelines and deployments remain stable even as newer authentication options are added.
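The mode-selection rule for editing an existing connector is small enough to sketch directly. The key name `private_key` and the mode labels are assumptions for illustration; the logic mirrors the two rules listed for editing an existing connector.

```python
from typing import Any, Mapping


def initial_auth_mode(saved_config: Mapping[str, Any]) -> str:
    """Choose the form mode when an existing Snowflake connector is opened for editing."""
    private_key = saved_config.get("private_key") or ""
    # A stored private key means the connector was created with Key Pair auth.
    if str(private_key).strip():
        return "Key Pair"
    # Everything else, including configs created before Key Pair existed,
    # opens in Username/Password mode with no migration step.
    return "Username/Password"
```

Under this sketch, a connector that stores a PAT in the password field also opens in Username/Password mode, which matches how PATs are entered.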
