Understanding Source Systems

A Source System represents an external system from which Data Fusion ingests data. Source Systems are configured at the start of a pipeline and determine the access pattern (SQL, API, File/Object, or Cloud Messaging) and authentication used.

Supported Connectors in Data Fusion

Data Fusion supports a broad range of connectors that enable integration with cloud storage systems, SaaS platforms, streaming services, and application APIs. Each connector supports one or more authentication mechanisms to align with enterprise security models and cloud-native identity patterns.

The following sections summarize the primary connectors and their supported authentication approaches.

Cloud Storage Connectors

These connectors enable ingestion from cloud‑based file and object storage systems.

Azure Blob Storage / ADLS Gen2

Supported Authentication:

SAS Token
Managed Identity
Service Principal

Supports both key‑based and identity‑based authentication methods. Managed Identity is recommended for Azure‑hosted environments, while Service Principal enables Azure AD–based access.

Amazon S3

Supported Authentication:

IAM Role
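IRSA / EKS Pod Identity

Provides role‑based access via AWS IAM. IRSA (IAM Roles for Service Accounts) and EKS Pod Identity are two separate AWS mechanisms; both give Kubernetes workloads an IAM identity without stored keys.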
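As a minimal sketch of how a workload could tell which identity mechanism is active, the Python helper below inspects the environment variables that AWS injects into pods. The function name and return labels are hypothetical and are not part of Data Fusion. The presence of these variables only suggests a mechanism; it does not prove that the role grants access to the bucket.

```python
from typing import Mapping

# Variables AWS injects into pods that use IRSA (web identity federation).
IRSA_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")

# Variables AWS injects into pods that use EKS Pod Identity.
POD_IDENTITY_VARS = (
    "AWS_CONTAINER_CREDENTIALS_FULL_URI",
    "AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE",
)


def detect_aws_workload_identity(env: Mapping[str, str]) -> str:
    """Return a hypothetical label for the mechanism the environment suggests.

    `env` is typically os.environ; passing a plain dict keeps this testable.
    """
    if all(env.get(name) for name in IRSA_VARS):
        return "irsa"
    if all(env.get(name) for name in POD_IDENTITY_VARS):
        return "eks-pod-identity"
    return "none"
```

Calling `detect_aws_workload_identity(os.environ)` inside a pod gives a quick hint about which mechanism the AWS SDK credential chain is likely to use.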

Google Cloud Storage (GCS)

Supported Authentication:

Workload Identity
Service Account Key
Access Token

Supports both identity‑based and service account–based authentication for GCP workloads.

SaaS & Collaboration Connectors

These connectors integrate with cloud‑hosted business tools and collaboration platforms.

OneDrive

Supported Authentication:

User OAuth
Service Principal

Supports both user‑delegated (OAuth) and application‑level access.

Google Drive

Supported Authentication:

Service Principal

Supports application‑level authentication for Drive API access.

Streaming & Messaging Connectors

These connectors support real‑time or event‑driven ingestion.

Azure Event Hubs

Supported Authentication:

Service Principal
SAS Token

Supports Azure AD–based access and shared access signatures for event streaming.

Apache Kafka

Supported Authentication:

SASL/SSL
Username/password
Kerberos (environment‑dependent)

Supports a variety of enterprise messaging authentication mechanisms.
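To illustrate how the Kafka authentication options above usually translate into client settings, the following sketch builds a property dictionary in the style used by librdkafka-based clients. The helper name and the method labels are hypothetical, and the exact properties that Data Fusion sets internally are not documented here. Treat this as an illustration of the standard Kafka settings only.

```python
from typing import Dict, Optional


def kafka_client_properties(
    method: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, str]:
    """Build librdkafka-style client properties for an authentication method.

    `method` labels ("sasl_ssl_plain", "kerberos") are assumptions made for
    this sketch, not Data Fusion identifiers.
    """
    if method == "sasl_ssl_plain":
        if not username or not password:
            raise ValueError("username and password are required for SASL/PLAIN")
        return {
            "security.protocol": "SASL_SSL",  # SASL authentication over TLS
            "sasl.mechanism": "PLAIN",        # username/password mechanism
            "sasl.username": username,
            "sasl.password": password,
        }
    if method == "kerberos":
        return {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "GSSAPI",       # Kerberos
            # Must match the service principal name of the brokers.
            "sasl.kerberos.service.name": "kafka",
        }
    raise ValueError(f"unsupported method: {method}")
```

Java clients express the same username and password through `sasl.jaas.config` instead of `sasl.username` and `sasl.password`.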

Amazon Kinesis

Supported Authentication:

IAM Role (recommended)
Access Key / Secret Key
Identity‑based authentication (IRSA/EKS Pod Identity), environment‑dependent

Supports real‑time ingestion from Kinesis Data Streams. Authentication is handled through AWS IAM, enabling secure, role‑based access to stream shards and sequence data.

SQL / Database Connectors

These connectors integrate with databases and SQL‑based warehouses.

Snowflake

Connected as a SQL Source System, using JDBC to access Snowflake’s compute layer.

Supported Authentication:

Key Pair Authentication
Username/Password: Use this method only for SFA accounts. To improve customers' security posture, Snowflake is rolling out changes that require multi-factor authentication (MFA) for all users who sign in with passwords and that disallow passwords for all service users. Service users must switch to a stronger authentication method that does not require human interaction.
Token-based authentication (PAT): Data Fusion supports token-based authentication for Snowflake connections by using a Programmatic Access Token (PAT) in place of a password. When configuring the connector, enter the username and enter the PAT in the password field. Although the UI labels this field as a password, Snowflake accepts a PAT in its place.
A PAT is recommended in environments where MFA is enforced, because direct password-based authentication may not be supported there. PATs provide a secure, programmatic alternative that avoids interactive authentication flows.
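The PAT-in-password-field approach can be sketched as follows. The helper below is hypothetical and only assembles a Snowflake JDBC URL plus driver properties; it does not open a connection, and the account, user, and object names are placeholders. How Data Fusion builds the connection internally is an assumption here.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode


def snowflake_jdbc_settings(
    account_host: str,
    user: str,
    pat: str,
    database: str,
    warehouse: Optional[str] = None,
    schema: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return (jdbc_url, driver_properties) for a PAT-based connection."""
    params = {"db": database}
    if warehouse:
        params["warehouse"] = warehouse
    if schema:
        params["schema"] = schema
    url = f"jdbc:snowflake://{account_host}/?{urlencode(params)}"
    # The PAT is supplied where a password would normally go.
    properties = {"user": user, "password": pat}
    return url, properties
```

Keeping the token out of the URL and in the properties avoids leaking it through logs that record connection strings.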

Authentication Model Considerations

Authentication depends on both the connector and the deployment environment. Common models include:

Identity‑based (cloud‑native)

AWS: IAM Role, IRSA / EKS Pod Identity
Azure: Managed Identity (for Azure‑hosted workloads), Service Principal (App Registration)
GCP: Workload Identity, Service Account

Key / Token‑based

SAS Tokens (Azure)
Access Keys (Amazon S3)
API Keys
Bearer Tokens

User‑delegated OAuth

Used for SaaS integrations such as OneDrive

Basic (Username–Password)

Traditional application or JDBC connections

Custom Headers

Used for bespoke APIs requiring proprietary header‑based authentication

Prefer identity‑based authentication when it is available: it is more secure, avoids most manual credential rotation, and integrates cleanly with cloud‑native IAM. Ensure that your organization supports at least one of the mechanisms the connector requires (for example, OAuth, Service Principal, IAM Role, Service Account, or Access Keys) and that the external system grants the necessary permissions.
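For API-style sources, the key/token, basic, and custom-header models above differ mainly in which HTTP headers accompany each request. The sketch below shows one way to build those headers; the function name, the model labels, and the default `X-API-Key` header name are assumptions for illustration, not Data Fusion settings.

```python
import base64
from typing import Dict


def build_auth_headers(model: str, **creds) -> Dict[str, str]:
    """Return the HTTP headers for a given authentication model."""
    if model == "basic":
        # Basic auth: base64 of "username:password".
        raw = f"{creds['username']}:{creds['password']}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    if model == "bearer":
        return {"Authorization": f"Bearer {creds['token']}"}
    if model == "api_key":
        # The header name varies by API; X-API-Key is only a common convention.
        return {creds.get("header_name", "X-API-Key"): creds["api_key"]}
    if model == "custom_headers":
        # Bespoke APIs: pass through whatever proprietary headers are required.
        return dict(creds["headers"])
    raise ValueError(f"unsupported model: {model}")
```

For example, `build_auth_headers("bearer", token="abc")` yields a single `Authorization` header that can be merged into the request.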

Selecting the Appropriate Source System

When configuring a data pipeline, you must first select the appropriate Source System. The correct choice depends on how the external system exposes data (for example, through a SQL interface, application APIs, cloud storage, or a messaging system) and on the authentication methods it supports.

The following table maps common external system types to their corresponding Source System category in Data Fusion. Rather than enumerating all connectors, it focuses on the underlying integration pattern (SQL, API, file/object storage, or streaming) to guide correct Source System selection.

Source System and Example Systems

Source Category	Example System	C3 Source System to Select	Supported Authentication Methods (Typical)	Access Pattern / Interface Type
Databases & Data Warehouses	Snowflake	SQL Source System	Username/password, key pair auth, OAuth (env‑dependent)	SQL endpoint via JDBC/ODBC
	Databricks (Delta Lake via SQL)	SQL Source System	Personal access token, OAuth, Service Principal (env‑dependent)	SQL endpoint (Databricks SQL)
	BigQuery (Google)	SQL Source System	Service account, OAuth	SQL warehouse interface
	Oracle	SQL Source System	Username/password, enterprise auth (env‑dependent)	Standard RDBMS via JDBC
	PostgreSQL	SQL Source System	Username/password, SSL certs (optional)	Standard RDBMS via JDBC
	SAP HANA	SQL Source System	Username/password, enterprise auth (env‑dependent)	SQL interface via JDBC
	Amazon Redshift	SQL Source System	Username/password, IAM‑based auth (env‑dependent)	Cloud data warehouse via JDBC
	Apache Hive / Impala	SQL Source System	Username/password, Kerberos (env‑dependent)	SQL engine over Hadoop
	IBM Db2	SQL Source System	Username/password, enterprise auth	SQL interface via JDBC

Source Category	Example System	C3 Source System to Select	Supported Authentication Methods (Typical)	Access Pattern / Interface Type
Business Applications	ServiceNow	Application / API Source System	OAuth, basic auth (env‑dependent)	REST/SOAP APIs
	Workday	Application / API Source System	OAuth, WS‑Security (env‑dependent)	SOAP/REST APIs
	Salesforce	Application / API Source System	OAuth (recommended), token‑based auth	REST/Bulk APIs

Source Category	Example System	C3 Source System to Select	Supported Authentication Methods (Typical)	Access Pattern / Interface Type
File Storage (Cloud / Object Storage)	Amazon S3	File Source System	Access keys, IAM roles, STS (env‑dependent)	Object storage (files/buckets)
	Google Cloud Storage	File Source System	Service account, Workload Identity, Access Token	Object storage (buckets/files)
	Azure Blob / ADLS Gen2	File Source System	Shared key, SAS key, Service Principal, OAuth (env‑dependent), Managed Identity	Object storage (containers/files)
	Microsoft OneLake	File Source System	Service Principal (Azure AD), OAuth (env‑dependent)	Lakehouse storage over ADLS

Source Category	Example System	C3 Source System to Select	Supported Authentication Methods (Typical)	Access Pattern / Interface Type
Streaming / Messaging Systems	Azure Event Hubs	Cloud Message Source System	SAS key, Azure AD (Service Principal)	Event streaming (message broker)
	Apache Kafka	Cloud Message Source System	SASL/SSL, username/password, Kerberos (env‑dependent)	Event streaming (topic‑based messaging)

How to Choose

Use the following guidance when selecting a source system type:

Start with the interface you will use

SQL endpoint? → Select SQL Source System
HTTP API (REST/SOAP)? → Select Application / API Source System
Cloud/object files? → Select File Source System
Events/streams? → Select Cloud Message Source System
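The interface-first rule above amounts to a simple lookup. A minimal sketch follows; the `Interface` enum and the function name are hypothetical, while the Source System names come from this page.

```python
from enum import Enum


class Interface(Enum):
    SQL = "sql"            # SQL endpoint
    HTTP_API = "http_api"  # REST or SOAP API
    FILES = "files"        # cloud or object storage
    EVENTS = "events"      # events or streams


SOURCE_SYSTEM_BY_INTERFACE = {
    Interface.SQL: "SQL Source System",
    Interface.HTTP_API: "Application / API Source System",
    Interface.FILES: "File Source System",
    Interface.EVENTS: "Cloud Message Source System",
}


def select_source_system(interface: Interface) -> str:
    """Map the interface you will connect through to a Source System type."""
    return SOURCE_SYSTEM_BY_INTERFACE[interface]
```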

Then confirm authentication

Prefer identity‑based methods (IAM Role/IRSA, Managed Identity, Workload Identity) for cloud‑native deployments.
Fall back to Service Principal, Service Account, or keys/tokens when identity‑based authentication is not available.
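The preference order described above can be expressed as a ranked list. This is only a sketch: the method labels mirror names used on this page, and the relative order within each tier is an assumption.

```python
from typing import Iterable, Optional

# Tier 1: identity-based methods. Tier 2: application identities.
# Tier 3: keys and tokens. Order inside a tier is an assumption.
PREFERENCE_ORDER = [
    "IAM Role", "IRSA", "Managed Identity", "Workload Identity",
    "Service Principal", "Service Account",
    "Access Key", "SAS Token",
]


def choose_auth_method(supported: Iterable[str]) -> Optional[str]:
    """Pick the most preferred method among those a connector supports."""
    available = set(supported)
    for method in PREFERENCE_ORDER:
        if method in available:
            return method
    return None  # none of the ranked methods is available
```

For a connector that offers a SAS Token, Managed Identity, and Service Principal, this returns Managed Identity; without Managed Identity it falls back to Service Principal.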

Validate external permissions

Ensure the configured cloud role, service principal, or service account has least‑privilege access to the required objects (buckets, containers, schemas, topics, APIs).

SQL vs. File Access (Example: Delta Lake)

If you are unsure whether a system should be accessed via SQL or directly as files (for example, Delta Lake), determine whether you are connecting to:

A SQL endpoint → Use SQL Source System
The underlying storage location (such as S3 or ADLS) → Use File Source System
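One way to apply this distinction is to look at the address you were given for the data. The sketch below is a hypothetical helper: it treats `jdbc:` URLs as SQL endpoints and S3 or ADLS URI schemes as storage locations. The scheme list is an assumption and covers only the storage systems named above.

```python
from urllib.parse import urlparse

# URI schemes commonly used for S3 and ADLS Gen2 paths (assumed list).
FILE_SCHEMES = {"s3", "s3a", "abfs", "abfss"}


def source_system_for_target(target: str) -> str:
    """Classify a connection target as SQL or file access."""
    if target.lower().startswith("jdbc:"):
        return "SQL Source System"
    scheme = urlparse(target).scheme.lower()
    if scheme in FILE_SCHEMES:
        return "File Source System"
    raise ValueError(f"cannot classify target: {target}")
```

The same Delta table can therefore map to either type, depending on whether you connect through a SQL endpoint or read its files directly.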

This distinction ensures you configure your pipeline based on how the data is accessed, not simply how it is stored.

UI Behavior

An Authentication Method dropdown allows selection of the desired method.
Only fields relevant to the selected method are displayed.
Required fields are clearly marked.
Switching authentication methods preserves previously entered values.
Secret fields (password, private key, passphrase, OAuth client secret, tokens) are masked.
Advanced fields (such as Warehouse and Schema) are available under an expandable section.
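The behavior listed above can be modeled with a small form-state object. This is an illustrative sketch, not the actual UI implementation; the class, the field names, and the mask string are assumptions.

```python
from typing import Dict, List

# Fields shown for each authentication method (assumed field names).
FIELDS_BY_METHOD: Dict[str, List[str]] = {
    "Username/Password": ["username", "password"],
    "Key Pair": ["username", "private_key", "passphrase"],
}
SECRET_FIELDS = {"password", "private_key", "passphrase", "client_secret", "token"}
MASK = "********"


class ConnectorForm:
    def __init__(self, method: str) -> None:
        self.method = method
        self.values: Dict[str, str] = {}

    def set_value(self, field: str, value: str) -> None:
        self.values[field] = value

    def switch_method(self, method: str) -> None:
        if method not in FIELDS_BY_METHOD:
            raise ValueError(f"unknown method: {method}")
        # Only the selected method changes; entered values are kept.
        self.method = method

    def visible_fields(self) -> Dict[str, str]:
        """Return only the fields for the current method, masking secrets."""
        shown = {}
        for field in FIELDS_BY_METHOD[self.method]:
            value = self.values.get(field, "")
            shown[field] = MASK if value and field in SECRET_FIELDS else value
        return shown
```

Because `switch_method` never clears `values`, a user who switches to Key Pair and back finds the earlier password still in place.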

Test Connection

The Test Connection action validates the provided credentials.

During validation, the system:

Attempts to establish a connection.
Executes a lightweight validation query (such as SELECT 1) or lists schemas.
Displays success or failure status.
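The validation sequence above can be sketched against any Python DB-API connection. In the example below, `sqlite3` stands in for the real driver so that the snippet runs on its own; the function and result type are hypothetical and do not describe the actual Data Fusion implementation.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ConnectionCheck:
    success: bool
    message: str                   # short status shown to the user
    details: Optional[str] = None  # content for the expandable error section


def run_connection_check(connect: Callable[[], Any]) -> ConnectionCheck:
    """Connect, run a lightweight query, and report the outcome."""
    try:
        conn = connect()
    except Exception as exc:  # credentials or network problems surface here
        return ConnectionCheck(False, "Connection failed", repr(exc))
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT 1")  # lightweight validation query
        cursor.fetchone()
        return ConnectionCheck(True, "Connection succeeded")
    except Exception as exc:
        return ConnectionCheck(False, "Validation query failed", repr(exc))
    finally:
        conn.close()


# Stand-in usage with an in-memory SQLite database.
result = run_connection_check(lambda: sqlite3.connect(":memory:"))
```

Keeping the short message separate from the detailed error mirrors the split between the visible status and the expandable detail section.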

If the test fails:

The error state is clearly displayed.
Detailed error information is available in an expandable section.

Field Validation

The connector performs basic validation before attempting a connection.

Account / Server Endpoint

Must not be empty.
Must resemble a Snowflake host (for example, a hostname ending in .snowflakecomputing.com).

Database

Must not be empty.

Private Key (Key Pair method)

Must not be empty.
Must contain valid PEM markers.

Additional validation occurs server‑side.

Schema and Warehouse fields are optional.
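The client-side rules above could look roughly like the following. This is a sketch under assumptions: the function name, the error strings, and the exact host and PEM checks are illustrative, and real validation may be stricter.

```python
import re
from typing import List

SNOWFLAKE_SUFFIX = ".snowflakecomputing.com"
# Matches PRIVATE KEY, RSA PRIVATE KEY, and ENCRYPTED PRIVATE KEY markers.
PEM_BEGIN = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")
PEM_END = re.compile(r"-----END [A-Z ]*PRIVATE KEY-----")


def validate_connector_fields(
    account: str, database: str, auth_method: str, private_key: str = ""
) -> List[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors: List[str] = []
    host = account.strip().lower()
    if not host:
        errors.append("Account / Server Endpoint must not be empty.")
    elif not (host.endswith(SNOWFLAKE_SUFFIX) and len(host) > len(SNOWFLAKE_SUFFIX)):
        errors.append("Account / Server Endpoint must resemble a Snowflake host.")
    if not database.strip():
        errors.append("Database must not be empty.")
    if auth_method == "Key Pair":
        key = private_key.strip()
        if not key:
            errors.append("Private Key must not be empty.")
        else:
            begin, end = PEM_BEGIN.search(key), PEM_END.search(key)
            # The BEGIN marker must appear before the END marker.
            if not (begin and end and begin.end() <= end.start()):
                errors.append("Private Key must contain valid PEM markers.")
    return errors
```

Schema and Warehouse are not checked here because they are optional.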

Backward Compatibility

Data Fusion maintains backward compatibility when new authentication mechanisms are introduced for a connector. Existing connector configurations continue to work without requiring any user action.

Existing Snowflake connectors using Username/Password authentication continue to function without modification.

When editing an existing connector:

If a private key is detected, the connector automatically loads in Key Pair mode.
Otherwise, the connector loads in Username/Password mode.
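The mode-detection rule above reduces to a single check on the saved configuration. A minimal sketch, assuming the saved connector is available as a dictionary with a hypothetical `private_key` entry:

```python
from typing import Mapping, Optional


def initial_auth_mode(saved_config: Mapping[str, Optional[str]]) -> str:
    """Choose the mode in which an existing connector opens for editing."""
    private_key = (saved_config.get("private_key") or "").strip()
    # A stored private key implies Key Pair; everything else is treated
    # as a legacy Username/Password configuration.
    return "Key Pair" if private_key else "Username/Password"
```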

No migration steps are required, and older authentication configurations remain fully supported. This ensures that existing pipelines and deployments remain stable even as newer authentication options are added.
