Understanding Source Systems
A Source System represents an external system from which Data Fusion ingests data. Source Systems are configured at the start of a pipeline and determine the access pattern (SQL, API, File/Object, or Cloud Messaging) and authentication used.
Supported Connectors in Data Fusion
Data Fusion supports a broad range of connectors that enable integration with cloud storage systems, SaaS platforms, streaming services, and application APIs. Each connector supports one or more authentication mechanisms to align with enterprise security models and cloud-native identity patterns.
The following sections summarize the primary connectors and their supported authentication approaches.
Cloud Storage Connectors
These connectors enable ingestion from cloud‑based file and object storage systems.
Azure Blob Storage / ADLS Gen2
Supported Authentication:
- SAS Token
- Managed Identity
- Service Principal
Supports both key‑based and identity‑based authentication methods. Managed Identity is recommended for Azure‑hosted environments, while Service Principal enables Azure AD–based access.
Amazon S3
Supported Authentication:
- IAM Role
- EKS Pod Identity (IRSA)
Provides role‑based access via AWS IAM. IRSA (IAM Roles for Service Accounts) provides workload identity for Kubernetes deployments on EKS.
Google Cloud Storage (GCS)
Supported Authentication:
- Workload Identity
- Service Account Key
- Access Token
Supports both identity‑based and service account–based authentication for GCP workloads.
SaaS & Collaboration Connectors
These connectors integrate with cloud‑hosted business tools and collaboration platforms.
OneDrive
Supported Authentication:
- User OAuth
- Service Principal
Supports both user‑delegated (OAuth) and application‑level access.
Google Drive
Supported Authentication:
- Service Principal
Supports application‑level authentication for Drive API access.
Streaming & Messaging Connectors
These connectors support real‑time or event‑driven ingestion.
Azure Event Hubs
Supported Authentication:
- Service Principal
- SAS Token
Supports Azure AD–based access and shared access signatures for event streaming.
Apache Kafka
Supported Authentication:
- SASL/SSL
- Username/password
- Kerberos (environment‑dependent)
Supports a variety of enterprise messaging authentication mechanisms.
Amazon Kinesis
Supported Authentication:
- IAM Role (recommended)
- Access Key / Secret Key
- Identity‑based authentication (IRSA/EKS Pod Identity), environment‑dependent
Supports real‑time ingestion from Kinesis Data Streams. Authentication is handled through AWS IAM, enabling secure, role‑based access to stream shards and sequence data.
SQL / Database Connectors
These connectors integrate with databases and SQL‑based warehouses.
Snowflake
Connected as a SQL Source System, using JDBC to access Snowflake’s compute layer.
Supported Authentication:
- Key Pair Authentication
- Username/Password: Use only for SFA accounts. To improve customers' security posture, Snowflake is rolling out changes that require multi-factor authentication (MFA) for all users who sign in with passwords and that disallow passwords for all service users. Service users must switch to a stronger authentication method that does not require human interaction.
- Token-based authentication (PAT)
Data Fusion supports token-based authentication for Snowflake connections using a Programmatic Access Token (PAT) in place of a password. When configuring the connector, enter the username as usual and enter the PAT in the password field. Although the UI labels this field as a password, Snowflake accepts a PAT entered there.
Using a PAT is recommended in environments where Multi-Factor Authentication (MFA) is enabled, as direct password-based authentication may not be supported. PATs provide a secure, programmatic alternative that avoids interactive authentication flows while maintaining strong security.
Authentication Model Considerations
Authentication depends on both the connector and the deployment environment. Common models include:
Identity‑based (cloud‑native)
- AWS: IAM Role, IRSA (EKS Pod Identity)
- Azure: Managed Identity (for Azure‑hosted workloads), Service Principal (App Registration)
- GCP: Workload Identity, Service Account
Key / Token‑based
- SAS Tokens (Azure)
- Access Keys (Amazon S3)
- API Keys
- Bearer Tokens
User‑delegated OAuth
- Used for SaaS integrations such as OneDrive
Basic (Username–Password)
- Traditional application or JDBC connections
Custom Headers
- Used for bespoke APIs requiring proprietary header‑based authentication
Prefer identity‑based authentication when it is available: it is more secure, easier to rotate, and integrates cleanly with cloud‑native IAM. Confirm that your organization supports at least one of the connector's authentication mechanisms (for example, OAuth, Service Principal, IAM Role, Service Account, or Access Keys) and that the external system grants the necessary permissions.
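The preference for identity‑based methods can be expressed as a small selection rule. The following is a minimal sketch, not Data Fusion code: the function name `choose_auth_method` and the contents of `IDENTITY_BASED` are assumptions made for illustration, using method names that appear in the connector lists above.

```python
# Illustrative only: these names are assumptions, not a Data Fusion API.
# Methods treated as identity-based (cloud-native) for this sketch.
IDENTITY_BASED = {"IAM Role", "IRSA", "Managed Identity", "Workload Identity"}


def choose_auth_method(supported):
    """Return the first identity-based method, else the first supported one."""
    if not supported:
        raise ValueError("connector reports no supported authentication methods")
    for method in supported:
        if method in IDENTITY_BASED:
            return method
    # No identity-based option: fall back to whatever is listed first.
    return supported[0]


# Azure Blob Storage lists SAS Token, Managed Identity, and Service Principal.
print(choose_auth_method(["SAS Token", "Managed Identity", "Service Principal"]))
```

In this sketch the fallback simply takes the first listed method; a real deployment would rank the remaining options according to its own security policy.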
Selecting the Appropriate Source System
When configuring a data pipeline, you must first select the appropriate Source System. The correct choice depends on how the external system exposes data (through a SQL interface, application APIs, cloud storage, or a messaging system) and on the authentication methods it supports.
This table maps common external system types to their corresponding Source System category in Data Fusion. Rather than enumerating all connectors, it focuses on the underlying integration pattern (SQL, API, file/object storage, or streaming) to guide correct Source System selection.
Source System and Example Systems
| Source Category | Example System | C3 Source System to Select | Supported Authentication Methods (Typical) | Access Pattern / Interface Type |
|---|---|---|---|---|
| Databases & Data Warehouses | Snowflake | SQL Source System | Username/password, key pair auth, OAuth (env‑dependent) | SQL endpoint via JDBC/ODBC |
| | Databricks (Delta Lake via SQL) | SQL Source System | Personal access token, OAuth, Service Principal (env‑dependent) | SQL endpoint (Databricks SQL) |
| | BigQuery (Google) | SQL Source System | Service account, OAuth | SQL warehouse interface |
| | Oracle | SQL Source System | Username/password, enterprise auth (env‑dependent) | Standard RDBMS via JDBC |
| | PostgreSQL | SQL Source System | Username/password, SSL certs (optional) | Standard RDBMS via JDBC |
| | SAP HANA | SQL Source System | Username/password, enterprise auth (env‑dependent) | SQL interface via JDBC |
| | Amazon Redshift | SQL Source System | Username/password, IAM‑based auth (env‑dependent) | Cloud data warehouse via JDBC |
| | Apache Hive / Impala | SQL Source System | Username/password, Kerberos (env‑dependent) | SQL engine over Hadoop |
| | IBM Db2 | SQL Source System | Username/password, enterprise auth | SQL interface via JDBC |
| Business Applications | ServiceNow | Application / API Source System | OAuth, basic auth (env‑dependent) | REST/SOAP APIs |
| | Workday | Application / API Source System | OAuth, WS‑Security (env‑dependent) | SOAP/REST APIs |
| | Salesforce | Application / API Source System | OAuth (recommended), token‑based auth | REST/Bulk APIs |
| File Storage (Cloud / Object Storage) | Amazon S3 | File Source System | Access keys, IAM roles, STS (env‑dependent) | Object storage (files/buckets) |
| | Google Cloud Storage | File Source System | Service account, Workload Identity, Access Token | Object storage (buckets/files) |
| | Azure Blob / ADLS Gen2 | File Source System | Shared key, SAS key, Service Principal, OAuth (env‑dependent), Managed Identity | Object storage (containers/files) |
| | Microsoft OneLake | File Source System | Service Principal (Azure AD), OAuth (env‑dependent) | Lakehouse storage over ADLS |
| Streaming / Messaging Systems | Azure Event Hub | Cloud Message Source System | SAS key, Azure AD (Service Principal) | Event streaming (message broker) |
| | Apache Kafka | Cloud Message Source System | SASL/SSL, username/password, Kerberos (env‑dependent) | Event streaming (topic‑based messaging) |
How to Choose
Use the following guidance when selecting a source system type:
- Start with the interface you will use
  - SQL endpoint? → Select SQL Source System
  - HTTP API (REST/SOAP)? → Select Application / API Source System
  - Cloud/object files? → Select File Source System
  - Events/streams? → Select Cloud Message Source System
- Then confirm authentication
  - Prefer identity‑based methods (IAM Role/IRSA, Managed Identity, Workload Identity) for cloud‑native deployments.
  - Fall back to Service Principal, Service Account, or keys/tokens when identity‑based authentication is not available.
- Validate external permissions
  - Ensure the configured cloud role, service principal, or service account has least‑privilege access to the required objects (buckets, containers, schemas, topics, APIs).
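The first step of this guidance is a direct lookup from interface type to Source System category. The sketch below shows that mapping; the `Interface` enum and the `select_source_system` function are hypothetical names introduced for illustration, while the category strings come from the list above.

```python
from enum import Enum


class Interface(Enum):
    """How the external system exposes its data (hypothetical enum)."""
    SQL = "sql"
    HTTP_API = "http_api"
    FILES = "files"
    EVENTS = "events"


# Interface type -> Source System category, as listed in the guidance.
SOURCE_SYSTEM_FOR = {
    Interface.SQL: "SQL Source System",
    Interface.HTTP_API: "Application / API Source System",
    Interface.FILES: "File Source System",
    Interface.EVENTS: "Cloud Message Source System",
}


def select_source_system(interface: Interface) -> str:
    """Return the Source System category for the given interface type."""
    return SOURCE_SYSTEM_FOR[interface]


print(select_source_system(Interface.EVENTS))
```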
SQL vs. File Access (Example: Delta Lake)
If you are unsure whether a system should be accessed via SQL or directly as files (for example, Delta Lake), determine whether you are connecting to:
- A SQL endpoint → Use SQL Source System
- The underlying storage location (such as S3 or ADLS) → Use File Source System
This distinction ensures you configure your pipeline based on how the data is accessed, not simply how it is stored.
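One way to make this decision mechanical is to look at the kind of location you have been given. The sketch below assumes that a SQL endpoint is expressed as a JDBC URL and that storage locations use URI schemes such as `s3`, `abfss`, or `gs`; the scheme list, the function name, and the example locations are all illustrative assumptions.

```python
# Illustrative assumption: storage locations are given as URIs with these schemes.
FILE_SCHEMES = {"s3", "abfs", "abfss", "gs"}


def source_system_for_location(location: str) -> str:
    """Classify a connection target by how the data will be accessed."""
    scheme = location.split(":", 1)[0].lower()
    if scheme == "jdbc":
        # A SQL endpoint, for example a Databricks SQL warehouse.
        return "SQL Source System"
    if scheme in FILE_SCHEMES:
        # The underlying storage that holds the Delta Lake files.
        return "File Source System"
    raise ValueError(f"cannot classify location: {location!r}")


# The same Delta Lake table, reached two different ways (hypothetical locations).
print(source_system_for_location("jdbc:databricks://example.cloud.databricks.com:443/default"))
print(source_system_for_location("s3://example-bucket/delta/events"))
```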
UI Behavior
- An Authentication Method dropdown allows selection of the desired method.
- Only fields relevant to the selected method are displayed.
- Required fields are clearly marked.
- Switching authentication methods preserves previously entered values.
- Secret fields (password, private key, passphrase, OAuth client secret, tokens) are masked.
- Advanced fields (such as Warehouse and Schema) are available under an expandable section.
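The behavior above can be modeled as a view over a single set of stored values: the selected method decides which fields are shown, secrets are masked for display, and values for other methods stay in place. This is a minimal sketch under assumed field names (`username`, `private_key`, and so on); it is not the actual UI implementation.

```python
# Hypothetical field names per authentication method.
FIELDS_BY_METHOD = {
    "Username/Password": ["username", "password"],
    "Key Pair": ["username", "private_key", "passphrase"],
}

# Fields whose values are never displayed in clear text.
SECRET_FIELDS = {"password", "private_key", "passphrase", "oauth_client_secret", "token"}


def masked_view(values, method):
    """Return only the fields for `method`, masking any non-empty secret."""
    view = {}
    for name in FIELDS_BY_METHOD[method]:
        value = values.get(name, "")
        view[name] = "********" if name in SECRET_FIELDS and value else value
    return view


# All entered values live in one dict, so switching methods loses nothing.
entered = {"username": "svc_loader", "password": "example-password"}
print(masked_view(entered, "Key Pair"))
print(masked_view(entered, "Username/Password"))
```

Because `masked_view` never modifies `entered`, switching to Key Pair and back leaves the original password value available.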
Test Connection
The Test Connection action validates the provided credentials.
During validation, the system:
- Attempts to establish a connection.
- Executes a lightweight validation query (such as SELECT 1) or lists schemas.
- Displays success or failure status.
If the test fails:
- The error state is clearly displayed.
- Detailed error information is available in an expandable section.
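The validation sequence can be sketched as follows. This is an illustration under stated assumptions, not the connector's implementation: it uses Python's built-in `sqlite3` as a stand-in for the real database connection, and `ConnectionCheck` and `check_connection` are hypothetical names.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ConnectionCheck:
    success: bool
    summary: str        # short status shown to the user
    details: str = ""   # longer error text for the expandable section


def check_connection(connect):
    """Run the check; `connect` is a zero-argument callable returning a DB-API connection."""
    try:
        conn = connect()
    except Exception as exc:
        return ConnectionCheck(False, "Connection failed", repr(exc))
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT 1")  # lightweight validation query
        cursor.fetchone()
        return ConnectionCheck(True, "Connection successful")
    except Exception as exc:
        return ConnectionCheck(False, "Validation query failed", repr(exc))
    finally:
        conn.close()


# Stand-in target: an in-memory SQLite database.
result = check_connection(lambda: sqlite3.connect(":memory:"))
print(result.summary)
```

Keeping the short summary separate from the detailed error text mirrors the split between the displayed status and the expandable error section.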
Field Validation
The connector performs basic validation before attempting a connection.
- Account / Server Endpoint
  - Must not be empty.
  - Must resemble a Snowflake host (for example: .snowflakecomputing.com).
- Database
  - Must not be empty.
- Private Key (Key Pair method)
  - Must not be empty.
  - Must contain valid PEM markers.
Additional validation occurs server‑side.
Schema and Warehouse fields are optional.
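The client‑side rules above could be sketched as a single function that returns a list of error messages. The function name, the parameter names, the error wording, and the exact host and PEM checks are assumptions for illustration; the connector's real checks may be stricter or looser.

```python
import re

SNOWFLAKE_SUFFIX = ".snowflakecomputing.com"
# PEM header/footer lines, e.g. "-----BEGIN PRIVATE KEY-----" or
# "-----BEGIN ENCRYPTED PRIVATE KEY-----".
PEM_BEGIN = re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----")
PEM_END = re.compile(r"-----END (?:[A-Z]+ )*PRIVATE KEY-----")


def validate_snowflake_fields(account, database, method, private_key=""):
    """Return a list of validation errors; an empty list means the form may be submitted."""
    errors = []
    host = account.strip().lower()
    if not host:
        errors.append("Account / Server Endpoint must not be empty.")
    elif not (host.endswith(SNOWFLAKE_SUFFIX) and len(host) > len(SNOWFLAKE_SUFFIX)):
        errors.append("Account / Server Endpoint must resemble a Snowflake host.")
    if not database.strip():
        errors.append("Database must not be empty.")
    if method == "Key Pair":
        key = private_key.strip()
        if not key:
            errors.append("Private Key must not be empty.")
        elif not (PEM_BEGIN.search(key) and PEM_END.search(key)):
            errors.append("Private Key must contain valid PEM markers.")
    return errors


# Hypothetical input: wrong host and missing database.
print(validate_snowflake_fields("db.example.com", "", "Username/Password"))
```

Schema and Warehouse are deliberately absent from the sketch because they are optional.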
Backward Compatibility
Data Fusion maintains backward compatibility when new authentication mechanisms are introduced for a connector. Existing connector configurations continue to work without requiring any user action.
- Existing Snowflake connectors using Username/Password authentication continue to function without modification.
When editing an existing connector:
- If a private key is detected, the connector automatically loads in Key Pair mode.
- Otherwise, the connector loads in Username/Password mode.
No migration steps are required, and older authentication configurations remain fully supported. This ensures that existing pipelines and deployments remain stable even as newer authentication options are added.
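The mode‑detection rule used when editing an existing connector is simple enough to state in a few lines. In this sketch the saved configuration is modeled as a dictionary and `private_key` is an assumed field name; neither reflects the actual storage format.

```python
def initial_auth_mode(saved_config):
    """Pick the authentication mode to show when an existing connector is opened."""
    if saved_config.get("private_key"):
        # A stored private key means the connector was set up with Key Pair auth.
        return "Key Pair"
    # Older configurations carry no private key and keep working as before.
    return "Username/Password"


# A connector created before Key Pair support existed (hypothetical values).
legacy = {"username": "svc_loader", "password": "example-password"}
print(initial_auth_mode(legacy))
```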