Configure Source Schema
This section explains how to configure a Source Schema in Data Fusion, including creating a new schema, reusing an existing one, and navigating between the Schema and Code views.
When configuring a File Source Collection, you must define how incoming file data is interpreted by Data Fusion. The Source Schema determines the structure, field names, and data types that downstream transforms and targets rely on.
Configure the Source Schema
On the canvas, click Source (or Click to configure schema).
In the Configure Source Schema dialog:
- Choose Create New Source Schema, or
- Select Use an existing Source Schema to reuse one.
Option 1: Create a New Source Schema
Use this option when the incoming file structure is new or unique to this pipeline. You can start from the schema inferred by the server or create one manually.
When Create New Source Schema is selected, you define the schema directly in the UI by specifying:
- Column names
- Column aliases
- Data types
- Optional descriptions
Fields can be added incrementally using the Add Field action.
The schema is created as part of the pipeline configuration and becomes available for downstream transforms.
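To make the field attributes above concrete, the following is a minimal sketch of how such a schema could be modeled. It does not show Data Fusion's actual schema format, which this section does not specify. The class names (SchemaField, SourceSchema), the attribute names, and the data type strings are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class SchemaField:
    """One column in a hypothetical Source Schema."""
    name: str                          # column name as it appears in the file
    alias: str                         # display name used downstream
    data_type: str                     # e.g. "string", "decimal" (assumed type names)
    description: Optional[str] = None  # optional free-text description


@dataclass
class SourceSchema:
    """A named collection of fields, built up one field at a time."""
    name: str
    fields: List[SchemaField] = field(default_factory=list)

    def add_field(self, new_field: SchemaField) -> None:
        # Mirrors adding fields incrementally with an "Add Field" action.
        self.fields.append(new_field)

    def to_json(self) -> str:
        # Serialize the whole schema, similar in spirit to a code view.
        return json.dumps(asdict(self), indent=2)


schema = SourceSchema(name="orders_file")
schema.add_field(SchemaField("order_id", "Order ID", "string", "Unique order identifier"))
schema.add_field(SchemaField("amount", "Amount", "decimal"))
print(schema.to_json())
```

Keeping the description optional matches the list above, where only names, aliases, and data types are required.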
This option is recommended when:
- You are onboarding a new dataset.
- The file format does not match any existing Source Schema.
- You want full control over field definitions before ingestion.
You can switch to the Code tab at any time to:
- View the generated schema definition.
- Edit the schema directly in code.
- Validate the schema before saving.
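The kind of check that validation before saving might perform can be sketched as follows. This is an illustration only: the rules shown (non-empty names, unique names, known data types) and the ALLOWED_TYPES set are assumptions, not the product's documented validation behavior.

```python
from typing import Dict, List

# Hypothetical set of supported data types; the real list may differ.
ALLOWED_TYPES = {"string", "integer", "decimal", "boolean", "date", "timestamp"}


def validate_schema(fields: List[Dict[str, str]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    seen = set()
    for index, schema_field in enumerate(fields):
        name = (schema_field.get("name") or "").strip()
        if not name:
            errors.append(f"field {index}: missing column name")
        elif name in seen:
            errors.append(f"field {index}: duplicate column name '{name}'")
        else:
            seen.add(name)

        data_type = schema_field.get("data_type")
        if data_type not in ALLOWED_TYPES:
            errors.append(f"field {index}: unsupported data type '{data_type}'")
    return errors


problems = validate_schema([
    {"name": "order_id", "data_type": "string"},
    {"name": "order_id", "data_type": "integer"},  # duplicate name
    {"name": "amount", "data_type": "money"},      # unknown type
])
for problem in problems:
    print(problem)
```

Collecting all problems instead of stopping at the first one lets an editor report every issue in a single pass.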
Option 2: Use an Existing Source Schema
Use this option when the file data conforms to a schema that already exists in the application.
When Use an existing Source Schema is selected:
- You choose from a list of previously defined Source Schemas.
- The selected schema is applied directly to the Source Collection.
- No field-level editing is required during setup.
This option is recommended when:
- The file structure matches an existing dataset.
- You want consistency across multiple pipelines.
- The schema has already been validated and is in use elsewhere.
You can still review the schema in the Code tab, but structural changes should be made carefully to avoid breaking dependent pipelines.
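One way to reason about which edits to a shared schema are risky is to compare the current and proposed field definitions. The sketch below assumes that removing a field or changing its data type can break dependent pipelines, while adding a field does not. That classification and the function name breaking_changes are illustrative assumptions, not rules stated by Data Fusion.

```python
from typing import Dict, List


def breaking_changes(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Compare two {column name: data type} mappings and list risky edits."""
    changes: List[str] = []
    for name, old_type in old.items():
        if name not in new:
            # Downstream transforms that reference this column would fail.
            changes.append(f"removed field '{name}'")
        elif new[name] != old_type:
            changes.append(
                f"type of '{name}' changed from {old_type} to {new[name]}"
            )
    # Fields that exist only in `new` are treated as non-breaking additions.
    return changes


current = {"order_id": "string", "amount": "decimal"}
proposed = {"order_id": "integer", "created_at": "timestamp"}

issues = breaking_changes(current, proposed)
for issue in issues:
    print(issue)
```

An empty result suggests the edit only adds fields, which is usually the safest kind of change to a schema that several pipelines share.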
Schema vs Code Tabs
The Source Schema configuration supports two complementary views:
Schema tab
A guided, form-based interface for defining or reviewing fields and data types.
Code tab
A code editor for advanced users who want to:
- Inspect the full schema definition
- Make precise edits
- Validate before saving
Both views operate on the same underlying schema definition and stay in sync.
Key Takeaway
Whether you create a new Source Schema or reuse an existing one, the goal is the same: ensure that file data is consistently structured before it flows into transforms and targets. Choosing the right option up front reduces rework and simplifies downstream pipeline configuration.