Configure Runtime Parameters and Run a DI Pipeline

This section describes how to prepare a DI pipeline for execution. You review the full pipeline on the canvas, save the configuration, and set the necessary runtime parameters (file selection, CSV and chunking options, data-handling rules, error-handling behavior, and archive settings) before starting the pipeline run.

Confirm the End-to-End Pipeline on the Canvas

Close the preview to return to the Data Integration canvas.

Verify that the pipeline now shows a complete, connected flow:

Source System (for example, EMRSystem using S3)
Source Collection (PatientRecord)
Source Schema (PatientSource)
Transform (PatientSource-Patient)
Target Type (Patient)

Ensure that each node displays a status indicator, such as a field count (for example, 10 fields or 20 fields), confirming that it is configured successfully.

Save the Pipeline Configuration

Save the pipeline. The saved pipeline persists:

The selected target type
All configured field mappings
The validated transform logic

At this point, the pipeline definition is complete and ready for execution.

Configure Runtime Parameters

Once the Source, Schema, and Transform are defined, the pipeline structure is complete. However, its execution behavior is not yet defined.

Runtime parameters control how the pipeline runs, not how it is built. These settings determine:

Which files are processed in the current run
Whether previously processed data should be reprocessed
How parsing behaves at execution time
How large files are handled
How failures are tolerated
What happens to files after processing

This step allows operators to control ingestion behavior without modifying the pipeline design. It provides flexibility for incremental loads, reprocessing scenarios, performance tuning, and operational recovery.

In short, pipeline configuration defines what to process and how data should be transformed. Runtime configuration defines how this specific execution should behave.
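One way to picture this split is to treat runtime parameters as a separate, per-run object that never modifies the saved pipeline definition. The sketch below is a hypothetical Python model: the class name `RuntimeParameters` and all field names are illustrative assumptions that mirror the tabs described in this section, not the product's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RuntimeParameters:
    """Hypothetical per-run settings; the saved pipeline definition is untouched."""
    selected_files: Optional[List[str]] = None  # None stands for "All Files"
    reset_before_executing: bool = False
    handle_existing_data: str = "keep"          # "keep" or "delete"
    delimiter: str = ","
    chunking_enabled: bool = False
    errors_threshold: int = -1                  # -1 allows unlimited errors
    retries: int = 0
    archive_url: Optional[str] = None

    def validate(self, production: bool) -> None:
        """Reject combinations the documentation describes as invalid."""
        if self.handle_existing_data not in ("keep", "delete"):
            raise ValueError("handle_existing_data must be 'keep' or 'delete'")
        if production and self.handle_existing_data == "delete":
            # Mirrors the documented rule: Delete is disabled in production.
            raise ValueError("Delete is not allowed in production")
        if self.errors_threshold < -1:
            raise ValueError("errors_threshold must be -1 or non-negative")
        if self.retries < 0:
            raise ValueError("retries must be non-negative")
```

For example, `RuntimeParameters(handle_existing_data="delete").validate(production=True)` raises `ValueError` in this sketch, while the same object validates without error when `production=False`.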

Open Runtime Configuration

On the canvas, locate the Source Collection node.
Click Run (▶) or select Execute pipeline from the node menu.
In the Configure Runtime Parameters modal, review and adjust settings across the available tabs.

Select Files to Process (Basic Tab)

Use the Basic tab to choose which files are included in the pipeline run.

Select Input Files

You may choose:

All Files: Process all detected files.
Select Files: Manually select specific files for this run. With this option:
- Review the list of available files detected at the configured source path.
- Use column filters (such as File Path, Size, Last Modified, or Processing Status) to narrow the results.
- Select one or more files to include in the run.
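Column filtering in the file list can be thought of as a simple predicate over file records. The following is a minimal sketch under assumed names: `SourceFile`, `filter_files`, and the status strings such as `"pending"` are hypothetical and stand in for whatever the UI uses internally.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass
class SourceFile:
    """Hypothetical record mirroring the columns shown in the file list."""
    path: str
    size_bytes: int
    last_modified: datetime
    status: str  # assumed values, for example "pending" or "processed"


def filter_files(
    files: Iterable[SourceFile], status: str, modified_after: datetime
) -> List[str]:
    """Return paths with a matching status that changed after `modified_after`."""
    return [
        f.path
        for f in files
        if f.status == status and f.last_modified > modified_after
    ]
```

Combining a Processing Status filter with a Last Modified filter this way narrows the list to, for example, only pending files that arrived after a given date.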

Sync Source Collection

Click Sync Source Collection to refresh the file list from the underlying storage system.

Use this after adding, modifying, or deleting files to ensure the pipeline processes the latest source content.
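This section does not describe how the sync works internally. Conceptually, though, refreshing the list means comparing the previously known files with a fresh listing from storage. The sketch below assumes each file is identified by its path and a version token (for example, an S3 ETag or a last-modified value); `diff_listings` is a hypothetical helper, not a product function.

```python
from typing import Dict, List, Tuple


def diff_listings(
    previous: Dict[str, str], current: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two path -> version maps and return (added, modified, deleted).

    Each returned list is sorted so results are deterministic.
    """
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return added, modified, deleted
```

Without a refresh, the pipeline works from the `previous` view, which is why syncing after files are added, changed, or removed matters.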

Reprocess Options

Reset Pipeline Before Executing

Resets the current run state and clears queued work so that files can be reprocessed from the beginning.

Use this when recovering from a failed run or restarting execution cleanly.

Handle Existing Data

You must choose how existing target data should be treated:

Keep: Preserves previously ingested records. New data is appended or merged according to entity rules.
Delete: Clears existing target data. This option supports development and testing scenarios where the target state must be reset. It is intentionally disabled in production to prevent accidental data loss or interference with in-flight ingestion operations.

Behavior Details

When Delete is selected, data that is already in progress may finish processing later and is not removed immediately. To fully clear all existing target data, monitor the active run and confirm that the source queue has no remaining pending entries. Once the queue is empty, all processed data is cleared as expected.
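The "wait until the queue is empty" check can be expressed as a bounded polling loop. This is a sketch only: `pending_count` is a hypothetical callable you would wire to whatever queue-monitoring view or API is available in your environment, and the interval and timeout defaults are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_for_empty_queue(
    pending_count: Callable[[], int],
    poll_interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> bool:
    """Poll until pending_count() returns 0.

    Returns True if the queue drained before the timeout, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if pending_count() == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

A timeout keeps the check from waiting forever if in-flight work stalls; a `False` result signals that the target may still receive data from the earlier run.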

Configure CSV Parsing Options (CSV Configuration Tab)

If the Source Collection contains CSV files, use this tab to control how data is parsed. These settings override default parsing behavior for this execution only.

Configure CSV Settings

Set the Delimiter (for example, comma or tab).
Choose the Quote and Escape characters.
Optionally specify a Header Override if incoming headers differ from the expected schema.
Enable or disable CSV Header depending on whether the first row contains column names.

These settings apply only to the current execution unless published as part of configuration management.
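To see how these options interact, Python's standard `csv` module offers close analogues: Delimiter maps to `delimiter`, Quote to `quotechar`, Escape to `escapechar`, CSV Header to whether the first row is treated as column names, and Header Override to replacing those names. This is an analogy under that assumption; the product's own parser may differ in edge cases, and `parse_csv` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List, Optional


def parse_csv(
    text: str,
    delimiter: str = ",",
    quotechar: str = '"',
    escapechar: Optional[str] = None,
    has_header: bool = True,
    header_override: Optional[List[str]] = None,
) -> List[Dict[str, str]]:
    """Parse CSV text into row dicts, mimicking the CSV Configuration options."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar=quotechar,
        escapechar=escapechar,
    )
    rows = list(reader)
    header: List[str] = []
    if has_header and rows:
        header, rows = rows[0], rows[1:]  # first row holds column names
    if header_override is not None:
        header = header_override  # replace incoming column names
    if not header:
        raise ValueError("no header row and no header override supplied")
    return [dict(zip(header, row)) for row in rows]
```

For example, a tab-delimited file whose name field contains a quoted comma parses correctly only when the delimiter is set to tab; leaving it at comma would split that value.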

Override Source URLs (URL Overrides Tab)

Use this tab to override where files are archived after processing.

Configure Archive Location

Enter a custom Archive URL to control where processed files are moved.
Enable External if file lifecycle management is handled outside of Data Fusion.

This is useful when downstream systems manage retention or cleanup.

Configure Chunking Behavior (Chunking Control Tab)

Chunking allows large files to be split into smaller units for parallel processing.

Enable and Tune Chunking

Enable Chunking.
Specify:
- Chunk Size (Records) to set the maximum number of records per chunk.
- Chunk Size (MB) to cap the size of each chunk in megabytes.
Optionally enable Clean Pending Chunks to remove incomplete chunks from prior runs.

Chunking improves throughput for large datasets and long-running pipelines.
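When both limits are set, a chunk closes as soon as adding the next record would break either one. The sketch below shows that rule for records given as byte strings. It is an illustrative algorithm, not the product's implementation. `chunk_records` is a hypothetical name, and treating an oversized single record as its own chunk is an assumed policy.

```python
from typing import Iterable, Iterator, List


def chunk_records(
    records: Iterable[bytes], max_records: int, max_bytes: int
) -> Iterator[List[bytes]]:
    """Yield chunks with at most max_records records and at most max_bytes bytes.

    A single record larger than max_bytes is emitted as its own chunk.
    """
    if max_records < 1 or max_bytes < 1:
        raise ValueError("limits must be positive")
    chunk: List[bytes] = []
    size = 0
    for rec in records:
        # Close the current chunk if adding this record would break either limit.
        if chunk and (len(chunk) >= max_records or size + len(rec) > max_bytes):
            yield chunk
            chunk, size = [], 0
        chunk.append(rec)
        size += len(rec)
    if chunk:
        yield chunk
```

Assuming the UI's MB means 1,048,576 bytes, a Chunk Size (MB) of 64 would correspond to `max_bytes=64 * 1024 * 1024` here. Because each chunk is independent, chunks can be handed to parallel workers.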

Configure Error Handling (Error Handling Tab)

Control how the pipeline responds to processing failures.

Set Error Thresholds

Errors threshold: Specify how many errors are allowed before the pipeline aborts. Use -1 to allow unlimited errors.
Number of retries: Set how many times a failed write operation is retried, for example after a version conflict.

These settings help balance resilience and correctness during execution.
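The two settings work at different levels: retries absorb transient write failures, while the threshold decides when accumulated failures stop the run. The sketch below models that interaction. `VersionConflict`, `ErrorThresholdExceeded`, and `run_writes` are hypothetical names. Two behaviors are also assumptions: aborting once errors exceed the threshold, and retrying only version conflicts.

```python
from typing import Callable, Iterable


class VersionConflict(Exception):
    """Hypothetical error raised when a write hits a version conflict."""


class ErrorThresholdExceeded(Exception):
    """Raised when the error count goes past the configured threshold."""


def run_writes(
    writes: Iterable[Callable[[], None]], errors_threshold: int, retries: int
) -> int:
    """Run each write, retrying version conflicts up to `retries` times.

    Aborts once errors exceed errors_threshold (-1 means unlimited).
    Returns the number of writes that ultimately failed.
    """
    errors = 0
    for write in writes:
        for attempt in range(retries + 1):
            try:
                write()
                break  # success, move on to the next write
            except VersionConflict:
                if attempt < retries:
                    continue  # retry the same write
                errors += 1  # retries exhausted
            except Exception:
                errors += 1  # other failures are not retried in this sketch
                break
        if errors_threshold != -1 and errors > errors_threshold:
            raise ErrorThresholdExceeded(
                f"{errors} errors exceeds threshold {errors_threshold}"
            )
    return errors
```

With `errors_threshold=-1`, the function always completes and reports the failure count; with `0`, the first unrecovered failure aborts the run.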

Content Processing

Defines content-level preprocessing options that apply before transformation.
Available settings may vary depending on the connector type.

Serialization Options

Controls how data serialization and deserialization are handled during ingestion.
This is particularly relevant when using custom content types or specialized formats.

Metadata

Defines how metadata is captured or applied during ingestion, such as preserving source file attributes or attaching processing timestamps.
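As an illustration of "source file attributes plus a processing timestamp", the sketch below collects both for a local file. The metadata keys and the `capture_metadata` helper are hypothetical, and a local path stands in for an object in S3 or other storage, where the same attributes would come from that system's own API.

```python
import os
from datetime import datetime, timezone
from typing import Any, Dict


def capture_metadata(path: str) -> Dict[str, Any]:
    """Collect source file attributes plus a UTC processing timestamp."""
    st = os.stat(path)
    return {
        "source_path": path,
        "source_size_bytes": st.st_size,
        "source_last_modified": datetime.fromtimestamp(
            st.st_mtime, tz=timezone.utc
        ).isoformat(),
        "processed_at": datetime.now(timezone.utc).isoformat(),
    }
```

Recording timestamps in UTC avoids ambiguity when pipelines and storage run in different time zones.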

File Operations

Specifies how files are handled after processing. Depending on configuration, files may:

Remain in place
Be moved to an archive location
Be deleted after successful processing
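The three outcomes above, together with the External option on the URL Overrides tab, can be sketched as a single dispatch step. This is a minimal sketch under assumptions. A local directory stands in for the archive URL, `FileDisposition` and `dispose` are hypothetical names, and archiving regardless of success is an assumed policy.

```python
import shutil
from enum import Enum
from pathlib import Path
from typing import Optional


class FileDisposition(Enum):
    REMAIN = "remain"
    ARCHIVE = "archive"
    DELETE = "delete"


def dispose(
    path: Path,
    disposition: FileDisposition,
    succeeded: bool,
    archive_dir: Optional[Path] = None,
    external: bool = False,
) -> Optional[Path]:
    """Apply the post-processing action and return the file's final location.

    Returns None if the file was deleted.
    """
    if external or disposition is FileDisposition.REMAIN:
        return path  # lifecycle handled elsewhere, or file stays in place
    if disposition is FileDisposition.ARCHIVE:
        if archive_dir is None:
            raise ValueError("archive_dir is required for ARCHIVE")
        archive_dir.mkdir(parents=True, exist_ok=True)
        return Path(shutil.move(str(path), str(archive_dir / path.name)))
    # DELETE: remove only files that processed successfully.
    if succeeded:
        path.unlink()
        return None
    return path
```

Keeping failed files in place under DELETE preserves them for inspection and reprocessing.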

Execute the Pipeline

After setting the runtime parameters, start the run. Once execution begins:

A notification confirms that processing has started
File-level processing status updates in real time
File states transition based on success or failure
Errors are captured in run history
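File state tracking during a run can be modeled as a small state machine that also records failures in a run history. The state names, the allowed transitions, and the `RunTracker` class below are hypothetical. The transitions back to PENDING are an assumption modeling a reset for reprocessing.

```python
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple


class FileState(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Allowed transitions in this sketch; returning to PENDING models a reset.
TRANSITIONS: Dict[FileState, Set[FileState]] = {
    FileState.PENDING: {FileState.PROCESSING},
    FileState.PROCESSING: {FileState.SUCCEEDED, FileState.FAILED},
    FileState.SUCCEEDED: {FileState.PENDING},
    FileState.FAILED: {FileState.PENDING},
}


class RunTracker:
    """Tracks per-file states and captures failures in a run history."""

    def __init__(self, paths: List[str]) -> None:
        self.states: Dict[str, FileState] = {p: FileState.PENDING for p in paths}
        self.history: List[Tuple[str, str]] = []  # (path, error message)

    def move(self, path: str, new: FileState, error: Optional[str] = None) -> None:
        old = self.states[path]
        if new not in TRANSITIONS[old]:
            raise ValueError(f"illegal transition {old.value} -> {new.value} for {path}")
        self.states[path] = new
        if new is FileState.FAILED:
            self.history.append((path, error or "unknown error"))
```

Rejecting illegal transitions, such as SUCCEEDED directly to FAILED, keeps the displayed status consistent with what actually happened to each file.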
