Configuring File Source Collection for Traceability

This topic explains how to configure a source collection in C3 AI with a JSON configuration file so that each target object can be traced back to the source file it came from. The options described here let you track the linkage between a source file and the target object in C3 AI, provided the application is set up to retain and use the metadata.

As a data engineer using C3 AI to integrate and manage various data sources, you need to set up a new source collection to ingest data from a specific source, ensuring that all relevant metadata is included and that the data can be traced back to its origin. This configuration helps maintain data integrity and facilitate auditing and troubleshooting.

Note that enabling this feature adds processing during ingestion, and each transformed object stores additional data about its source.

1. Navigate to the configuration directory. Locate it in your package at <pkg>/config/FileSourceCollection.Config/.
2. Create or edit the JSON configuration file. Create a new JSON file, or edit an existing one, named after your source collection: <SourceCollection_Name>.json.
3. Define the configuration.
Use the following template to define your source collection configuration:

JSON

{
  "name" : "<SourceCollection_Name>",
  "includeMeta" : true,
  "doNotArchiveSources" : true,
  "inboxUrlOverride" : "<inbox url>",
  "processMode" : "MANUAL"
}

Add any other configuration options your source collection needs as additional fields in the same object.

Relevant configuration option for tracking source-target linkage

"includeMeta": true
a. If set to true, the metadata from the source file is included during the ingestion process.
b. This metadata typically contains details such as the source file's name, location, and other attributes that can be linked to the target object.
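
Before saving the file, it can help to check the configuration programmatically. The following is a minimal standalone JavaScript sketch, not part of the C3 AI API: the function name checkTraceabilityConfig and the example collection name are hypothetical, and the check assumes only the field names and file naming rule shown in this topic.

```javascript
// Hypothetical helper (not a C3 AI API): checks a parsed source collection
// configuration for the traceability settings described in this topic.
function checkTraceabilityConfig(config, fileName) {
  const problems = [];
  if (typeof config.name !== "string" || config.name.length === 0) {
    problems.push("name is missing");
  } else if (fileName !== config.name + ".json") {
    // The file is expected to be named after the source collection.
    problems.push("file name should be " + config.name + ".json");
  }
  if (config.includeMeta !== true) {
    problems.push("includeMeta must be true to retain source file metadata");
  }
  return problems;
}

// Example with placeholder values.
const exampleConfig = JSON.parse(
  '{"name": "MySourceCollection", "includeMeta": true, "processMode": "MANUAL"}'
);
const configProblems = checkTraceabilityConfig(exampleConfig, "MySourceCollection.json");
console.log(
  configProblems.length === 0 ? "Configuration looks consistent" : configProblems.join("; ")
);
```

The helper returns a list of problems rather than throwing, so that every issue in a file can be reported in one pass.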

The following task guides you through synchronizing file metadata in C3 AI using DataIntegSpec to ensure proper traceability of data sources. By following these steps, you can include metadata and process files while maintaining data integrity and facilitating auditing.

As a data engineer using C3 AI, you need to synchronize metadata for a batch of files or all files within a specific context. This ensures that all relevant information is included and processed, aiding in the traceability of data sources.

Synchronize metadata for a batch of files

Use the SourceFile.syncFileMetadataBatch function from DataIntegSpec to synchronize metadata for a specific array of files.

JavaScript

SourceFile.syncFileMetadataBatch(<file_array>, {
    includeMeta: true,
    // other options
    process: true
});

where:

<file_array>: Replace this with the array of files you want to synchronize.
includeMeta: Set to true to include metadata in the synchronization process.
process: Set to true to process the files after synchronization.
// other options: Add any additional configuration options as needed.
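
When the file array is large, one possible approach is to pass it to the synchronization call in smaller chunks. The sketch below is an assumption-laden illustration rather than a documented C3 AI pattern: syncInChunks is a hypothetical helper, the file names are placeholders, and the synchronization call is injected as a function so that the example runs on its own.

```javascript
// Hypothetical helper (not a C3 AI API): splits a file array into chunks and
// passes each chunk, with the options from this topic, to a caller-supplied
// sync function.
function syncInChunks(files, chunkSize, syncFn) {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const options = { includeMeta: true, process: true };
  let chunkCount = 0;
  for (let start = 0; start < files.length; start += chunkSize) {
    syncFn(files.slice(start, start + chunkSize), options);
    chunkCount += 1;
  }
  return chunkCount;
}

// Stand-in for the platform call so the sketch runs on its own. In a C3 AI
// console you might instead pass
//   (chunk, opts) => SourceFile.syncFileMetadataBatch(chunk, opts)
const seenChunks = [];
const recordChunk = (chunk, opts) => seenChunks.push({ size: chunk.length, opts: opts });

const chunksSent = syncInChunks(["a.csv", "b.csv", "c.csv"], 2, recordChunk);
console.log(chunksSent + " chunks sent");
```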

Synchronize metadata for all files

Use the SourceFile.syncAll function from DataIntegSpec to synchronize metadata for all files within the specified context.

JavaScript

SourceFile.syncAll({
    includeMeta: true,
    // other options
    process: true
});

where:

includeMeta: Set to true to include metadata in the synchronization process.
process: Set to true to process the files after synchronization.
// other options: Add any additional configuration options as needed.

By following these steps, you can successfully synchronize file metadata in C3 AI using DataIntegSpec, ensuring proper traceability and management of your data sources. This setup helps you maintain data integrity and facilitate effective auditing and troubleshooting.

Counting transformed objects from a specific source file

After configuring a source collection for traceability, verify the integrity and completeness of the data transformation process. One effective way to do this is to count the transformed records that originate from a specific source file by using the fetchCount method, as demonstrated in the following code example.

This helps in verifying that all data transformations are correctly attributed to their sources.

Example

JavaScript

TransformedData.fetchCount({filter: "meta.sourceFile == 'data_source_2024.csv'"})

This code is used after the data integration (DI) process to count the number of transformed objects from a specific source file.

where:

TransformedData: The type of the target object. Replace it with your own <Target_Type>.
fetchCount: Method to count objects.
filter: Parameter to filter objects based on a condition.
meta.sourceFile: Metadata attribute storing the source file name.
data_source_2024.csv: The name of the source file. Replace it with your own <source_file_name>.
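
To turn the count into a completeness check, the result can be compared with the number of objects you expect from the file. The following is a minimal sketch under stated assumptions: compareCounts is a hypothetical helper, the expected count of 120 is an illustrative value that you would supply yourself, and the counting call is injected as a function so that the example runs outside C3 AI.

```javascript
// Hypothetical helper (not a C3 AI API): compares the number of transformed
// objects reported for a source file with the number the caller expects.
function compareCounts(sourceFileName, expectedCount, countFn) {
  const filter = "meta.sourceFile == '" + sourceFileName + "'";
  const actualCount = countFn({ filter: filter });
  return {
    sourceFile: sourceFileName,
    expected: expectedCount,
    actual: actualCount,
    complete: actualCount === expectedCount,
  };
}

// Stand-in counter so the sketch runs on its own. In a C3 AI console you
// might pass (spec) => TransformedData.fetchCount(spec) instead.
const fakeCounter = (spec) => (spec.filter.includes("data_source_2024.csv") ? 120 : 0);

const report = compareCounts("data_source_2024.csv", 120, fakeCounter);
console.log(report.complete ? "All records accounted for" : "Count mismatch");
```

A mismatch does not necessarily indicate an error, because a transformation may legitimately skip or merge records, but it tells you which file to investigate.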

Best practices

Validate the source file name before using it in the filter.
Handle potential errors, such as missing metadata attributes.
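
Both practices can be sketched in a few lines of JavaScript. The helpers below are hypothetical and not part of the C3 AI API: buildSourceFileFilter assumes that valid file names contain only letters, digits, dots, underscores, and hyphens (adjust the pattern to your own naming rules), and safeCount takes the counting call as an injected function so that the example runs on its own.

```javascript
// Hypothetical helper (not a C3 AI API): accepts only simple file names so the
// value can be placed inside the quoted filter string without breaking it.
function buildSourceFileFilter(sourceFileName) {
  if (typeof sourceFileName !== "string" || !/^[A-Za-z0-9._-]+$/.test(sourceFileName)) {
    throw new Error("Unexpected source file name: " + String(sourceFileName));
  }
  return "meta.sourceFile == '" + sourceFileName + "'";
}

// Hypothetical helper: wraps a caller-supplied count function and reports
// failures (for example, a missing metadata attribute) instead of throwing.
function safeCount(sourceFileName, countFn) {
  try {
    return { ok: true, count: countFn({ filter: buildSourceFileFilter(sourceFileName) }) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

console.log(buildSourceFileFilter("data_source_2024.csv"));
console.log(safeCount("bad'name.csv", () => 0).ok);
```

In a C3 AI console, the second argument to safeCount might be (spec) => TransformedData.fetchCount(spec).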
