
Declare Pipelines for File Sources

You can declare data pipelines to load file contents into the C3 Agentic AI Platform. Use instances of the FileSourceSystem and FileSourceCollection Types to specify the directory, within a connected remote file system, where the platform looks for files with a consistent schema.

Create a file source system

A FileSourceSystem instance represents a logical grouping of file directories that reside in the same file system. The contents of these directories are loaded into the platform.

By default, the platform comes with a Canonical FileSourceSystem that points to the DATA_LOAD mount path of the default configured file system. To integrate files from other file systems, you need to create a new FileSourceSystem instance.

For example, to integrate files from an S3 bucket called MyBucket that has been connected to the platform, create a new instance of a FileSourceSystem Type by adding a .json file to the /metadata/FileSourceSystem directory of your package.

JSON
[{
  "name": "MyFileSourceSystem",
  "rootUrlOverride": "s3://MyBucket/"
}]

Create a file source collection

A FileSourceCollection instance specifies a directory within a source system where files are staged for loading into the C3 Agentic AI Platform. All files in the directory must share one schema, which you model with a Source or Canonical Type.

In this example, there is a collection of files containing measurements from a number of sensors, each delivered to a subdirectory within the previously mentioned S3 bucket on a daily basis. Define a Source Type SourceSensorMeasurements to model the schema of the file.

Type
type SourceSensorMeasurements mixes Source {
  
  Timestamp: datetime

  @ser(name="Facility and Asset")
  FacilityAsset: string

  Measurement: double
  
}

For more information on defining Source or Canonical Types to model the schema of a data object, see Map and Transform Source Data.
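To make the role of the @ser annotation concrete, the following is a minimal sketch in plain JavaScript, not platform code. It assumes the delivered files are CSV with a header row (the file format is an assumption here) and that @ser(name="Facility and Asset") ties the file column named "Facility and Asset" to the FacilityAsset field. The columnToField map and the missingColumns helper are hypothetical names used only for illustration.

```javascript
// Assumption: files are CSV with a header row.
// Column-to-field mapping implied by the Source Type above: the @ser
// annotation lets the "Facility and Asset" column populate FacilityAsset.
const columnToField = {
  "Timestamp": "Timestamp",
  "Facility and Asset": "FacilityAsset",
  "Measurement": "Measurement",
};

// Hypothetical helper: lists the expected columns that a header row lacks.
function missingColumns(headerLine) {
  const present = new Set(headerLine.split(",").map((name) => name.trim()));
  return Object.keys(columnToField).filter((name) => !present.has(name));
}

// A header that uses the serialized column names has nothing missing.
console.log(missingColumns("Timestamp,Facility and Asset,Measurement"));

// A header that uses the field name instead of the serialized name
// is reported as missing the "Facility and Asset" column.
console.log(missingColumns("Timestamp,FacilityAsset,Measurement"));
```

A check like this can be run against a sample file before delivery to catch header mismatches early.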

Before the C3 Agentic AI Platform can integrate these files, define a FileSourceCollection Type instance that tells the platform where to expect them.

JSON
{
  "name": "MyFileSourceCollection",
  "source": "SourceSensorMeasurements",
  "sourceSystem": {
    "name": "MyFileSourceSystem"
  }
}

The platform expects the files for a FileSourceCollection Type instance to be delivered to the instance's inbox URL. To check the inbox URL, call the inboxUrl method.

JavaScript
FileSourceCollection.forName("MyFileSourceCollection").inboxUrl();

By default, the inbox URL for a FileSourceCollection instance is the root URL of the associated FileSourceSystem instance, followed by the name of the FileSourceCollection instance and an inbox segment. For example, the inbox URL for the file source collection in the previous example is s3://MyBucket/MyFileSourceCollection/inbox.
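The default convention can be sketched in plain JavaScript as follows. This only models the URL pattern shown in the example above; it is not how the platform computes the value, and joinUrl and defaultInboxUrl are hypothetical helper names.

```javascript
// Sketch of the default inbox URL convention.
// joinUrl and defaultInboxUrl are hypothetical helpers, not platform APIs.
function joinUrl(...parts) {
  // Strip slashes at the seams so segments join with exactly one "/".
  return parts
    .map((part, i) =>
      i === 0 ? part.replace(/\/+$/, "") : part.replace(/^\/+|\/+$/g, "")
    )
    .filter((part) => part.length > 0)
    .join("/");
}

function defaultInboxUrl(rootUrl, collectionName) {
  // Root URL of the source system + collection name + "inbox".
  return joinUrl(rootUrl, collectionName, "inbox");
}

console.log(defaultInboxUrl("s3://MyBucket/", "MyFileSourceCollection"));
// prints "s3://MyBucket/MyFileSourceCollection/inbox"
```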

Configure a file source collection

If you are trying to integrate data from an existing file system directory that does not follow the platform's expected inbox URL naming convention, you have to configure the file source collection to override the default inbox URL.

In this example, files are located in the s3://MyBucket/sensor-measurements/ sub-directory. To update the default configuration of the file source collection, create a .json file in the /config/SourceCollection.Config/ sub-directory of your package.

JSON
[{
    "name": "MyFileSourceCollection",
    "inboxUrlOverride": "s3://MyBucket/sensor-measurements/"
}]

Running FileSourceCollection.forName("MyFileSourceCollection").inboxUrl() now returns the overridden path, s3://MyBucket/sensor-measurements/.

Alternatively, you can override the configuration directly, rather than through the package /config/SourceCollection.Config/ sub-directory.

JavaScript
FileSourceCollection.forName("MyFileSourceCollection")
                    .config()
                    .setConfigValue(
                      "inboxUrlOverride",
                      "s3://MyBucket/sensor-measurements/"
                    );
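Whichever way the override is set, its effect on the inbox URL can be pictured with the following plain JavaScript sketch. It is an illustrative model under the assumption that a configured inboxUrlOverride simply takes precedence over the default convention; resolveInboxUrl and the object shapes are hypothetical, not platform APIs.

```javascript
// Illustrative model only: resolveInboxUrl is a hypothetical helper,
// not the platform's implementation.
function resolveInboxUrl(collection, config = {}) {
  // Assumption: a configured override wins over the default convention.
  if (config.inboxUrlOverride) {
    return config.inboxUrlOverride;
  }
  // Default: <root URL>/<collection name>/inbox
  const root = collection.rootUrl.replace(/\/+$/, "");
  return `${root}/${collection.name}/inbox`;
}

const collection = {
  name: "MyFileSourceCollection",
  rootUrl: "s3://MyBucket/",
};

console.log(resolveInboxUrl(collection));
// prints "s3://MyBucket/MyFileSourceCollection/inbox"

console.log(
  resolveInboxUrl(collection, {
    inboxUrlOverride: "s3://MyBucket/sensor-measurements/",
  })
);
// prints "s3://MyBucket/sensor-measurements/"
```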

Other configuration parameters are available to change the runtime behavior of data integration using the SourceCollection.Config Type. For more information, see Sync and Process Files.
