Monitor Data Loads using Console
The C3 Agentic AI Platform logs detailed status information about the various steps of a data integration pipeline. This information is stored in the DataIntegStatus Type. Several other Types extend the DataIntegStatus Type to provide status information about each step in a data integration pipeline:
- SourceQueue: the entries that are currently computing
- SourceStatus: deserialization and processing of the raw content
- SourceChunkStatus: deserialization and processing of a subset of raw content
- TransformStatus: transformation of raw content per application requirements
- TargetStatus: insert and update statistics to the target database
- DataIntegStatus: data integration status across source, transform, and target
In the C3 AI data integration pipeline, the various statuses involved in the processing and transformation of data follow a structured parent-child hierarchy. This hierarchy is crucial for tracking the flow and status of data throughout the integration process.
SourceStatus: At the top of this hierarchy is the SourceStatus. This status represents the overall status of the raw data being ingested from a specific source. It captures essential information about the deserialization and initial processing of the data.
SourceChunkStatus: Beneath the SourceStatus, there can be one or more SourceChunkStatus entries. Each of these statuses corresponds to a subset of the raw content being processed. Chunking the data allows for more manageable pieces to be handled individually, improving efficiency and error isolation. If an error occurs during processing, only the affected chunk must be reprocessed, rather than the entire dataset.
TransformStatus: Each SourceChunkStatus can further parent one or more TransformStatus entries. These statuses track the transformation processes applied to the data chunks according to specific application requirements. Transformations may include data cleansing, formatting, or enrichment, ensuring that the data is in the correct format for its intended use.
TargetStatus: Finally, each TransformStatus can parent one TargetStatus entry. This status reports the result of inserting or updating the transformed data in the target database or system: whether the operations succeeded or failed, and how many records were inserted or updated.
This hierarchy makes each stage of the data integration process visible, so you can troubleshoot failures and monitor performance at the source, chunk, transform, or target level.
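The parent-child relationship described above can be sketched in plain JavaScript. This is a minimal illustration that does not call the platform: `makeStatus`, `findByState`, the IDs, and the `completed` state value are hypothetical stand-ins, and the objects are ordinary literals rather than instances of the status Types.

```javascript
// Hypothetical in-memory model of the status hierarchy. These plain objects
// are stand-ins, not the platform's actual Type instances.
function makeStatus(kind, id, state, children = []) {
  return { kind, id, state, children };
}

// Depth-first walk that returns the path to every node in the given state.
function findByState(node, state, path = []) {
  const here = [...path, `${node.kind}:${node.id}`];
  const hits = node.state === state ? [here.join(" > ")] : [];
  for (const child of node.children) {
    hits.push(...findByState(child, state, here));
  }
  return hits;
}

// One source with two chunks. The second chunk and its transform failed.
// The "completed" state value is an assumption made for this sketch.
const source = makeStatus("SourceStatus", "s1", "completed", [
  makeStatus("SourceChunkStatus", "c1", "completed", [
    makeStatus("TransformStatus", "t1", "completed", [
      makeStatus("TargetStatus", "g1", "completed"),
    ]),
  ]),
  makeStatus("SourceChunkStatus", "c2", "failed", [
    makeStatus("TransformStatus", "t2", "failed"),
  ]),
]);

const failedPaths = findByState(source, "failed");
console.log(failedPaths);
```

Because every failure is reported with its full path, a failed transform can be traced back to the one chunk that needs reprocessing, which is the error isolation that chunking is meant to provide.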
You can use the following commands to monitor your data loads from the console:
// Check whether any data is currently being processed
c3Grid(SourceQueue.countAll());
// Check which sources are currently being processed
c3Grid(SourceStatus.fetch({filter: "typeIdent == 'SOURCE'"}));
// Check which chunks are being processed
c3Grid(SourceChunkStatus.fetch({order: "descending(meta.timestamp)"}));
// Review the summary of initial, computing, and failed counts
c3Grid(SourceQueue.count());
// View all past data integration statuses, most recently updated first
c3Grid(DataIntegStatus.fetch({order: "descending(meta.timestamp)"}));
// View all failed statuses, most recently updated first
c3Grid(DataIntegStatus.fetch({filter: "state == 'failed'", order: "descending(meta.timestamp)"}));
// View the errors for a specific chunk (replace 'chunkId' with the chunk's ID)
c3Grid(DataIntegStatus.forId('chunkId').errors);
// View all failed source statuses from the past 24 hours
c3Grid(SourceStatus.fetch({filter: "state == 'failed' && meta.timestamp > now() - period(24, 'HOUR')"}));
// View all previous target statuses where a database record was created
c3Grid(TargetStatus.fetch({filter: "dbStats.createdObjCount > 0"}));
// View all the transform statistics for a source (replace <sourceid> with the source's ID)
c3Grid(SourceStatus.make(<sourceid>).allTransforms().collect());
// View all the statistics for processing a source
c3Grid(SourceStatus.make(<sourceid>).all());
// View all the target statistics for a source
c3Grid(SourceStatus.make(<sourceid>).allTargets().collect());
// View chunk status for a source
c3Grid(SourceStatus.make(<sourceid>).allChunks().collect());
The SourcesState Type defines the valid states for each DataIntegStatus instance. These states can be used to help troubleshoot data integration issues.
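If you often change the state or the time window in the 24-hour failed-source command above, the filter string can be built by a small helper. The following is a sketch only: `recentStateFilter` is a hypothetical name, not a platform function, and it does nothing more than reproduce the filter expression already shown in that command.

```javascript
// Hypothetical helper: builds the same filter expression used in the
// failed-sources command, with the state and the window as parameters.
function recentStateFilter(state, hours) {
  if (!Number.isInteger(hours) || hours <= 0) {
    throw new RangeError("hours must be a positive integer");
  }
  // Restrict the state to letters and underscores so the quoted value
  // cannot break out of the filter expression.
  if (typeof state !== "string" || !/^[A-Za-z_]+$/.test(state)) {
    throw new TypeError("state must contain only letters or underscores");
  }
  return `state == '${state}' && meta.timestamp > now() - period(${hours}, 'HOUR')`;
}

const filter = recentStateFilter("failed", 24);
console.log(filter);
```

In the console, the resulting string would be passed the same way as the literal filter, for example `c3Grid(SourceStatus.fetch({filter: filter}))`.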
Troubleshooting and common errors
This troubleshooting section provides guidance on resolving common issues related to SourceFile processing.
- No SourceFile records showing: Ensure that the FileSourceCollections are correctly configured. Try running SourceFile.syncAll() to sync SourceFiles with the uploaded files.
- SourceFile isn't processing: Confirm that a FileSourceCollection is defined and provisioned for the target Canonical. Additionally, check that task nodes are available on the MNE.
- SourceFile.status == "Failed": Check the DataIntegStatus for more specific details regarding the failure.
- Data not showing up on target Type and SourceFile.status == "Initial": The SourceFile hasn't been processed yet. Use SourceFile.forId("<SourceFile.id>").process() to initiate processing of the file. Also, check App.actionDump() to see if the action has been picked up by the nodes.
- SourceFile.status == "Processing" but data is taking a long time to be available on target Types: Use c3Grid(InvalidationQueue.countAll()) to confirm that the SourceQueue is not paused and that tasks are moving from the Pending column to the Computing column.
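The status-based checks above can be collected into a small lookup for quick reference. This is an illustrative sketch: `NEXT_STEPS` and `nextStep` are hypothetical names, the code does not call the platform, and each entry only restates the suggestion from the corresponding bullet.

```javascript
// Hypothetical lookup keyed by the SourceFile.status values named in the
// bullets above. Each entry restates the suggested next step.
const NEXT_STEPS = new Map([
  ["Failed", "Check the DataIntegStatus for details about the failure."],
  ["Initial", 'Run SourceFile.forId("<SourceFile.id>").process(), then check App.actionDump().'],
  ["Processing", "Run c3Grid(InvalidationQueue.countAll()) and watch tasks move from Pending to Computing."],
]);

const NO_SUGGESTION = "No suggestion is listed for this status.";

// Returns the suggested next step, or a fallback for unlisted statuses.
function nextStep(status) {
  return NEXT_STEPS.has(status) ? NEXT_STEPS.get(status) : NO_SUGGESTION;
}

console.log(nextStep("Initial"));
```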