Troubleshoot multimodal parsing

When to Use This Guide

Use this guide when you need to:

  • Optimize chunking performance for large document volumes.
  • Troubleshoot slow or stuck chunking processes.
  • Configure GPU acceleration for faster processing.
  • Monitor and validate chunking operations.

Prerequisites

Before starting, ensure:

  • Your C3 Generative AI application is running.
  • You have access to the Application C3 AI Console and Jupyter Notebook.
  • You have run the Quickstart application and Mew3 is enabled (see Multimodal Parsing).
  • You have documents uploaded and ready for processing.

Quick Health Check

Run this command in your Application Console to confirm Mew3 is active:

JavaScript
// Verify Mew3 is configured as the chunker
var chunkerConfig = Genai.SourceFile.Chunker.UniversalChunker.Config.forConfigKey('default');
var fileExtToChunkerSpecMap = C3.Map.fromJson(chunkerConfig.fileExtToChunkerSpecMap);
console.log('PDF chunker:', fileExtToChunkerSpecMap.get('.pdf').get('chunker'));
// Should return: Genai.SourceFile.Chunker.Mew3

If this doesn't return Genai.SourceFile.Chunker.Mew3, run Genai.QuickStart.enableMew3().


Step 1: System Assessment

Understand your current system state before making configuration changes.

1.1 Check Available Nodes and Resources

JavaScript
// List all nodes to find task node IDs
C3.app()
  .nodes()
  .forEach((node) => {
    console.log(`Node ID: ${node.id}`);
  });

1.2 Check Hardware Configuration

JavaScript
// Check task node pool hardware configuration
// C3.app().nodePools() will list all configured nodepools; in certain cases there may be another task pool such as "gputask"

console.log('Task Node Pool Config:');
console.log('CPU:', C3.app().nodePool('task').config().hardwareProfile.cpu);
console.log('Memory:', C3.app().nodePool('task').config().hardwareProfile.memoryMb, 'MB');
console.log('GPU:', C3.app().nodePool('task').config().hardwareProfile.gpu);

Interpretation:

  • GPU: 0 = CPU-only processing (slower but works)
  • GPU: 1+ = GPU acceleration available (faster)

1.3 Check Current Files and Processing Status

In the snippet below, change the filter to "Indexed" to see the status of chunked documents that are currently being indexed.

JavaScript
// Check what files need processing
var statusCounts = {};
Genai.SourceFile.fetch({ limit: -1 }).objs.forEach((file) => {
  var status = file.status ? file.status.value : 'Unknown';
  statusCounts[status] = (statusCounts[status] || 0) + 1;
});
console.log('File Status Distribution:', statusCounts);

// Check files currently being processed
var processingFiles = Genai.SourceFile.fetch({
  filter: 'status.value != "Chunked" && status.value != "Failed"',
  limit: 5, // Show first 5 for brevity
}).objs;
console.log('In-flight files (first 5):', processingFiles.map((file) => file.id));

1.4 Assessment Decision Tree

Based on your system assessment, choose your path:

If you have GPU nodes (gpu > 0): → Go to Step 2A: GPU Configuration

If you have CPU-only nodes (gpu = 0): → Go to Step 2B: CPU Optimization

If you have no task nodes or they're not running: → Go to Step 2C: Node Setup

If files are stuck processing: → Go to Step 4: Troubleshooting
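The paths above can be sketched as a small helper for reference (plain JavaScript; the function name and argument shape are illustrative, not part of the C3 API — feed it the values you observed in 1.1 through 1.3):

```javascript
// The decision tree above, expressed as a plain-JavaScript helper.
// Inputs come from the assessments in Steps 1.1-1.3.
function nextStep(taskNodeCount, gpuCount, filesStuck) {
  if (filesStuck) return 'Step 4: Troubleshooting';
  if (taskNodeCount === 0) return 'Step 2C: Node Setup';
  if (gpuCount > 0) return 'Step 2A: GPU Configuration';
  return 'Step 2B: CPU Optimization';
}

console.log(nextStep(1, 1, false)); // Step 2A: GPU Configuration
console.log(nextStep(0, 0, false)); // Step 2C: Node Setup
```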


Step 2: Configuration Based on Your System

Configure your system for optimal performance based on the assessment results. Read the environment sizing guide for specific memory allocation recommendations and autoscaling strategies.

Step 2A: GPU Configuration (Recommended for High Volume)

If your assessment showed gpu > 0, configure for GPU acceleration:

JavaScript
// Configure UniversalChunker to target GPU task nodes
Genai.SourceFile.Chunker.UniversalChunker.Config.forConfigKey('default').withDeploySpec({
  nodePools: ['task'], // Target task nodes with GPUs
});

// Verify the configuration - check if deploySpec is properly set
Genai.SourceFile.Chunker.UniversalChunker.Config.forConfigKey('default').getConfig().deploySpec;

The following command sets the hardware to a GPU configuration. Make sure the hardware is available on the cluster before running it; for example, you would need an NVIDIA T4 GPU available.

JavaScript
// Optional: Ensure task nodes have optimal GPU configuration
var app = C3.app();
app
  .nodePool('task')
  .setNodeCount(1, 0, 1) // target=1, min=0, max=1
  .setHardwareProfile(4, 32000, 1, 'nvidia', 'Standard_NC64as_T4_v3') // 4 CPU, 32GB RAM, 1 GPU
  .setAutoScaleSpec(true)
  .setJvmSpec(0.8);
app.nodePool('task').update();
console.log('GPU task node configuration updated');

Step 2B: CPU Optimization (Good for Low-Medium Volume)

If your assessment showed gpu = 0, optimize for CPU processing. When nodes autoscale, runtimes must be installed before chunking can begin, which can add up to 15 minutes before your configuration update takes effect.

JavaScript
// Configure for CPU-only processing with more nodes
var app = C3.app();
app
  .nodePool('task')
  .setNodeCount(2, 0, 4) // Scale to more nodes for parallel processing
  .setHardwareProfile(8, 31000, 0, null, null) // 8 CPU, 31GB RAM, no GPU
  .setAutoScaleSpec(true);
app.nodePool('task').update();
console.log('CPU task node configuration updated');

JavaScript
// Configure UniversalChunker for CPU nodes
Genai.SourceFile.Chunker.UniversalChunker.Config.forConfigKey('default').withDeploySpec({ nodePools: ['task'] });
console.log('✅ UniversalChunker configured');

Step 2C: Node Setup (If No Task Nodes Exist)

If your assessment showed no task nodes, create them.

JavaScript
// Create/configure task node pool and create a task node
// Run only if you have no task nodes. Otherwise, this will reset your node count.
C3.App.NodePool.Config.make('task').withTargetNodeCount(1).withMinNodeCount(1).withMaxNodeCount(1).setConfig();

C3.app().nodePool('task').update();

// Verify the setup
console.log('Task Node Pool Config:');
console.log('CPU:', C3.app().nodePool('task').config().hardwareProfile.cpu);
console.log('Memory:', C3.app().nodePool('task').config().hardwareProfile.memoryMb, 'MB');
console.log('GPU:', C3.app().nodePool('task').config().hardwareProfile.gpu);

Step 2D: Validation

After configuration, validate your setup:

JavaScript
// Check task nodes are running
var taskNodes = C3.app()
  .nodes()
  .filter((node) => {
    // Check both roles and node ID for task nodes
    var rolesStr = Array.isArray(node.roles) ? node.roles.join(',') : String(node.roles || '');
    var nodeId = String(node.id || '');
    return rolesStr.includes('TASK') || nodeId.includes('apptask');
  });

taskNodes.forEach((node) => {
  console.log(`Task Node: ${node.id} | State: ${node.state}`);
});

if (taskNodes.length === 0) {
  console.log('⚠️ No task nodes found. Check Step 2 configuration.');
}

// Verify UniversalChunker configuration
var config = Genai.SourceFile.Chunker.UniversalChunker.Config.forConfigKey('default');
console.log('DeploySpec:', config.deploySpec);

Success Criteria:

  • At least one task node in "RUNNING" state
  • UniversalChunker deployment specification targets correct node pools
  • Hardware profile matches your performance needs

Step 3: Start Processing and Monitor Progress

Now that your system is configured, start processing files and monitor the progress.

Step 3.1: Monitor Processing

The following code checks files that are currently not chunked or indexed.

JavaScript
// Check files that need processing
var unprocessedFiles = Genai.SourceFile.fetch({
  filter: 'status.value != "Chunked" && status.value != "Failed" && status.value != "Indexed"',
  limit: 10,
}).objs;

console.log(`Found ${unprocessedFiles.length} files to process`);

Step 3.2: Monitor Queue Activity

JavaScript
// Monitor MapReduce activity (this is where chunking happens)
c3Grid(InvalidationQueue.countAll());

// Check for active processing
var activeQueues = InvalidationQueue.countAll().filter(
  (q) => q.pending > 0 || q.computingActions > 0 || q.computingEntries > 0
);

if (activeQueues.length > 0) {
  console.log('✅ Processing is active!');
  activeQueues.forEach((q) => {
    console.log(`  ${q.queue}: ${q.computingActions} computing, ${q.pending} pending`);
  });
} else {
  console.log('No active queue processing');
}

Step 3.3: Track File Status Changes

You can check file status changes through the Data page of the application, which auto-refreshes.

Expected Behavior:

  • MapReduce queue shows activity when processing starts
  • File status changes from "Queued" → "Processing" → "Chunked"
  • GPU nodes process pages roughly twice as fast as CPU-only nodes (see Performance Benchmarks below)
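If you prefer the console to the Data page, a small helper can summarize the status distribution each time you run it. This is a plain-JavaScript sketch that assumes the `objs` array shape used in the fetch snippets above; pass it `Genai.SourceFile.fetch({ limit: -1 }).objs`:

```javascript
// Summarize the status distribution of an array of SourceFile-like objects.
function summarizeStatuses(files) {
  var counts = {};
  files.forEach(function (file) {
    var status = file.status && file.status.value ? file.status.value : 'Unknown';
    counts[status] = (counts[status] || 0) + 1;
  });
  return counts;
}

// Example with mock data (in the console, use the fetched objs instead):
var mockFiles = [
  { status: { value: 'Chunked' } },
  { status: { value: 'Processing' } },
  { status: { value: 'Chunked' } },
  { status: null },
];
console.log(summarizeStatuses(mockFiles));
// { Chunked: 2, Processing: 1, Unknown: 1 }
```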

Step 4: Troubleshooting Common Issues

Use this section when processing isn't working as expected.

Issue: No Processing Activity

Symptoms: No MapReduce queue activity, files stuck in "Queued" status

JavaScript
// Diagnostic commands
// Find failed files:

Genai.SourceFile.fetch({
  filter: 'status.value == "Failed"',
  limit: 5, // Show first 5 for brevity
});

// Find processing files:

Genai.SourceFile.fetch({
  filter: 'status.value == "Processing"',
  limit: 5, // Show first 5 for brevity
});

// Check if UniversalChunker is properly configured
var config = Genai.SourceFile.Chunker.UniversalChunker.Config.forConfigKey('default');
console.log('UniversalChunker deploySpec:', config.deploySpec);

// Check the Map Reduce node pool

Genai.SourceFile.Chunker.UniversalChunker.Config.getConfig().mapReduceOptions.nodePool;

If mapReduceOptions.nodePool returns gputask but you need to target the task nodes, change the pool to the one you want to use.

Issue: Slow Processing Performance

Symptoms: Processing is happening but very slowly

JavaScript
// Performance optimization checks
console.log('=== Performance Diagnostics ===');

// Check current hardware allocation
var taskNodeConfig = C3.app().nodePool('task').config();
console.log('Current Task Node Config:');

// Safely access nodeCount (might be number, object, or undefined)
var currentNodeCount =
  typeof taskNodeConfig.nodeCount === 'object'
    ? taskNodeConfig.nodeCount.desired || taskNodeConfig.nodeCount.current || 1
    : taskNodeConfig.nodeCount || 1;

// Safely access hardware profile properties
var hwProfile = taskNodeConfig.hardwareProfile || {};
console.log('  CPU per node:', hwProfile.cpu || 'N/A');
console.log('  Memory per node:', hwProfile.memory || hwProfile.memoryMb || 'N/A', 'MB');
console.log('  GPU per node:', hwProfile.gpu || 0);

// Scaling recommendations
if ((hwProfile.gpu || 0) === 0) {
  console.log('💡 Recommendation: Scale CPU nodes or add GPU for faster processing');
  // Example scaling for CPU nodes:
  console.log('To scale CPU nodes: C3.app().nodePool("task").setNodeCount(2, 0, 4).update()');
}

if (currentNodeCount < 2 && (hwProfile.gpu || 0) === 0) {
  console.log('💡 Consider increasing node count for parallel processing');
  console.log(`  Current nodes: ${currentNodeCount}, recommended: 2+`);
}

The following code snippet distributes chunking across as many GPU nodes as possible. This is run automatically as part of the Quick Start application.

JavaScript
Genai.SourceFile.Chunker.UniversalChunker.Config.setConfigValue('mapReduceOptions', {
  batchSize: 1,
  maxConcurrencyPerNode: 1,
  nodePool: 'gputask', // or whatever node pool has GPU
  order: 'descending(originalFile.contentLength)', // if you want to optimize for total time to complete chunking
});

Issue: Files Stuck in "Processing" or "Chunking" Status

Symptoms: Files show "Processing" for extended periods

JavaScript
// Check for stuck chunking
var stuckFiles = Genai.SourceFile.fetch({
  filter: 'status.value == "Chunking"',
  include: 'id, fileName, status, meta.updated',
}).objs;

console.log(`Found ${stuckFiles.length} files stuck in chunking`);
stuckFiles.forEach((file) => {
  var timeDiff = new Date() - new Date(file.meta.updated);
  var minutesStuck = Math.floor(timeDiff / (1000 * 60));
  console.log(`Stuck for ${minutesStuck} minutes`);
});

You might have a problematic SourceFile that hasn't been processed using your updated node resources. Use the following code snippet to remove the file:

JavaScript
// Remove the problematic source file (this assumes it was the first one fetched)
var sourceFile = Genai.SourceFile.fetch().objs[0];
sourceFile.remove();

You may need to run C3.app().nodePool("task").update() and restart task nodes to manually trigger chunking again.

Issue: Task Nodes Not Starting

Symptoms: No task nodes in "RUNNING" state

JavaScript
// Node startup diagnostics
var allNodes = C3.app().nodes();
var taskNodes = allNodes.filter((node) => {
  var hasTaskRole = Array.isArray(node.roles) ? node.roles.includes('TASK') : String(node.roles || '').includes('TASK');
  var isApptask = node.id && node.id.includes('apptask');
  return hasTaskRole || isApptask;
});

if (taskNodes.length === 0) {
  console.log('❌ No task nodes exist - run Step 2C to create them');
} else {
  // Get hardware profile from node pool config (not individual nodes)
  var taskNodeConfig = C3.app().nodePool('task').config();
  var hwProfile = taskNodeConfig.hardwareProfile || {};

  taskNodes.forEach((node) => {
    console.log(`Task Node: ${node.id}`);
    console.log(`  State: ${node.state}`);
    console.log(
      `  Hardware: ${hwProfile.cpu || 'N/A'}CPU/${hwProfile.memory || hwProfile.memoryMb || 'N/A'}MB/${hwProfile.gpu || 0}GPU`
    );
  });

  // Check if nodes need restart
  var stoppedNodes = taskNodes.filter((node) => node.state !== 'RUNNING');
  if (stoppedNodes.length > 0) {
    console.log('💡 Some task nodes are not running. Try restarting the node pool:');
    console.log('C3.app().nodePool("task").update()');
  }
}

Performance Benchmarks

Expected Processing Times (per page on one node):

  • GPU node: 5-10 seconds per page for typical PDF files
  • CPU node: 10-20 seconds per page for typical PDF files
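These per-page figures can be turned into a rough capacity estimate. The helper below is an illustration only: the per-page seconds are midpoints of the ranges above, and it assumes pages parallelize evenly across nodes, which real workloads rarely do:

```javascript
// Rough chunking-time estimate from the per-page benchmarks above.
// secondsPerPage: ~7.5 for a GPU node, ~15 for a CPU node (range midpoints).
function estimateChunkingMinutes(totalPages, secondsPerPage, nodeCount) {
  var totalSeconds = (totalPages * secondsPerPage) / Math.max(nodeCount, 1);
  return Math.ceil(totalSeconds / 60);
}

// Example: 1,000 pages across 2 CPU nodes at ~15 s/page
console.log(estimateChunkingMinutes(1000, 15, 2), 'minutes'); // 125 minutes
```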

Quick Fix Checklist

  1. ✅ Task nodes running: At least one task node in "RUNNING" state
  2. ✅ UniversalChunker configured: deployment specification targets correct node pools
  3. ✅ Files queued: Files show "Queued" or "Processing" status
  4. ✅ Queue activity: MapReduce queue shows active processing
  5. ✅ Hardware appropriate: GPU for high volume, scaled CPU for medium volume

Validation and Next Steps

Success Indicators

Your system is working correctly when you see:

  • Files transitioning: "Queued" → "Processing" → "Chunked"
  • MapReduce queue activity during processing
  • Task nodes in "RUNNING" state
  • Processing times within expected benchmarks

Ongoing Monitoring

For continuous monitoring, bookmark these commands:

JavaScript
// Quick system health check
function healthCheck() {
  console.log('=== Mew3 Health Check ===');

  // Node status
  var taskNodes = C3.app()
    .nodes()
    .filter((node) => {
      var hasTaskRole = Array.isArray(node.roles)
        ? node.roles.includes('TASK')
        : String(node.roles || '').includes('TASK');
      var isApptask = node.id && node.id.includes('apptask');
      return (hasTaskRole || isApptask) && node.state === 'RUNNING';
    });
  console.log(`✅ Running task nodes: ${taskNodes.length}`);

  // File processing status
  var processing = Genai.SourceFile.fetch({
    filter: 'status.value == "Processing" || status.value == "Queued"',
  }).objs.length;
  console.log(`📄 Files in processing queue: ${processing}`);

  // Queue activity
  var activeQueues = InvalidationQueue.countAll().filter((q) => q.computingActions > 0 || q.pending > 0).length;
  console.log(`⚡ Active processing queues: ${activeQueues}`);

  console.log('===================');
}

// Run health check
healthCheck();
