Monitor Applications and Services

Monitor the health and activity of cloud services with the C3 AI Studio admin monitoring dashboards.

Depending on the services that your platform uses, C3 AI Studio might provide the following dashboards:

Cassandra Health monitors the health and performance of Cassandra databases.
Kubernetes Cluster monitors health, performance, and resource utilization for your cluster.
PostgreSQL Health monitors the health and performance of PostgreSQL databases.

Cassandra Health dashboard

Monitor the health and performance of Cassandra running in a Kubernetes environment within your C3 AI Studio instance.

Here is how you can interpret the Cassandra Health dashboard metrics:

Metric	Usage
Node Status (Count)	Ensure all Cassandra nodes are active and running.
CPU Utilization (%)	Track CPU utilization to avoid bottlenecks.
Memory Utilization (GiB)	Track memory utilization to avoid bottlenecks.
Disk Space Utilization (%)	Monitor storage consumption to prevent capacity issues.
Read Latency at 99th Percentile (µs)	Measure read latency to detect and mitigate performance issues. 99% of read operations complete within the displayed latency value, while the remaining 1% take longer.
Write Latency at 99th Percentile (µs)	Measure write latency to detect and mitigate performance issues. 99% of write operations complete within the displayed latency value, while the remaining 1% take longer.
Pending Compactions (Count)	Compaction merges multiple SSTables to consolidate data copies, improve read performance, and reclaim disk space. Monitor compaction processes to maintain database efficiency.
Pending Tasks (Count)	Pending tasks, such as compaction or read repair, are queued and waiting for system resources. Identify and resolve backlogs that could impact system performance.
Garbage Collection Time (ms)	Garbage collection reclaims unused memory. Monitor garbage collection time to maintain memory efficiency.
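
To make the 99th percentile latency metrics above concrete, here is a minimal Python sketch of a nearest-rank percentile calculation. The function name and the sample latencies are hypothetical, and the dashboard may compute its percentiles with a different estimator, so treat this only as an illustration of what a p99 value means.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # 1-based nearest rank; multiply before dividing to limit rounding error.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical read latencies in microseconds: 99 fast reads and one slow outlier.
latencies_us = [200] * 99 + [5000]

p99 = percentile_nearest_rank(latencies_us, 99)
slowest = percentile_nearest_rank(latencies_us, 100)
print(p99, slowest)  # 200 5000
```

In this sample, the p99 value stays at 200 even though one read took 5000, which is why a percentile is usually read together with other signals such as pending tasks.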

Kubernetes Cluster dashboard

Monitor Kubernetes node pools, C3 AI node pools, and Kubernetes cluster resource metrics.

Here is how you can interpret the Kubernetes Cluster dashboard metrics:

Metric	Usage
Nodes per Kubernetes nodepool (K8s)	Identify excess resource usage and trace bottlenecks for Kubernetes node pools.
Nodes per C3 nodepool (C3)	Identify excess resource usage and trace bottlenecks for C3 AI node pools.
Requests / Provision (CPU)	Detect bottlenecks in CPU and investigate performance degradation during peak usage.
Requests / Provision (Memory)	Detect bottlenecks in memory and investigate performance degradation during peak usage.
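
One way to reason about the Requests / Provision metrics is as a ratio of requested resources to provisioned resources. The following Python sketch assumes that interpretation, which this page does not confirm. The function names, the sample figures, and the 0.8 threshold are all hypothetical.

```python
def request_ratio(requested, provisioned):
    """Return requested / provisioned, assuming both use the same unit."""
    if provisioned <= 0:
        raise ValueError("provisioned must be greater than zero")
    return requested / provisioned


def is_under_pressure(ratio, threshold=0.8):
    """Flag a ratio that exceeds a hypothetical alerting threshold."""
    return ratio > threshold


# Hypothetical cluster figures: CPU in cores, memory in GiB.
cpu_ratio = request_ratio(requested=12, provisioned=16)
memory_ratio = request_ratio(requested=56, provisioned=64)

print(cpu_ratio, is_under_pressure(cpu_ratio))        # 0.75 False
print(memory_ratio, is_under_pressure(memory_ratio))  # 0.875 True
```

A ratio that stays close to 1.0 during peak usage suggests that the cluster has little headroom left for new workloads.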

PostgreSQL Health dashboard

Monitor PostgreSQL database health.

Here is how you can interpret the PostgreSQL Health dashboard metrics:

Metric	Usage
CPU Usage (Instance)	Detect bottlenecks in CPU and investigate performance degradation during peak usage.
Memory Usage (Instance)	Detect bottlenecks in memory and investigate performance degradation during peak usage.
Total Connection (Databases)	Identify contention issues and monitor network patterns to detect security vulnerabilities.
Max Connection (Databases)	Avoid reaching the maximum connection limit, detect performance issues, and plan infrastructure capacity.
Deadlocks (Databases)	Investigate high contention scenarios and optimize transaction isolation levels.
Dead Tuple Ratio (Databases)	A dead tuple is a deleted or updated row that has not been removed from the database. Investigate a high dead tuple ratio to prevent table bloat, excess resource consumption, index fragmentation, and performance issues.
Disk Usage (Databases)	Investigate peak usage, avoid excessive usage, and identify trends for scaling and optimization.
Total Autovacuum (Databases)	Autovacuum is a process that reclaims space from dead tuples. Identify and address low or inconsistent autovacuum rates to prevent bottlenecks, performance degradation, and excess resource consumption.
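
The dead tuple and connection metrics above can be illustrated with a short Python sketch. It assumes that the dead tuple ratio is dead tuples divided by all tuples, and that connection pressure is total connections divided by the maximum. The dashboard might define either value differently. The function names and sample numbers are hypothetical. In PostgreSQL, the underlying counts are available as `n_live_tup` and `n_dead_tup` in the `pg_stat_user_tables` view, and the limit is set by `max_connections`.

```python
def dead_tuple_ratio(live_tuples, dead_tuples):
    """Return the share of tuples that are dead (assumed definition)."""
    total = live_tuples + dead_tuples
    if total == 0:
        return 0.0  # An empty table has no bloat to report.
    return dead_tuples / total


def connection_utilization(total_connections, max_connections):
    """Return the fraction of the connection limit currently in use."""
    if max_connections <= 0:
        raise ValueError("max_connections must be greater than zero")
    return total_connections / max_connections


# Hypothetical values, for illustration only.
ratio = dead_tuple_ratio(live_tuples=9000, dead_tuples=1000)
utilization = connection_utilization(total_connections=85, max_connections=100)

print(ratio)        # 0.1
print(utilization)  # 0.85
```

A dead tuple ratio that keeps rising alongside a low autovacuum count is one pattern worth raising with your system administrator.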

You must have the StudioAdmin role to view the dashboards.

Here is how to view these dashboards:

In C3 AI Studio, navigate to the Admin page.
Select Monitoring.
Choose the dashboard you want to view from the Selected Dashboard dropdown menu.

The Admin Monitoring page displays panels that contain metrics and data about the service.

You can further navigate the Admin Monitoring page:

To view more context about the data, hover over the tooltip icon on the data panel.
To select a time range or filter the data, use the Filter panel.
To set the auto refresh interval, select from the Auto Refresh Interval dropdown menu.
To view more information about services and their compute metrics, select Compute Config.

If any of the metrics concern you, contact your system administrator to investigate or troubleshoot.