
Normalization Engine

The C3 Agentic AI Platform Normalization Engine sets up normalization processes on your data after you model your time series data. See Modeling Time Series Data and Identify Time Series for more information about modeling and identifying time series data.

Set and apply normalization techniques

A treatment (C3 AI Type: AggOp, for aggregate operation) and a unit can be applied during normalization. Normalization offers several configurable options. As you set up your Types, consider:

  1. Treatment: How do you want the values to be normalized? See Time Series Data Treatments for the treatments that can be applied during normalization; a treatment is set by providing an aggregate operation in the annotation on the time series field.
  2. Unit: Determine where the unit for normalization comes from. It is also valid to select no unit for normalization.
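Conceptually, the treatment controls how raw values are combined when the engine resamples them to an output grain. The following plain-JavaScript sketch illustrates the difference between a rate-style treatment (averaged) and an integral-style treatment (summed); the helper functions are illustrative assumptions, not platform APIs:

```javascript
// Illustrative sketch only: how a rate-style treatment (e.g., temperature)
// and an integral-style treatment (e.g., dollars billed) might combine raw
// values when resampling to a coarser grain. These helpers are hypothetical.
function normalizeRate(values) {
    // A rate-like reading is averaged across the input intervals.
    var total = values.reduce(function (a, b) { return a + b; }, 0);
    return total / values.length;
}

function normalizeIntegral(values) {
    // An integral quantity accumulates, so it is summed.
    return values.reduce(function (a, b) { return a + b; }, 0);
}

var hourlyTemps = [10, 12, 14];  // rate-like readings
var dailyBills = [5, 7, 6];      // integral-like amounts

normalizeRate(hourlyTemps);      // 12 (average temperature)
normalizeIntegral(dailyBills);   // 18 (total billed)
```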

Examples

Depending on the nature of the data, slightly different steps are taken for normalization. These examples walk you through the initial steps for normalization and how, after Types are created and provisioned, to see normalization in action.

Model point data

For point data, begin by defining a header Type that mixes in TimedDataHeader.

This header Type should provide the data point Type it is connected to, so that the normalization engine knows to retrieve the actual time series data points from that Type. See the example below for weather data from a smart thermometer:

Type
entity type WeatherHeader mixes TimedDataHeader<WeatherData> schema name 'WTHRHDR' {
    /**
     * Unit of temperature
     */
    tempUnit: !string
}

Next, define the TimedDataPoint Type that stores the data points:

Type
@db(datastore='kv')
entity type WeatherData mixes TimedDataPoint<WeatherHeader> schema name 'WTHRDT' {
    /**
     * Time series field to track actual temperature
     */
    @ts(treatment='rate', unitPath="parent.tempUnit")
    temp: !double
}

In this example, the data points are stored in Cassandra. The temperature reading temp is the only field normalized, using the rate treatment. Note that the data point Type mixes in TimedDataPoint, parameterized by the header Type just defined. The unit for normalization (unitPath) comes from this header Type.
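The unitPath value can be thought of as a property path walked from each data point up to its header. A minimal plain-JavaScript sketch of that idea (resolveUnitPath is a hypothetical helper, not a platform API):

```javascript
// Illustrative sketch only: resolving a unitPath such as "parent.tempUnit"
// by walking property names from a data point to its header.
// resolveUnitPath is hypothetical, not a platform API.
function resolveUnitPath(dataPoint, unitPath) {
    return unitPath.split(".").reduce(function (obj, field) {
        return obj == null ? undefined : obj[field];
    }, dataPoint);
}

var header = { tempUnit: "celsius" };
var point = { parent: header, temp: 21.5 };

resolveUnitPath(point, "parent.tempUnit");  // "celsius"
```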

Model interval data

For interval data, like daily utility bills, create a time series header Type that can hold meta information about how this time series needs to be normalized. All the fields that can instruct the normalization engine to apply specific rules come from the IntervalDataHeader Type.

Let the header Type be called UtilityBillSeries that mixes in IntervalDataHeader. This header Type should provide the data point Type that it is connected to, so that the normalization engine knows that the actual time series data points should be retrieved from this data point Type.

Type
entity type UtilityBillSeries mixes IntervalDataHeader<UtilityBill> schema name 'UBS' {
    /**
     * Unit of currency
     */
    currencyUnit : !string
}

Next, define the data point Type that stores the actual time series data points. This Type should also be linked to the IntervalDataHeader it is associated with.

Let the data point Type be called UtilityBill (referenced above from the series header) that mixes in IntervalDataPoint.

Type
@db(datastore='kv')
entity type UtilityBill mixes IntervalDataPoint<UtilityBillSeries> schema name 'UTLTYBLL' {
  /**
   * Time series field to track actual usage
   */
  @ts(treatment='integral', unitPath="parent.currencyUnit")
  usage : !double
}

This Type indicates that the data points are stored in Cassandra and that the time series field is called usage.

The integral treatment is applied to it. The unit for this data comes from the header field currencyUnit, which was defined on UtilityBillSeries above.
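Conceptually, an integral treatment lets the engine spread an interval's total proportionally across the output grain rather than repeating it. A minimal sketch of that idea in plain JavaScript (spreadIntegral is a hypothetical helper, not a platform API):

```javascript
// Illustrative sketch only: spreading an integral quantity (a bill total)
// proportionally across output buckets of a fixed grain.
// spreadIntegral is hypothetical, not a platform API.
function spreadIntegral(usage, startMs, endMs, grainMs) {
    var buckets = [];
    for (var t = startMs; t < endMs; t += grainMs) {
        // Portion of the interval covered by this bucket.
        var covered = Math.min(t + grainMs, endMs) - t;
        buckets.push(usage * covered / (endMs - startMs));
    }
    return buckets;
}

var DAY = 24 * 60 * 60 * 1000;
// A $10 bill covering two days normalizes to $5 per day at a DAY grain.
spreadIntegral(10, 0, 2 * DAY, DAY);  // [5, 5]
```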

Generate data

After these Types are saved and provisioned, generate sample raw data with the following script:

JavaScript
function createMeasurement() {
    var parentId = "customer-1";

    // Create the series header that holds the normalization metadata.
    UtilityBillSeries.upsert(UtilityBillSeries.make({
        id: parentId,
        currencyUnit: "dollars"
    }));

    // Generate two years of daily data points.
    var no = 365 * 2;
    var measurements = [];
    var startDate = Date.deserialize("2015-01-01T00:00:00");
    for (var i = 0; i < no; i++) {
        var dpStart = new Date(startDate.getTime() + (i * 24 * 60 * 60 * 1000));
        var dpEnd = new Date(dpStart.getTime() + (24 * 60 * 60 * 1000));
        measurements.push(UtilityBill.make({
            start: dpStart,
            end: dpEnd,
            usage: Math.floor(Math.random() * (10 - 5 + 1)) + 5,  // random value in [5, 10]
            parent: {
                id: parentId
            }
        }));
    }

    if (measurements.length > 0) {
        UtilityBill.upsertBatch(measurements);
    }
    return parentId;
}

createMeasurement();

Normalize data

With the raw data created, run the following command in the Chrome DevTools Console to normalize the time series manually:

JavaScript
UtilityBillSeries.normalizeTimeseries(UtilityBillSeries.make({
    id:"customer-1"
}))

Data is automatically normalized on read if it has not previously been normalized manually. Even if the time series is read directly without manual normalization, the output still contains the normalized values.

For example:

JavaScript
var ts = UtilityBillSeries.tsEval({
    projection:"sum(normalized.data.usage)",
    start:"2015-01-01",
    end:"2017-01-01",
    grain:"DAY",
    filter:"id == 'customer-1'"
});

c3Viz(ts);
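Conceptually, the sum(...) projection adds up the normalized values that fall within each output interval. A plain-JavaScript sketch of the same idea at a MONTH grain (sumByMonth is a hypothetical helper, not a platform API):

```javascript
// Illustrative sketch only: summing normalized daily values into monthly
// buckets, mimicking a sum(...) projection at MONTH grain.
// sumByMonth is hypothetical, not a platform API.
function sumByMonth(points) {
    var totals = {};
    points.forEach(function (p) {
        var key = p.start.slice(0, 7);  // "YYYY-MM"
        totals[key] = (totals[key] || 0) + p.usage;
    });
    return totals;
}

sumByMonth([
    { start: "2015-01-01", usage: 5 },
    { start: "2015-01-02", usage: 7 },
    { start: "2015-02-01", usage: 6 }
]);
// → { "2015-01": 12, "2015-02": 6 }
```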

Use normalization

Time series data in the C3 Agentic AI Platform is automatically normalized in the background:

  • On read, if the series hasn't previously been normalized or has been marked invalid because new data arrived
  • Incrementally, if the series has been marked for normalization using an Environment config

Although time series are automatically normalized, you may see a performance hit while running high-throughput analytics. In this case, you can normalize manually using various API calls. See the Triggering Normalization section for more information.
