Feature Engineering and Time Series Analysis on C3 Agentic AI Platform
Data science work often requires complex transformations on source data before machine learning models, and humans, can consume it. In machine learning terminology, a feature is a transformed data value provided as input to a machine learning model. Features are indexed by a subject for which you make a prediction (for example, a wind turbine or a stock trade). If a feature is a time series, it is also indexed by a timestamp that indicates the time with which the value is associated. In C3 AI Version 8, the C3 Agentic AI Platform provides the C3 Feature Store, a component that stores feature definitions and manages the associated data.
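As a minimal sketch of the indexing described above, the following plain Pandas snippet builds one time series feature indexed by subject and timestamp, and one feature indexed by subject only. The column names, turbine identifiers, and values are made up for illustration, and nothing here calls the C3 Feature Store.

```python
import pandas as pd

# Hypothetical raw readings for two wind turbines (made-up values).
readings = pd.DataFrame(
    {
        "turbine_id": ["T1", "T1", "T2", "T2"],
        "timestamp": pd.to_datetime(
            [
                "2024-01-01 00:00",
                "2024-01-01 01:00",
                "2024-01-01 00:00",
                "2024-01-01 01:00",
            ]
        ),
        "power_kw": [1200.0, 1350.0, 980.0, 1010.0],
    }
)

# A time series feature: one value per (subject, timestamp) pair.
power_mw = (
    readings.set_index(["turbine_id", "timestamp"])["power_kw"]
    .div(1000.0)
    .rename("power_mw")
)

# A feature that is not a time series: one value per subject.
mean_power_mw = (
    power_mw.groupby(level="turbine_id").mean().rename("mean_power_mw")
)

print(power_mw)
print(mean_power_mw)
```

In this sketch, the subject is the turbine, so a prediction for turbine T1 would read the rows indexed by T1.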
Feature definitions
Features can be defined using one of two APIs: Pandas and Metrics.
Pandas feature definitions
Pandas is a popular open-source Python library that data scientists use to transform feature data. To define a feature using Pandas-style APIs, obtain data frames that represent the source data (using the eval() method on entity types), transform the data using Pandas APIs, and then call upsertFeature on the resulting data. The transformations are automatically generalized and saved to the feature store.
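The transformation step of this workflow might look like the following sketch. It uses only plain Pandas: the `source` data frame is a hand-built stand-in for a data frame obtained from source data, and the eval() and upsertFeature calls are deliberately left out because their signatures are not described here. The function name, column names, and values are hypothetical.

```python
import pandas as pd

# Stand-in for a data frame obtained from source data (made-up values).
hours = pd.date_range("2024-01-01", periods=4, freq=pd.Timedelta(hours=1))
source = pd.DataFrame(
    {
        "turbine_id": ["T1"] * 4 + ["T2"] * 4,
        "timestamp": list(hours) * 2,
        "wind_speed": [5.0, 7.0, 9.0, 11.0, 4.0, 4.0, 6.0, 8.0],
    }
)


def rolling_mean_wind_speed(df: pd.DataFrame, window: int = 3) -> pd.Series:
    """Mean wind speed over the last `window` rows, computed per turbine."""
    ordered = df.sort_values(["turbine_id", "timestamp"])
    # Roll within each turbine so values never leak across subjects.
    rolled = ordered.groupby("turbine_id")["wind_speed"].transform(
        lambda s: s.rolling(window, min_periods=1).mean()
    )
    feature = ordered[["turbine_id", "timestamp"]].assign(wind_speed_mean=rolled)
    # Index the result by subject and timestamp.
    return feature.set_index(["turbine_id", "timestamp"])["wind_speed_mean"]


feature = rolling_mean_wind_speed(source)
print(feature)
```

Grouping by subject before rolling keeps each turbine's window independent, which matches the idea that a feature is indexed by the subject of the prediction.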
Metrics feature definitions
Metrics are instructions for how to transform data that is modeled as time series, using a powerful expression language that includes a wide range of prebuilt functions (see ExpressionEngine Functions). When using metrics, much of the tedious cleanup and preprocessing of time series data can be handled for you automatically.
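To give a sense of the kind of cleanup involved, the following sketch does a small version of it by hand in plain Pandas: it aligns irregular readings to a regular hourly grid and fills the missing hour. This is only an illustration of time series preprocessing in general, not the way metrics or the platform implement it. The series name, timestamps, and values are made up.

```python
import pandas as pd

# Irregular readings: two in the first hour, none in the second (made-up values).
raw = pd.Series(
    [10.0, 14.0, 20.0],
    index=pd.to_datetime(
        ["2024-01-01 00:00", "2024-01-01 00:20", "2024-01-01 02:00"]
    ),
    name="temperature_c",
)

# Align to a regular hourly grid, averaging readings that share an hour.
hourly = raw.resample(pd.Timedelta(hours=1)).mean()

# Fill the empty hour by linear interpolation between its neighbors.
cleaned = hourly.interpolate()

print(cleaned)
```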
To create a feature from a metric definition, you can either call the Feature#fromMetric API or define the feature as seed data. In either case, you can reference an existing metric or define a new metric in the feature definition. See Create Features Using Metrics or Metric Expressions for details on these options.
Choose between Pandas and Metrics for defining features
Use Pandas to define features when:
- You are a data scientist familiar with Pandas APIs.
- You want the flexibility of the Pandas APIs and Python functions in your feature definitions.
Use Metrics to define features when:
- You are already familiar with Metrics from C3 AI Version 7 and would like to continue using them.
- You do not need the flexibility of Pandas and would like to take advantage of normalization. See Normalization Engine for more information.