Feature Set Snapshot
Data scientists and data engineers may see inconsistent values for the same Feature Set at the same timestamp. This can happen when the underlying source data that feeds the Feature Set (such as database entries) changes, or when the Feature Set itself is modified or removed.
C3 AI lets you persist a snapshot of a Feature Set's data at a given point in time, along with its metadata. This is most useful when you need to inspect the exact training data that was used to train a specific ML model, or to reuse that same training data.
The C3 AI Feature Set Snapshot provides the following features:
Immutability: Snapshots cannot be altered. No subjects can be added to the snapshot. No subject's data can be updated.
Reproducibility: You can reproduce the Feature Set as it was when it was materialized to the Feature Store.
Auditability: You can inspect the snapshot metadata to see which Feature Set, which data, and which subjects it used.
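The three properties can be pictured with a toy, in-memory model in plain Python. This is only an illustration and not how C3 AI implements snapshots: ToySnapshot, take_toy_snapshot, and their fields are hypothetical names, and a real snapshot lives in the platform rather than in a Python object.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Tuple


@dataclass(frozen=True)
class ToySnapshot:
    """Hypothetical stand-in for a snapshot; not a C3 AI type."""
    snapshot_id: str
    feature_set_name: str                   # auditability: which Feature Set was used
    subjects: Tuple[str, ...]               # auditability: which subjects were included
    data: Mapping[str, Tuple[float, ...]]   # per-subject feature values


def take_toy_snapshot(snapshot_id, feature_set_name, live_data):
    """Copy the live values so later source changes do not reach the snapshot."""
    frozen = {subject: tuple(values) for subject, values in live_data.items()}
    return ToySnapshot(
        snapshot_id=snapshot_id,
        feature_set_name=feature_set_name,
        subjects=tuple(sorted(frozen)),
        data=MappingProxyType(frozen),  # read-only view: no adds, no updates
    )


live = {"TURBINE-1": [0.5, 0.7]}
snap = take_toy_snapshot("windTurbine-snapshot-1", "fs1", live)

# The source data changes after the snapshot was taken.
live["TURBINE-1"].append(0.9)
live["TURBINE-2"] = [1.1]
```

In this toy model the snapshot keeps the values it captured even though the live data has since changed, and attempts to assign to snap.data or to any field of snap raise an error.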
Use the Feature Set APIs
This section covers how to use the Feature.Set APIs.
After you create or retrieve a featureSet and define a subjectFilter or a list of subjects, call featureSet.createSnapshot.
job = featureSet.createSnapshot(
    subjectFilter="id == 'TURBINE-1'", batchSize=100,
    snapshotId='windTurbine-snapshot-1')
Note: If you want to run the createSnapshotJob immediately, you should call job.waitForCompletion().
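The two calls above can be wrapped in a small helper so that callers always wait on the job. This is a minimal sketch: create_snapshot_and_wait is a hypothetical name, it uses only createSnapshot and waitForCompletion as they appear above, and it assumes (from the method name) that waitForCompletion returns once the job has finished.

```python
def create_snapshot_and_wait(feature_set, subject_filter, snapshot_id, batch_size=100):
    """Create a snapshot and wait on the resulting job.

    `feature_set` is expected to expose createSnapshot(...) as shown in this
    section; the helper itself is hypothetical and not part of the C3 AI API.
    """
    job = feature_set.createSnapshot(
        subjectFilter=subject_filter,
        batchSize=batch_size,
        snapshotId=snapshot_id,
    )
    # Assumption: waitForCompletion() returns once the snapshot job is done.
    job.waitForCompletion()
    return job
```

With a real Feature Set object, a call might look like create_snapshot_and_wait(featureSet, "id == 'TURBINE-1'", 'windTurbine-snapshot-1').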
To read from the created snapshot, specify the snapshotId in the evalFeatureSetBatch call.
WindTurbine.evalFeatureSetBatch(subjectFilter=filter, featureSet=fs1,
    start="2021-01-01", end="2021-06-01", snapshotId="snapshot1")
To delete a snapshot, use the deleteSnapshot API.
featureSet.deleteSnapshot(snapshotId=existingSnapshotId, confirm=True)
Additional notes
snapshotId is unique. You cannot call createSnapshot twice with the same snapshotId.
The snapshot is immutable once created. You cannot update an existing snapshot. You must first delete the snapshot, and then create a new snapshot with the updated snapshot metadata.
snapshotId is different from the id of the snapshot. The snapshotId is specified by the user, and the id is created at runtime by prepending the given snapshotId with the subjectType. The id field differentiates cases where the same snapshotId is used for different subjTypes.
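Because a snapshot cannot be updated in place, replacing one means deleting it and creating it again under the same snapshotId. The sketch below shows that sequence using only the deleteSnapshot, createSnapshot, and waitForCompletion calls from this section. The helper name replace_snapshot is hypothetical, and the sketch assumes that the snapshot already exists and that deleteSnapshot has finished by the time it returns.

```python
def replace_snapshot(feature_set, snapshot_id, subject_filter, batch_size=100):
    """Delete an existing snapshot, then recreate it under the same snapshotId.

    Hypothetical helper; `feature_set` is expected to expose deleteSnapshot(...)
    and createSnapshot(...) as shown in this section.
    """
    # Step 1: remove the old snapshot. Snapshots are immutable, so there is
    # no update call to use instead.
    feature_set.deleteSnapshot(snapshotId=snapshot_id, confirm=True)

    # Step 2: create the new snapshot. Reusing the snapshotId is only possible
    # because the old snapshot was deleted first.
    job = feature_set.createSnapshot(
        subjectFilter=subject_filter,
        batchSize=batch_size,
        snapshotId=snapshot_id,
    )
    # Assumption: waitForCompletion() returns once the snapshot job is done.
    job.waitForCompletion()
    return job
```

Keeping both steps in one function makes it harder to call createSnapshot with a snapshotId that is still in use.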