Virtualization

Data virtualization in the C3 Agentic AI Platform allows you to connect to external data sources and access source data directly without loading data into the platform. This allows you to build C3 AI applications that leverage the investments your organization has made in building out a data warehouse, data lake, or other data infrastructure. When virtualization is enabled, database queries issued by the application are pushed to the source system.

The platform includes dozens of out-of-the-box connectors and a simple framework for integrating new sources to use in an enterprise AI application. Virtualizing your data allows you to:

Take advantage of the encapsulation benefits of the Type system
Keep your data centralized in a unified federated image
Avoid incurring the extra compute and storage costs associated with a traditional ETL pipeline

The C3 Agentic AI Platform supports virtualization capabilities for dozens of source systems out of the box, including:

Data lake and data warehousing technologies like Snowflake, AWS Redshift, and Delta Lake/Databricks
RDBMS such as PostgreSQL and Oracle
NoSQL databases like Apache HBase, Azure CosmosDB, and MongoDB

When to use virtualization

Whether to virtualize or persist a data source depends on the needs of your application, its data requirements, and your system architecture. Consider virtualizing data when:

The source system has a performant, stable data model that aligns with the needs of the C3 application.
The source system's data model is unlikely to change frequently.
Real-time or near-real-time data access is critical.
The data requires little or no transformation or preprocessing.
The data volume is low, or the query load on the source system is manageable.

Data virtualization is useful when the source system lacks a robust mechanism for tracking data updates, making incremental data ingestion into C3 challenging.

Consider persisting data in C3 when:

The uptime of the C3 application is critical, and your application must be self-reliant.
Your C3 application supports specific use cases or query patterns (for example, certain data science or UI requirements) that must be optimized for performance.
Your C3 base application requires a specific data model for scalability, reliability, and performance.
The data volume is large, or the queries are complex and require significant processing power.

The following sections describe additional considerations for using virtualization on the platform.

No duplication of data

Virtualizing data means that you do not have to store multiple copies of the same data across different systems. This can significantly reduce storage costs, since you only reference the data instead of duplicating it.

Data is accessed directly from the source, so any updates made to the source data are immediately reflected in the application without the need for synchronization or replication processes.

Avoiding the complexities of data duplication and synchronization makes data management simpler. Virtual Types abstract the intricacies of data storage and provide a straightforward interface for data access.

Organizations can establish clear governance over data access without needing to manage multiple copies of the same dataset. This can simplify compliance with data protection regulations.

Real-time data access

External Types provide real-time or near-real-time access to data. This is crucial for applications that rely on up-to-date information, such as monitoring dashboards or dynamic reporting systems. Access to real-time data can lead to faster decision-making processes, enhancing operational efficiency and responsiveness to market changes.

Disadvantages of virtualization

There are some limitations to virtualizing your data on the platform. Consider the following sections when building your data pipelines.

No stored calculations

Since External Types often do not store data within the application, any calculations or aggregations must be computed on the fly during each query. This can lead to increased computational overhead and slower response times, particularly for complex queries or large datasets.

The absence of pre-computed values means that performance can suffer during data retrieval, especially when multiple calculations are required for the data being accessed. You may consider persisting data or adopting a hybrid approach when managing your data pipelines.
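The trade-off described above can be sketched in plain JavaScript. This is a minimal illustration, not platform code: the `sourceRows` array is a hypothetical stand-in for an external table, and `totalOnTheFly`, `buildStoredTotals`, and `sourceScans` are names invented for this example.

```javascript
// Hypothetical in-memory stand-in for an external table; not a platform API.
const sourceRows = [
  { region: "east", amount: 10 },
  { region: "west", amount: 5 },
  { region: "east", amount: 7 },
];

let sourceScans = 0; // counts how often the "external source" is read

// Virtualized style: every request re-reads the source and re-aggregates.
function totalOnTheFly(region) {
  sourceScans += 1;
  return sourceRows
    .filter((row) => row.region === region)
    .reduce((sum, row) => sum + row.amount, 0);
}

// Persisted or hybrid style: aggregate once, then serve from the stored result.
function buildStoredTotals() {
  sourceScans += 1;
  const totals = new Map();
  for (const row of sourceRows) {
    totals.set(row.region, (totals.get(row.region) || 0) + row.amount);
  }
  return totals;
}
```

Each call to `totalOnTheFly` costs one scan of the source, while the stored totals cost one scan up front and none afterwards, at the price of becoming stale when the source changes.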

No support for hierarchies

Hierarchies are generally not supported in virtualization or External Types.

Hierarchies require a well-defined structure that outlines relationships between different data entities. Virtualized or External Types, which often reference data from separate systems or sources, may not maintain the necessary structure or constraints to support hierarchical relationships effectively.
Hierarchical structures often depend on the enforcement of referential integrity to ensure that parent-child relationships remain intact. Since External Types do not enforce these relationships, it becomes difficult to manage and validate hierarchies.
Data in external sources can change independently of the C3 AI application. This dynamic nature can lead to inconsistencies in hierarchical relationships, making it challenging to enforce and manage hierarchies reliably.

No support for timed values

Timed values are generally not supported in External Types or virtualization within the platform. Virtualization is largely incompatible with the requirements for managing temporal data effectively.

Timed values typically rely on a well-defined and structured data model, which includes aspects like versioning and history tracking. External Types and virtualization do not enforce a strict schema.
Virtualization often focuses on providing real-time access to data without storing historical context. Timed values inherently require historical data management to track changes over time, which virtualization does not provide.
Implementing timed values can introduce performance overhead in terms of storage and processing. Virtualized environments prioritize speed and efficiency in data access, which may conflict with the additional complexity of managing timed values.
Timed values often need to maintain referential integrity with other data points over time. Virtualized Types may not support this level of relational integrity.
For example, TimedRelations and TimedIntervalRelations are not supported for External Types. This may limit the granularity of time-bound analyses, such as maintenance predictions.

No support for Parametric Types

Parametric Types allow the definition of a type that can take one or more parameters, enabling more flexible and reusable data models. Unfortunately, Parametric Types are generally not supported for External Types.

Worse metric performance

Accessing data from external sources can introduce latency due to network calls or other factors. Latency or performance issues with your external data source can lead to poor performance for processing data in C3.

Create connections to external data

Use the SqlSourceSystem Type, along with a JdbcCredentials Type stored in the application's JdbcStore, to connect to external database systems from your application.

SqlSourceSystem Type – Models the external system that you are connecting to
JdbcCredentials Type – Authorizes the connection to a SqlSourceSystem
JdbcStore Type – Stores a JdbcCredentials securely within an application

Model an external database system

A SqlSourceSystem Type instance models the external database system in an application. For example, if your table MYTABLE lives in an external database system and you have defined an External Entity Type that correctly models the schema of the table, define a SqlSourceSystem with a .json file in the ./metadata/SqlSourceSystem/ folder of your package as follows:

JSON

{
    "name": "My External Database System"
}

Connect to an external database system

JdbcCredentials is used to authorize the connection to an external database system. The platform offers JDBC connectors to many external systems, such as Databricks, Snowflake, or MS SQL Server. For an application to use this credential at runtime, it must be added to the JdbcStore of the application.

For example, to connect to an external Snowflake database, run the following code snippet:

JavaScript

var credentials = JdbcCredentials.fromServerEndpoint("<my_account>.snowflakecomputing.com", -1, DatastoreType.SNOWFLAKE,
    "<table>", null, "<username>", "<password>");
JdbcStore.forName("My External Database System").setCredentials(credentials);
JdbcStore.forName("My External Database System").setExternal();

This code snippet is shown for illustrative purposes. For a production system, store credentials in a vault.

See the DatastoreType Type for a comprehensive list of supported external data stores.

Validate the external database connection

To validate that the credential has been set, run the following:

JavaScript

SqlSourceSystem.forName("My External Database System").ping()

The connection has been configured if the function returns { "reachable": true }.

Use external data in an application

Use the External and SqlSourceCollection Types to model external data in your application after connecting to an external database system:

External Entity Type – Describes the schema of the table or collection to which you are connecting
SqlSourceCollection Type – Specifies a specific table or collection within the SqlSourceSystem

Create an External Entity Type

Entity Types make up the operational data model of an application. When an Entity Type mixes the External Type, this indicates that the data of this type lives in an external database management system. These types still mix Persistable, so all the same APIs for data access and manipulation are still available. At runtime, the database engine of the C3 Agentic AI Platform generates SQL with a syntax that is understood by the source system, and pushes that query to the external system.

External Entity Types must declare a schema that matches the schema of the external database system using the schema name keywords. For example, imagine that you have an external SQL database table MYTABLE in a schema called MYSCHEMA with the following columns:

ID: varchar(55)
FIELD_1: datetime
FIELD_2: varchar(55)
FIELD_3: int

To model this external database table in an application, define an External Entity Type with a .c3typ file in the ./src/ directory of your package as follows:

Type

entity type MyExternalType mixes External, NoSystemCols schema name "[MYSCHEMA].[MYTABLE]" {
    id:     ~ schema name "ID"
    field1: datetime schema name "FIELD_1"
    field2: string schema name "FIELD_2"
    field3: int schema name "FIELD_3"
}

External Types usually also mix NoSystemCols. This specifies that the Entity Type does not include the system-generated columns that are present on all Entity Types whose data are managed by the C3 Agentic AI Platform.
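To make the query pushdown idea concrete, the sketch below shows how field names on a Type could be translated into a query against the external column names. It is only an illustration under stated assumptions: `buildSelect` and the `myExternalType` object are hypothetical, and the SQL that the platform actually generates is not shown in this document and may differ.

```javascript
// Field-to-column mapping copied from the MyExternalType definition above.
const myExternalType = {
  table: "[MYSCHEMA].[MYTABLE]",
  columns: { id: "ID", field1: "FIELD_1", field2: "FIELD_2", field3: "FIELD_3" },
};

// Hypothetical helper: turns a simple equality filter on Type fields into
// parameterized SQL that uses the external column names.
function buildSelect(typeDef, filter) {
  const selectList = Object.values(typeDef.columns).join(", ");
  const params = [];
  const conditions = Object.entries(filter).map(([field, value]) => {
    const column = typeDef.columns[field];
    if (column === undefined) {
      throw new Error("Unknown field: " + field);
    }
    params.push(value); // values travel separately from the SQL text
    return column + " = ?";
  });
  const where = conditions.length > 0 ? " WHERE " + conditions.join(" AND ") : "";
  return { sql: "SELECT " + selectList + " FROM " + typeDef.table + where, params };
}
```

For instance, `buildSelect(myExternalType, { field3: 5 })` yields a statement that filters on `FIELD_3`, which is the name the source system understands, rather than on `field3`.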

The column types of an External Type must map to the primitive data types of the C3 Agentic AI Platform as follows:

External Database Column Type	Type System Primitive Type
TINYINT, SMALLINT, INTEGER	Integer
BIGINT	LongInt
FLOAT, REAL	Float
DOUBLE	Double
NUMERIC	LongInt / Decimal
DECIMAL	Decimal
CHAR, VARCHAR, LONGVARCHAR, NCHAR, NVARCHAR, LONGNVARCHAR	String
DATE, TIMESTAMP, TIMESTAMP_WITH_TIMEZONE	DateTime
BINARY, VARBINARY, LONGVARBINARY, BLOB	Binary
CLOB, NCLOB	String
BOOLEAN, BIT	Boolean

Other external database types, including the following, are not supported at this time:

TIME
NULL
JAVA_OBJECT
DISTINCT
STRUCT
ARRAY
REF
DATALINK
SQLXML
REF_CURSOR
TIME_WITH_TIMEZONE
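When checking a source table by hand, the mapping above can be expressed as a small lookup. The sketch below is a convenience for illustration only; `PRIMITIVE_BY_COLUMN_TYPE` and `primitiveTypeFor` are hypothetical names, not platform APIs, and the function returns `null` for any type that is not in the supported list.

```javascript
// Mapping copied from the table above. NUMERIC is listed there as
// "LongInt / Decimal"; the text does not say which applies, so it is kept as-is.
const PRIMITIVE_BY_COLUMN_TYPE = {
  TINYINT: "Integer", SMALLINT: "Integer", INTEGER: "Integer",
  BIGINT: "LongInt",
  FLOAT: "Float", REAL: "Float",
  DOUBLE: "Double",
  NUMERIC: "LongInt / Decimal",
  DECIMAL: "Decimal",
  CHAR: "String", VARCHAR: "String", LONGVARCHAR: "String",
  NCHAR: "String", NVARCHAR: "String", LONGNVARCHAR: "String",
  CLOB: "String", NCLOB: "String",
  DATE: "DateTime", TIMESTAMP: "DateTime", TIMESTAMP_WITH_TIMEZONE: "DateTime",
  BINARY: "Binary", VARBINARY: "Binary", LONGVARBINARY: "Binary", BLOB: "Binary",
  BOOLEAN: "Boolean", BIT: "Boolean",
};

// Returns the primitive type name for a column type, or null when the type
// is not in the supported list (for example TIME or ARRAY).
function primitiveTypeFor(columnType) {
  const key = String(columnType)
    .trim()
    .toUpperCase()
    .replace(/\(.*\)$/, ""); // drop a length suffix such as "(55)"
  return Object.prototype.hasOwnProperty.call(PRIMITIVE_BY_COLUMN_TYPE, key)
    ? PRIMITIVE_BY_COLUMN_TYPE[key]
    : null;
}
```

A `null` result is a signal to check the column before modeling it, since unsupported column types cannot be mapped to a field.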

A SqlSourceCollection Type instance links your External Type to the SqlSourceSystem containing your table.

You can define a SqlSourceCollection with a .json file in the ./metadata/SqlSourceCollection/ folder of your package:

JSON

{
    "name": "MyExternalType",
    "source": "MyExternalType",
    "sourceSystem": {"name": "My External Database System"}
}

Infer source schemas

The SqlSourceCollection#inferSourceType method can be used to simplify creating External Entity Types. For instance, the following code snippet infers the Entity Type definition from the external database table and downloads it, so that you can save it as a .c3typ file in the ./src/ folder of your package:

JavaScript

var name = "MyExternalType";
var typeMeta = SqlSourceCollection.forName(name).inferSourceType2();
c3DL(typeMeta.toString(), "text/plain", name + ".c3typ");

This downloads a MyExternalType.c3typ file from the browser, with the following definition:

Type

entity type MyExternalType mixes External, NoSystemCols schema name "[MYSCHEMA].[MYTABLE]" {
    id:     string schema name "ID"
    field1: datetime schema name "FIELD_1"
    field2: string schema name "FIELD_2"
    field3: int schema name "FIELD_3"
}

When defining an External Type, you must either map the id field to a column in the table or define a composite key. A composite key uses two or more columns together to uniquely identify each row of a table.

Composite keys and external Types

A composite key is a type of key that consists of two or more attributes (or columns) used together to uniquely identify a record in a table. When none of the individual attributes is sufficient by itself to uniquely identify records, a combination of them can be used.

For example, consider an external data source that contains a table, student.records, which stores university student enrollment records:

`student_id`	`course_id`	`term`	`grade`
001	6.1903	Fall	A
001	18.05	Fall	B
002	18.05	Fall	B
003	18.06	Spring	B
003	6.1020	Fall	A

Neither the student_id nor the course_id alone would be sufficient to uniquely identify a record since a student can be enrolled in multiple courses and a course can have multiple students. However, the combination of student_id and course_id would be unique for every record, making them suitable for a composite key.
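The uniqueness argument can be checked directly against the sample rows. The following is a minimal sketch in plain JavaScript; `isUniqueKey` is a hypothetical helper written for this example, and `records` simply restates the table above.

```javascript
// Sample rows copied from the student.records table above.
const records = [
  { student_id: "001", course_id: "6.1903", term: "Fall", grade: "A" },
  { student_id: "001", course_id: "18.05", term: "Fall", grade: "B" },
  { student_id: "002", course_id: "18.05", term: "Fall", grade: "B" },
  { student_id: "003", course_id: "18.06", term: "Spring", grade: "B" },
  { student_id: "003", course_id: "6.1020", term: "Fall", grade: "A" },
];

// Returns true when the given columns, taken together, identify every row uniquely.
function isUniqueKey(rows, columns) {
  const seen = new Set();
  for (const row of rows) {
    // JSON.stringify keeps ["a", "bc"] distinct from ["ab", "c"].
    const key = JSON.stringify(columns.map((column) => row[column]));
    if (seen.has(key)) {
      return false;
    }
    seen.add(key);
  }
  return true;
}
```

Running `isUniqueKey` with `["student_id"]` or `["course_id"]` alone reports duplicates in this sample, while `["student_id", "course_id"]` does not.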

CompositeKey definition:

Type

type MyKey mixes CompositeKey {
    student_id : string
    course_id : string
}

Type definition (arbitrary example):

Type

entity type MyCompKey mixes MyKey, External, NoSystemCols schema name 'default.student.records' {
    student_id : !string
    course_id : !string
    term : string
    grade: string
}

Run the following command in the C3 AI Console to view the results:

JavaScript

c3Grid(MyCompKey.fetch());

In the table below, notice that the id field is a concatenation of the student_id and course_id fields, joined by #.

	`id`	`student_id`	`course_id`	`term`	`grade`
0	001#18.05	001	18.05	Fall	B
1	001#6.1903	001	6.1903	Fall	A
2	002#18.05	002	18.05	Fall	B
3	003#18.06	003	18.06	Spring	B
4	003#6.1020	003	6.1020	Fall	A
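The id values in this output can be reproduced with a one-line helper. This is only a sketch: `compositeId` is a hypothetical function, and the "#" separator is inferred from the sample output above rather than from a documented contract.

```javascript
// Joins the key fields of a row with "#", matching the sample output above.
// The separator is an assumption based on that output.
function compositeId(row, keyFields) {
  return keyFields.map((field) => String(row[field])).join("#");
}

// Example row taken from the student.records table.
const exampleRow = { student_id: "001", course_id: "18.05", term: "Fall", grade: "B" };
const exampleId = compositeId(exampleRow, ["student_id", "course_id"]);
```

The order of the key fields matters: listing `course_id` first would produce a different id for the same row.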
