Integrate an Embedder with Vector Store

Implement a vector store embedder to enable similarity search in the C3 Agentic AI Platform. The platform provides a framework for generating vector embeddings with Deep Java Library (DJL) so you can perform text similarity searches.

A similarity search finds the items in a dataset that are most similar to a given query item. Instead of matching exact keywords or values, it compares vector embeddings, which are numerical representations of data in which semantically close items have numerically close vectors. This lets applications search large datasets by meaning rather than by exact wording.

For example, instead of searching for the exact phrase "car repair," an embedder converts the query and the document contents into embeddings, which are numerical vectors that encode meaning. Embeddings allow a similarity search to find documents that contain content about "auto maintenance," "vehicle fixing," or "automobile service" because their vector representations are numerically close even though the actual words are different.
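To make "numerically close" concrete, here is a minimal sketch that compares vectors with cosine similarity. It assumes cosine similarity as the metric, which this page does not confirm for the platform, and the three-dimensional vectors are hand-picked, hypothetical values rather than output from all-MiniLM-L6-v2.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Values near 1 mean the vectors point in nearly the same direction.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical embeddings, chosen by hand for illustration only.
const carRepair = [0.9, 0.1, 0.0];
const autoMaintenance = [0.8, 0.2, 0.1];
const bananaBread = [0.0, 0.1, 0.9];

const related = cosineSimilarity(carRepair, autoMaintenance);
const unrelated = cosineSimilarity(carRepair, bananaBread);

// The related pair scores higher than the unrelated pair.
console.log(related > unrelated); // true
```

A real embedder produces vectors with hundreds of dimensions, but the comparison works the same way.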

The vector store embedder framework uses the following Types:

Entity.Embedder: The base interface for all embedders in the platform
Djl.Engine: Connects the platform to DJL
Djl.Embedder: An example Type that extends Entity.Embedder to implement embedding functionality
VectorWithEmbedder: An example Type that enables automatic embedding
Expr.SimilaritySpec: Defines the configuration and parameters for performing similarity searches on vector embeddings

The platform uses the preconfigured embedder all-MiniLM-L6-v2 with the ONNX and PyTorch runtimes.

To learn more about the embedder, see the HuggingFace page all-MiniLM-L6-v2.

To learn more about vector store in the C3 Agentic AI Platform, see Vector Store.

The platform offers two ways to implement an embedder in your C3 AI application:

Implement an embedder: Store a serialized URL from which the embedder model is loaded, and specify the engine that runs the embedder model.
Enable automatic embedding: Use the @vector annotation to generate a vector embedding and refer to the preconfigured embedder.

Implement an embedder

To implement embedding functionality in an entity Type, extend the Entity.Embedder Type. Here is an example Type Djl.Embedder:

Type

/**
 * Embedder implementation using Deep Java Library (DJL).
 */
entity type Djl.Embedder extends Entity.Embedder type key 'DJL' {
  /**
   * Embedder model URL
   */
  url: !string serialized Url
  /**
   * Engine for the embedder model
   */
  engine: !string enum Djl.Engine
}

This Type stores the serialized URL from which the embedder model is loaded and specifies the engine responsible for running the model. The engine value comes from the Djl.Engine enum Type.

To use this Type and the embedding models from DJL, declare a package dependency on deepJavaLib. The Entity.Embedder Type and the example Djl.Embedder Type are pre-seeded in the deepJavaLib package.

Enable automatic embedding

Configure automatic embedding using the @vector annotation with the sourceField and embedder parameters. Here is an example Type VectorWithEmbedder:

Type

entity type VectorWithEmbedder {
  
  /**
   * The source text
   */
  textField: string
  
  /**
   * Automatically generated vector embedding
   */
  @vector(sourceField='textField', embedder='all-MiniLM-L6-v2')
  emb: [!float]
}

The sourceField='textField' parameter is the FieldPath of the text field from which the platform generates the vector embedding. The embedder='all-MiniLM-L6-v2' parameter refers to the preconfigured embedder in the platform.

Perform a text similarity search

After you integrate an embedder or configure automatic embedding, you can perform a text similarity search by using the similarity expression engine function.

The function takes a string serialized FieldPath to the text field, the query text, and an optional Expr.SimilaritySpec. The following example shows how to use the similarity function:

JavaScript

// Assume the following textField values exist in VectorWithEmbedder:
//
// "machine learning is amazing",
// "deep learning and AI",
// "natural language processing with transformers",
// "computer vision with convolutional neural networks",
// "AI in healthcare is growing",
// "machine learning in finance",
// "chatbots using natural language",
// "transformers are powerful models",
// "finance and data science",
// "neural networks and deep learning"

// Define a similarity calculation between the textField and query string.
similarityExpr = "similarity(textField, 'machine learning applications')";

// Define the projection to determine which fields and calculations to return in the query result.
projection = "id, textField, " + similarityExpr;
// Build the evaluation specification using the projection, similarity score order, and result limit.
spec = EvaluateSpec.builder().projection(projection).order(similarityExpr).limit(2).build();
// Run the similarity search to retrieve items most similar to the query string based on textField.
res = VectorWithEmbedder.evaluate(spec);

// res:
//
// "machine learning is amazing",
// "machine learning in finance"

This code performs a similarity search on text items stored in the VectorWithEmbedder Type.
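Conceptually, the projection, order, and limit in the example score every row against the query, rank the rows by score, and keep the top results. The following plain JavaScript sketch imitates that flow outside the platform. It is not the platform's implementation: the topK and cosine helpers, the two-dimensional embeddings, and the query vector are all hypothetical, and cosine similarity is an assumed metric.

```javascript
// Hypothetical rows: each pairs a text value with a made-up 2-dimensional embedding.
const rows = [
  { id: "a", textField: "machine learning is amazing", emb: [0.9, 0.1] },
  { id: "b", textField: "finance and data science", emb: [0.3, 0.8] },
  { id: "c", textField: "machine learning in finance", emb: [0.7, 0.5] },
];

// Hypothetical embedding of the query "machine learning applications".
const queryEmb = [1.0, 0.2];

// Cosine similarity (assumed metric) between two equal-length vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score each row, sort from most to least similar, and keep the first k rows.
function topK(items, query, k) {
  return items
    .map((row) => ({
      id: row.id,
      textField: row.textField,
      score: cosine(row.emb, query),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const top = topK(rows, queryEmb, 2);
console.log(top.map((r) => r.textField));
// [ 'machine learning is amazing', 'machine learning in finance' ]
```

A linear scan like this is fine for a handful of rows. A vector store exists so that the same ranking stays fast over large datasets.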
