Embedding Models for the Vector Store

In C3 Generative AI, documents are parsed, chunked, and embedded into a vector store for use in subsequent retrieval steps. For information on multimodal parsing and chunking, see Multimodal Parsing.

Depending on your model architecture, training data, intended use cases, and requirements for embedding performance and quality, you may want to change the embedder in your application. C3 Generative AI supports any embedder from Hugging Face, such as mixedbread-ai/mxbai-embed-large-v1, and any LLM-based embedder, such as text-embedding-3-large from OpenAI. For the benefits and costs of each embedder, refer to the documentation published by each model's source.

How to change the default embedder

The embedder is implemented in the Genai.Embedder Type, which has a method called getEmbedder. This method takes a specification of Type Genai.Embedder.Spec.

Use the fields of the Genai.Embedder.Spec Type to customize your embedder.

By default, the platform uses a cloud-specific embedding client seeded at the cluster level through GenaiCore.Llm.Embedding.Client. If no cluster-level client is configured, the application falls back to the e5 transformer model (intfloat/multilingual-e5-large-instruct).
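The default selection order described above can be pictured with a small sketch. This is only an illustration of the order (cluster-level client first, e5 model otherwise); the function default_embedder_source and its cluster_client argument are hypothetical names, not platform APIs, and the actual resolution happens inside the platform.

```python
from typing import Optional

# Fallback model named on this page.
FALLBACK_MODEL = "intfloat/multilingual-e5-large-instruct"


def default_embedder_source(cluster_client: Optional[str]) -> str:
    """Illustrate the default selection order (hypothetical helper, not a platform API)."""
    # A configured cluster-level client takes precedence.
    if cluster_client:
        return f"cluster-level client: {cluster_client}"
    # Otherwise the application falls back to the e5 transformer model.
    return f"fallback model: {FALLBACK_MODEL}"


print(default_embedder_source(None))
```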

You can override the default using either of the following approaches:

Hugging Face Embedders

You can use either the direct model names from Hugging Face or the predefined model enums in the application.

To use direct model names, use the following code:

Python
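# Describe a Hugging Face embedder by its model name.
# withDefaults() fills in default values for any fields not set here.
huggingface_embedder_spec = c3.Genai.Embedder.Spec(
    embedderModelName='mixedbread-ai/mxbai-embed-large-v1',
    embedderType='GenaiCore.Embedder.Hf',
).withDefaults()

# Get the embedder that matches the spec.
embedder = c3.Genai.Embedder.getEmbedder(huggingface_embedder_spec)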

LLM-Based Embedders

To use an LLM-based embedder, set both the embedder type and the provider type. The following code specifies an Azure OpenAI embedder.

Python
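# Describe an LLM-based embedder served through Azure OpenAI.
# withDefaults() fills in default values for any fields not set here.
openai_embedder_spec = c3.Genai.Embedder.Spec(
    embedderModelName='text-embedding-3-large',
    embedderType='GenaiCore.Embedder.Llm',
    providerType='GenaiCore.Llm.AzureOpenAi',
).withDefaults()

# Get the embedder that matches the spec.
openai_embedder = c3.Genai.Embedder.getEmbedder(openai_embedder_spec)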

Other Supported Providers

AWS Bedrock and Google Vertex AI embedders are also available. To use them, set providerType='GenaiCore.Llm.Bedrock' or providerType='GenaiCore.Llm.VertexAi', respectively.
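As a minimal sketch, the provider types named on this page can be collected in one place so that switching providers only changes one argument. The helper llm_embedder_spec_kwargs and the short provider keys are hypothetical names introduced for illustration. The sketch assumes that Bedrock and Vertex AI embedders use the same embedderType as the Azure OpenAI example ('GenaiCore.Embedder.Llm'), and the model name is a placeholder that you must replace with an embedding model available in your account.

```python
# Provider type strings as named on this page; the short keys are hypothetical.
PROVIDER_TYPES = {
    "azure_openai": "GenaiCore.Llm.AzureOpenAi",
    "bedrock": "GenaiCore.Llm.Bedrock",
    "vertex_ai": "GenaiCore.Llm.VertexAi",
}


def llm_embedder_spec_kwargs(model_name, provider):
    """Build keyword arguments for Genai.Embedder.Spec (hypothetical helper)."""
    if provider not in PROVIDER_TYPES:
        raise ValueError(
            f"Unknown provider {provider!r}; expected one of {sorted(PROVIDER_TYPES)}"
        )
    return {
        "embedderModelName": model_name,
        # Assumption: the same embedder type applies to every LLM provider.
        "embedderType": "GenaiCore.Embedder.Llm",
        "providerType": PROVIDER_TYPES[provider],
    }


# Placeholder model name: substitute an embedding model offered by your provider.
bedrock_kwargs = llm_embedder_spec_kwargs("<your-bedrock-embedding-model>", "bedrock")

# In a C3 environment where `c3` is defined, the spec could then be built
# in the same way as the Azure OpenAI example:
# bedrock_spec = c3.Genai.Embedder.Spec(**bedrock_kwargs).withDefaults()
# bedrock_embedder = c3.Genai.Embedder.getEmbedder(bedrock_spec)
```

Keeping the provider strings in a single mapping means a typo in a provider name is caught early by the ValueError, before any spec is created.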
