Use Partial Indexes in Vector Store

In data-driven applications, efficiently managing high-dimensional vector data is crucial. pgvector, a PostgreSQL extension, integrates vector similarity search into your database, allowing you to store and index vector embeddings alongside traditional data.

As datasets grow, indexing every vector can become resource-intensive. Partial indexes in PostgreSQL help by creating indexes on a subset of data that meets specific conditions, rather than the entire table. This optimization with pgvector offers several benefits:

Reduced Storage Overhead: Index only the most relevant vectors, saving disk space.
Improved Query Speed: Accelerate searches by focusing on pertinent entries.
Enhanced Efficiency: Concentrate resources on data that impacts performance the most.

Partial indexes enable more efficient data retrieval and management, especially for large-scale vector data where only a portion is relevant for specific queries. Implementing partial indexing with pgvector optimizes database operations and enhances application performance.

Prerequisites for indexing

C3 Server Version — Ensure you are using version 8.7 or higher.
JDBC Store — The application's backing JDBC store must be PostgreSQL.
pgvector Extension — Verify that the pgvector extension is installed.

To check the installed version of the pgvector extension, use the following command:

JavaScript

DbAdmin.inst().getPgVectorVersion()

Create a Type with a vector store partial index

Create a Type with a vector store partial index using the @db annotation to optimize data indexing and querying.

Define the entity Type

Use the @db annotation to specify the index.
Include a filter condition to create a partial index on a subset of data.

Example Type definition

Type

@db(index = [{fields:[{name:emb, opClass:L2}], filter: "title == 'abcABC123'"}])
entity type VectorStorePartialIndex { 
   * The id of the doc. 
   */ 
  docid: string 
  /** 
   * The title of the doc. 
   */ 
  title: string 
  /** 
   * The text of the doc. 
   */ 
  text: string 
  /** 
   * The vector embedding. 
   */ 
  @vector(dimension=5) 
  emb: string 
}

Where:

Index Field: emb
Filter Condition: title == 'abcABC123'

Filter explanation

The filter is a string serialized Filter.c3typ that supports expression engine functions.

Ensure the filter used in the index is similar to the one used in queries for maximum efficiency.

Create multiple partial indexes

You can create multiple partial indexes on the same field with different filters.

Example:

Type

@db(index = [{fields:[{name:emb, opClass:L2}], filter: "title == 'abcABC123'"},
             {fields:[{name:emb, opClass:L2}], filter: "title == 'xyzXYZ789'"}])

By following these steps, you can create a type with a vector store partial index, optimizing performance by indexing only relevant portions of your data based on specific conditions.

Upsert the collection

Insert data into the type and create the necessary tables and indexes in the PostgreSQL database.

Prepare Your Types

Ensure that the types are defined and ready in your application.

Insert Data

You can directly insert data into the type using standard data insertion methods.

Use upsertCollection():

Alternatively, use the <type_name>.upsertCollection() method to insert data and simultaneously create the required tables and indexes in the PostgreSQL database.

The following method is particularly useful for bulk operations, as it handles both data insertion and database schema updates in one step.

Example:

Type

<TypeName>.createBatch([ 
  {docid: "1", title: "abcABC123", text: "Sample text", emb: "vector1"}, 
  {docid: "2", title: "xyzXYZ789", text: "Another sample text", emb: "vector2"} 
])

(Optional) Verify the index creation

You can utilize the DbAdmin#getTableIndexes(<table_name>) API to view indexes created for your table.

Use fetch/eval/evaluate to query results

Utilize the partial indexes created on the type by specifying the same filter used during index creation, along with your query vector.

Steps:

Specify the Filter — Use the same filter condition that was used when creating the partial index. This ensures that the query leverages the partial index for optimized performance.
Query Vector — Include the query vector in your filter to perform the similarity search.
Example Query — Use the fetch method to retrieve results based on the filter and query vector.
Type
```
<type_name>.fetch({ filter: "vectorDistance('emb','[0,1,1,1,2]') < 2 && title == 'abcABC123'" }); 
```

Here's an explanation for the method and filter conditions.

Method:
- fetch: This method is used to retrieve data from the database based on specified filter conditions.
Filter condition:
- vectorDistance('emb','[0,1,1,1,2]') < 2: This part of the filter calculates the distance between the emb field and the query vector [0,1,1,1,2], and retrieves records where this distance is less than 2.
- && title == 'abcABC123': This part of the filter ensures that only records with the title 'abcABC123' are considered.

By following these steps, you can efficiently query your data using the partial indexes created on the type. This approach optimizes query performance by narrowing down the dataset to the most relevant entries based on the specified conditions.

Copy link to this sectionPrerequisites for indexing

Copy link to this sectionCreate a Type with a vector store partial index

Copy link to this sectionDefine the entity Type

Copy link to this sectionExample Type definition

Copy link to this sectionFilter explanation

Copy link to this sectionCreate multiple partial indexes

Copy link to this sectionUpsert the collection

Copy link to this sectionPrepare Your Types

Copy link to this sectionInsert Data

Copy link to this sectionUse upsertCollection():

Copy link to this section(Optional) Verify the index creation

Copy link to this sectionUse fetch/eval/evaluate to query results

Prerequisites for indexing

Create a Type with a vector store partial index

Define the entity Type

Example Type definition

Filter explanation

Create multiple partial indexes

Upsert the collection

Prepare Your Types

Insert Data

Use upsertCollection():

(Optional) Verify the index creation

Use fetch/eval/evaluate to query results