Use Partial Indexes in Vector Store
In data-driven applications, efficiently managing high-dimensional vector data is crucial. pgvector, a PostgreSQL extension, integrates vector similarity search into your database, allowing you to store and index vector embeddings alongside traditional data.
As datasets grow, indexing every vector can become resource-intensive. Partial indexes in PostgreSQL help by creating indexes on a subset of data that meets specific conditions, rather than the entire table. This optimization with pgvector offers several benefits:
- Reduced Storage Overhead: Index only the most relevant vectors, saving disk space.
- Improved Query Speed: Accelerate searches by focusing on pertinent entries.
- Enhanced Efficiency: Concentrate resources on data that impacts performance the most.
Partial indexes enable more efficient data retrieval and management, especially for large-scale vector data where only a portion is relevant for specific queries. Implementing partial indexing with pgvector optimizes database operations and enhances application performance.
Prerequisites for indexing
- C3 Server Version — Ensure you are using version 8.7 or higher.
- JDBC Store — The application's backing JDBC store must be PostgreSQL.
- pgvector Extension — Verify that the pgvector extension is installed.
To check the installed version of the pgvector extension, use the following command:
DbAdmin.inst().getPgVectorVersion()Create a Type with a vector store partial index
Create a Type with a vector store partial index using the @db annotation to optimize data indexing and querying.
Define the entity Type
- Use the
@dbannotation to specify the index. - Include a filter condition to create a partial index on a subset of data.
Example Type definition
@db(index = [{fields:[{name:emb, opClass:L2}], filter: "title == 'abcABC123'"}])
entity type VectorStorePartialIndex {
* The id of the doc.
*/
docid: string
/**
* The title of the doc.
*/
title: string
/**
* The text of the doc.
*/
text: string
/**
* The vector embedding.
*/
@vector(dimension=5)
emb: string
}Where:
- Index Field:
emb - Filter Condition:
title == 'abcABC123'
Filter explanation
The filter is a string serialized Filter.c3typ that supports expression engine functions.
Ensure the filter used in the index is similar to the one used in queries for maximum efficiency.
Create multiple partial indexes
You can create multiple partial indexes on the same field with different filters.
Example:
@db(index = [{fields:[{name:emb, opClass:L2}], filter: "title == 'abcABC123'"},
{fields:[{name:emb, opClass:L2}], filter: "title == 'xyzXYZ789'"}])By following these steps, you can create a type with a vector store partial index, optimizing performance by indexing only relevant portions of your data based on specific conditions.
Upsert the collection
Insert data into the type and create the necessary tables and indexes in the PostgreSQL database.
Prepare Your Types
Ensure that the types are defined and ready in your application.
Insert Data
You can directly insert data into the type using standard data insertion methods.
Use upsertCollection():
Alternatively, use the <type_name>.upsertCollection() method to insert data and simultaneously create the required tables and indexes in the PostgreSQL database.
The following method is particularly useful for bulk operations, as it handles both data insertion and database schema updates in one step.
Example:
<TypeName>.createBatch([
{docid: "1", title: "abcABC123", text: "Sample text", emb: "vector1"},
{docid: "2", title: "xyzXYZ789", text: "Another sample text", emb: "vector2"}
])(Optional) Verify the index creation
You can utilize the DbAdmin#getTableIndexes(<table_name>) API to view indexes created for your table.
Use fetch/eval/evaluate to query results
Utilize the partial indexes created on the type by specifying the same filter used during index creation, along with your query vector.
Steps:
Specify the Filter — Use the same filter condition that was used when creating the partial index. This ensures that the query leverages the partial index for optimized performance.
Query Vector — Include the query vector in your filter to perform the similarity search.
Example Query — Use the fetch method to retrieve results based on the filter and query vector.
Type<type_name>.fetch({ filter: "vectorDistance('emb','[0,1,1,1,2]') < 2 && title == 'abcABC123'" });
Here's an explanation for the method and filter conditions.
- Method:
fetch: This method is used to retrieve data from the database based on specified filter conditions.
- Filter condition:
vectorDistance('emb','[0,1,1,1,2]') < 2: This part of the filter calculates the distance between the emb field and the query vector[0,1,1,1,2], and retrieves records where this distance is less than 2.&& title == 'abcABC123': This part of the filter ensures that only records with the title 'abcABC123' are considered.
By following these steps, you can efficiently query your data using the partial indexes created on the type. This approach optimizes query performance by narrowing down the dataset to the most relevant entries based on the specified conditions.