Configure RAG Tool

Agents in C3 Generative AI can run unstructured queries with the built-in retrieval-augmented generation (RAG) tool, called the RAG unified tool. The tool is fully modular and lets you customize:

Query rewriting
Retriever behavior
Reranker configuration
Input messages structure
Question answering behavior

The RAG unified tool

The rag_unified tool takes a query as a string and returns a string. You must specify the retriever ID, the message builder, and the question answering behavior. A tool has specific initKwargs that define its initialization, call behavior, and cleanup.

The rag_unified tool has an initialize method with specific configurations within its initKwargs:

query_rewriting_args (optional):
- Rewrites the query for retrieval and/or QA.
retrieval_args (required):
- Fetches top relevant passages using semantic, keyword, and metadata signals.
reranker_args (optional):
- Reorders the retrieved passages using LLM-based, cross-encoder, or custom scoring.
message_builder_args (required):
- Builds structured LLM messages using selected passages.
query_answering_args (required):
- Produces the final answer from the LLM based solely on retrieved context.
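
The required and optional sections listed above can be checked before the configuration is passed to the tool. The following is a minimal sketch, not part of the C3 API: `check_init_kwargs` is a hypothetical helper that only uses the five section names from this page and reports missing, unknown, or malformed sections.

```python
# Section names are taken from the list above; the helper itself is hypothetical.
REQUIRED_SECTIONS = ("retrieval_args", "message_builder_args", "query_answering_args")
OPTIONAL_SECTIONS = ("query_rewriting_args", "reranker_args")


def check_init_kwargs(init_kwargs):
    """Return a list of problems found in a rag_unified initKwargs dict."""
    problems = []
    known = set(REQUIRED_SECTIONS) | set(OPTIONAL_SECTIONS)

    # Every required section must be present.
    for name in REQUIRED_SECTIONS:
        if name not in init_kwargs:
            problems.append(f"missing required section: {name}")

    # Catch misspelled section names and sections that are not dicts.
    for name, value in init_kwargs.items():
        if name not in known:
            problems.append(f"unknown section: {name}")
        elif not isinstance(value, dict):
            problems.append(f"section {name} must be a dict")

    return problems
```

An empty list means the top-level structure is complete. The check does not validate the individual fields inside each section.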

An example of the RAG tool configuration

The agent follows these steps in order: rewriting the query, retrieving passages, reranking them, building the messages, and answering the question. Query rewriting and reranking are optional. You can modify each step in the tool configuration parameters:

Python

initKwargs = {
    "query_rewriting_args": {
        ...
    },
    "retrieval_args": {
        ...
    },
    "reranker_args": {
        ...
    },
    "message_builder_args": {
        ...
    },
    "query_answering_args": {
        ...
    }
}

rag_unified = c3.Genai.Agent.Dynamic.Tool.forId("rag_unified")
rag_unified = rag_unified.withInitKwargs(initKwargs).merge(mergeInclude="initKwargs", returnInclude="this")

# Initialize and ask a test question
rag_unified.initialize()
answer = rag_unified.call("What is the first day of instruction?")
print(answer)

In the optional query_rewriting_args field, you can specify how the agent will rewrite the query. Only enable or rely on query rewriting if you have a well-curated set of accurate, domain-specific few-shot examples. This is critical to maintaining the quality and precision of rewritten queries.

query_rewriting_llm_client_config_name: Can be any LLM of your choice.
query_rewriting_llm_options: Best left as a safe default.
query_rewriting_num_fewshot_examples: Number of few-shot examples to give to the LLM. A higher value, such as 10, can help the model rewrite queries more accurately by learning from more examples, but it also makes the system slower and more expensive, and it can hurt quality if the examples aren't clear or consistent. The few-shot examples are retrieved from Genai.FewShotExample.Context and contain sample queries and context.
query_rewriting_glossary: A glossary that the LLM uses to match terms and rewrite the query, which is useful for domain-specific terminology.
queryRewritingLambdaStr: A lambda function that rewrites queries in a specific, custom way. If it is enabled, the code uses the lambda instead of LLM rewriting. The glossary is still applied.
queryRewritingLambdaArgs: Arguments for the optional lambda function.
query_rewriting_prompt_id: The ID of the system prompt used for query rewriting. Customize this if you have a specific prompt template. Otherwise, use the existing default.
query_rewriting_hypothetical_queries_vs_id: Vector store ID for hypothetical query expansion. Retrieves similar queries to improve rewriting.
query_rewriting_num_hypothetical_queries: Number of similar hypothetical queries to retrieve for context (default: 3).

Python

    "query_rewriting_args": {
        "query_rewriting_llm_client_config_name": "default-completions", # or any llm of your choice
        "query_rewriting_llm_options": {}, # safe default
        "query_rewriting_num_fewshot_examples": 2, # number of few shots the llm will consider
        "query_rewriting_glossary": {'DS': 'Data Science'}, # a glossary (vocabulary) that the llm will match to rewrite the query
        #"queryRewritingLambdaStr": "",  # Optional, safe default
        "queryRewritingLambdaArgs": {}, # safe default
        "query_rewriting_prompt_id": "default_query_rewriting_prompt",
        "query_rewriting_hypothetical_queries_vs_id": "hypothetical_queries_vs", # Optional: vector store for similar query examples
        "query_rewriting_num_hypothetical_queries": 3 # Optional: number of similar queries to retrieve
    },
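
To make the effect of query_rewriting_glossary concrete, the sketch below expands glossary terms with a plain whole-word substitution. This is an illustration only: on this page the glossary is described as input to the LLM rewriting step, and `apply_glossary` is a hypothetical, deterministic stand-in rather than the tool's implementation.

```python
import re


def apply_glossary(query, glossary):
    """Expand glossary terms that appear as whole words in the query."""
    if not glossary:
        return query
    # Try longer terms first so a long term is not shadowed by a shorter one.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    # Replace each matched term with its glossary expansion.
    return pattern.sub(lambda match: glossary[match.group(1)], query)


rewritten = apply_glossary("Which DS courses are offered?", {"DS": "Data Science"})
print(rewritten)
```

With the glossary from the example above, "DS" is expanded to "Data Science", while words that merely contain the letters "DS" are left alone.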

In the required retrieval_args field, you can specify how the agent will retrieve the relevant passages. The RAG 2.0 hybrid search combines semantic, keyword, and metadata retrieval methods for improved recall and precision. After retrieving passages from each method, the system automatically de-duplicates results. These are the fields to modify:

retriever_id: The ID of the retriever index used to fetch relevant passages.
retriever_num_semantic_passages: Number of passages to retrieve based on semantic similarity (default: 10). Adjust based on expected relevance and volume of results.
retriever_num_metadata_passages: Number of passages to retrieve based on metadata filtering (default: 5). Increase if metadata is highly informative for your use case.
retriever_num_keyword_passages: Number of passages to retrieve based on exact keyword matches (default: 5). Adjust based on the importance of exact keyword matching.
retriever_num_fewshot_examples: Number of few-shot examples for filter extraction (default: 2). Helps LLM extract metadata filters more accurately.
retriever_metadata_fields_to_extract: List of specific metadata fields to extract from queries (default: []). Examples: ["company", "document_type", "industry"].
retriever_filter_extraction_prompt_id: The ID of the prompt used for extracting filters during retrieval.
retriever_filter_extraction_llm_client_config_name: The LLM used for extracting filters during retrieval.
retriever_filter_extraction_llm_options: Additional options for the LLM used for filter extraction. Use a safe default if unsure.
retriever_general_filter: General filter applied to all retrieval operations (default: "'1' == '1'").

Python

    "retrieval_args": {
        "retriever_id": "my_pg_index", # a postgres vector db is needed to retrieve documents
        "retriever_num_semantic_passages": 10, # number of passages to return via semantic search
        "retriever_num_metadata_passages": 5, # number of passages to return based on metadata filtering
        "retriever_num_keyword_passages": 5, # number of passages to return based on exact keyword matching
        "retriever_num_fewshot_examples": 2, # number of few-shot examples for filter extraction
        "retriever_metadata_fields_to_extract": ["company", "topic", "document_type"], # specific metadata fields to extract
        "retriever_filter_extraction_prompt_id": "default_filter_extraction_prompt", # prompt to extract metadata from queries
        "retriever_filter_extraction_llm_client_config_name": "default-completions", # LLM for filter extraction
        "retriever_filter_extraction_llm_options": {}, # additional LLM options
        "retriever_general_filter": "'1' == '1'" # general filter applied to all retrieval operations
    },
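
The de-duplication step described above can be pictured as a merge over the three result lists. The sketch below is an assumption about the general idea, not the tool's code: `merge_passages` is a hypothetical helper, and the `id` and `text` keys are illustrative passage fields.

```python
def merge_passages(semantic, keyword, metadata):
    """Merge passages from the three retrieval methods, dropping duplicates by id."""
    seen = set()
    merged = []
    sources = (("semantic", semantic), ("keyword", keyword), ("metadata", metadata))
    for source, passages in sources:
        for passage in passages:
            if passage["id"] in seen:
                continue  # already returned by an earlier method
            seen.add(passage["id"])
            # Record which method first returned the passage.
            merged.append({**passage, "source": source})
    return merged


semantic = [{"id": "p1", "text": "a"}, {"id": "p2", "text": "b"}]
keyword = [{"id": "p2", "text": "b"}, {"id": "p3", "text": "c"}]
metadata = [{"id": "p1", "text": "a"}]
merged = merge_passages(semantic, keyword, metadata)
```

Because duplicates are dropped, the merged list can be shorter than the sum of the three retriever_num_* values.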

In the optional reranker_args field, you can specify how the agent ranks the retrieved passages. In the following example, the agent keeps 15 unique passages after reranking. It uses ms-marco-MiniLM-L6-v2 as the default cross-encoder, which scores the query and each passage together. Optionally, you can use a lambda to rerank on the fly:

reranker_num_passages_to_keep: The number of unique passages to keep after ranking. This helps reduce context overload. Adjust based on the desired level of detail in the final response.
reranker_type: The type of reranker to use. This can be 'llm' or 'crossencoder'. A cross-encoder scores the query and each passage together.
reranker_params: Parameters for the reranker. Use a safe default if not specified. "crossEncoderName", "topK", "queryLength" are examples of keys you can use for the ms-marco-MiniLM-L6-v2 encoder, but you can use any additional args if you want to use a different crossEncoder.
reranker_llm_client_config_name: The LLM used for reranking the passages.
reranker_llm_options: Additional options for the LLM.
reranker_lambda_str: An optional lambda function to rerank in a custom way. You can provide additional context by sending in additional passages.
reranker_lambda_args: Arguments for the optional lambda function.

Python

    "reranker_args": {
        "reranker_num_passages_to_keep": 15, # retrieves unique passages only. hyperparameter to reduce context. for example, this would retrieve 15 out of 30 passages.
        "reranker_type": "crossencoder", #or 'llm' if needed. If you use crossencoder, there will be no llm reranking. The recommended value is crossencoder which uses sentence transformers.
        #"reranker_params": {}, # safe default
        "reranker_llm_client_config_name": "default-completions", #or other configured llm client
        #"reranker_llm_options": {}, # safe default
        #"reranker_lambda_str": "",  # Optional, safe default
        #"reranker_lambda_args": {} # safe default
    },
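
The role of reranker_num_passages_to_keep can be sketched as "score every unique passage against the query, then keep the best N". In the sketch below, `overlap_score` is a toy word-overlap score standing in for the cross-encoder or LLM scorer, and `rerank` is a hypothetical helper. Neither is the tool's implementation.

```python
def overlap_score(query, text):
    """Toy relevance score: fraction of query words that appear in the passage."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(text.lower().split())) / len(query_words)


def rerank(query, passages, num_passages_to_keep, score_fn=overlap_score):
    """Score unique passages against the query and keep the highest-scoring ones."""
    unique = {}
    for passage in passages:
        unique.setdefault(passage["id"], passage)  # first occurrence wins
    ranked = sorted(
        unique.values(),
        key=lambda passage: score_fn(query, passage["text"]),
        reverse=True,
    )
    return ranked[:num_passages_to_keep]


passages = [
    {"id": "a", "text": "campus parking rules"},
    {"id": "b", "text": "first day of instruction is in August"},
    {"id": "b", "text": "first day of instruction is in August"},
    {"id": "c", "text": "instruction schedule"},
]
top = rerank("first day of instruction", passages, num_passages_to_keep=2)
```

A custom scoring function can be passed through `score_fn`, which loosely mirrors the purpose of reranker_lambda_str.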

In the required message_builder_args field, you can specify how to build the messages that are sent to the LLM. The message builder also adds a question answering prompt to the messages and can incorporate few-shot examples:

message_builder_prompt_id: The ID of the prompt used for building the messages sent to the LLM. Customize this if you have a specific prompt template.
message_builder_use_raw_images: Whether to use raw images in the message response. Keep as True unless images are not needed.
message_builder_treat_tables_as_images: Whether to send PDF tables as raw images. Keep as True unless table images are not needed.
message_builder_max_images: Maximum number of images to include in the message when processing passages with images. This applies to both raw images (when message_builder_use_raw_images=True) and table images (when message_builder_treat_tables_as_images=True). Must be a positive integer; if omitted, it defaults to 50. Providing a non-integer or non-positive value will cause a configuration error during initialization.
message_builder_max_image_payload_bytes: Maximum total byte size of base64-encoded image payload included across all image passages. Applies in addition to message_builder_max_images. Defaults to 20 MB (20 × 1024 × 1024 bytes). If set to None, no total payload cap is applied.
message_builder_max_per_image_bytes: Maximum byte size of a single base64-encoded image payload. Images exceeding this size are skipped. Defaults to 5 MB (5 × 1024 × 1024 bytes). If set to None, no per-image size cap is applied.
message_builder_lambda_str: Optional: A lambda function to customize message building. Leave empty if not needed.
message_builder_lambda_args: Arguments for the message building lambda function. Use an empty dictionary if not applicable.
message_builder_num_fewshot_examples: Number of few-shot examples to consider when building the messages. Adjust based on the desired context for the LLM and increase them for additional context.
message_builder_use_full_documents: Whether to include full document content for retrieved passages instead of just the relevant excerpts. Set to True for comprehensive context, False for focused responses. Note that setting this to True results in added latency.

Python

    "message_builder_args": {
        "message_builder_prompt_id": "default_message_builder_prompt", # this is a system prompt
        "message_builder_use_raw_images": True, # keep this as True to use images in the message response. You need mew3 for this to work.
        "message_builder_treat_tables_as_images": True, # keep this as True to treat the pdf tables as images
        "message_builder_max_images": 50, # maximum number of images to include in messages (must be positive integer)
        #"message_builder_lambda_str": "",  # Empty string instead of None for safe execution
        #"message_builder_lambda_args": {}, # Empty dict is safe
        "message_builder_num_fewshot_examples": 2, # number of few shots to consider
        "message_builder_use_full_documents": False # whether to include full document content for retrieved passages
    },
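
The three image limits (message_builder_max_images, message_builder_max_image_payload_bytes, and message_builder_max_per_image_bytes) interact as described above. The sketch below shows one plausible reading using the documented defaults. `select_images` is a hypothetical helper, and skipping (rather than stopping at) an image that would exceed the total payload cap is an assumption.

```python
DEFAULT_MAX_IMAGES = 50
DEFAULT_MAX_TOTAL_BYTES = 20 * 1024 * 1024  # 20 MB
DEFAULT_MAX_PER_IMAGE_BYTES = 5 * 1024 * 1024  # 5 MB


def select_images(images, max_images=DEFAULT_MAX_IMAGES,
                  max_total_bytes=DEFAULT_MAX_TOTAL_BYTES,
                  max_per_image_bytes=DEFAULT_MAX_PER_IMAGE_BYTES):
    """Pick base64-encoded image strings that fit within the three limits."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_images, bool) or not isinstance(max_images, int) or max_images <= 0:
        raise ValueError("message_builder_max_images must be a positive integer")

    selected = []
    total = 0
    for image in images:
        if len(selected) >= max_images:
            break
        size = len(image)  # base64 text is ASCII, so characters equal bytes
        if max_per_image_bytes is not None and size > max_per_image_bytes:
            continue  # oversized images are skipped
        if max_total_bytes is not None and total + size > max_total_bytes:
            continue  # assumption: skip images that would exceed the total cap
        selected.append(image)
        total += size
    return selected
```

Passing None for either byte limit disables that cap, which matches the behavior described for the two byte-size fields.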

In the required query_answering_args field, you can specify which LLM the agent will use to answer the query, as well as the number of max input and output tokens to use:

query_answering_llm_client_config_name: The LLM used for generating the final answer. Choose the appropriate LLM based on your requirements.
query_answering_llm_options: Additional options for the LLM used for answering the query. Use a safe default if not specified.
query_answering_max_input_tokens: Maximum number of input tokens allowed. Set to a value that fits your use case, balancing context and performance. If the message payload exceeds this value it will be truncated. The default value is None, setting no limit to the max amount of tokens.
query_answering_max_output_tokens: Maximum number of output tokens allowed. Set to a value that ensures comprehensive but manageable responses. The default value is None, setting no limit to the max amount of tokens. Important: If set too low, this can cause incomplete JSON responses, which will trigger a fallback mode that disables citation validation (see Citation Validation section below).
query_answering_lambda_str: Optional: A lambda function to customize the query answering process. Leave empty if not needed.
query_answering_lambda_args: Arguments for the query answering lambda function. Use an empty dictionary if not applicable.

Citation Validation

The RAG unified tool automatically prevents citation hallucination by constraining the LLM to only reference passages that were actually retrieved. Here's how it works:

Automatic Enum Constraint: The system dynamically creates a JSON schema that restricts the citations field to only contain valid passage IDs from the retrieved documents.
Structured Output: The LLM receives a response format that enforces proper citation usage through enum validation.
Fallback Mode: If JSON parsing fails (often due to insufficient query_answering_max_output_tokens), the system falls back to unstructured output without citation constraints.

Best Practice: Set query_answering_max_output_tokens high enough (recommended: 2048+ tokens) to ensure complete JSON responses and maintain citation validation. If you experience citation validation failures, increase this limit rather than disabling the feature.
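
The enum constraint and the fallback mode can be illustrated with a small sketch. The schema shape, the `answer` field name, and the helper names below are assumptions for illustration. Only the idea of restricting a `citations` field to the retrieved passage IDs comes from this page.

```python
import json


def build_response_schema(passage_ids):
    """Build a JSON schema whose citations may only be retrieved passage IDs."""
    return {
        "type": "object",
        "properties": {
            "answer": {"type": "string"},
            "citations": {
                "type": "array",
                "items": {"type": "string", "enum": list(passage_ids)},
            },
        },
        "required": ["answer", "citations"],
    }


def parse_answer(raw_output, passage_ids):
    """Return (answer, citations, validated) for a raw LLM output string."""
    allowed = set(passage_ids)
    try:
        data = json.loads(raw_output)
        answer = data["answer"]
        # Defensive client-side filter: keep only IDs that were retrieved.
        citations = [c for c in data["citations"] if c in allowed]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Fallback: truncated or malformed JSON is returned as plain text
        # with no validated citations.
        return raw_output, [], False
    return answer, citations, True


ids = ["p1", "p2"]
schema = build_response_schema(ids)
complete = parse_answer('{"answer": "Aug 20", "citations": ["p1", "p9"]}', ids)
truncated = parse_answer('{"answer": "Aug 2', ids)
```

The second call mimics an output cut off by a low query_answering_max_output_tokens value: the JSON cannot be parsed, so the result is flagged as not validated.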

Python

    "query_answering_args": {
        "query_answering_llm_client_config_name": "default-completions", # or any llm of your choice
        #"query_answering_llm_options": {}, # safe default
        "query_answering_max_input_tokens": 8192,  # Safe default
        "query_answering_max_output_tokens": 2048,  # Safe default (if both input and output tokens are set to None, there is no truncation of input/output.)
        #"query_answering_lambda_str": "",  # Optional, safe default
        #"query_answering_lambda_args": {} # safe default
    }

Configure the dynamic agent with the RAG unified tool

To add the RAG tool to the dynamic agent, you need to specify the agent and make the necessary changes to the system and solution prompts.

Select py-c3agents as the kernel for the Jupyter runtime.

Check your tools in your dynamic agent by running:

Python

c3.Genai.Agent.Dynamic.Tool.fetch()

c3.Genai.Agent.Dynamic.Tool.forId('rag_unified')

Run the following code to update the tool's configuration params.

Python

tool = c3.Genai.Agent.Dynamic.Tool.forId("rag_unified")

new_init_kwargs = {
    "query_rewriting_args": {
        ...
    },
    "retrieval_args": {
        ...
    },
    "reranker_args": {
        ...
    },
    "message_builder_args": {
        ...
    },
    "query_answering_args": {
        ...
    }
}

tool.withInitKwargs(new_init_kwargs).merge(mergeInclude="initKwargs")

Define your toolkit with the RAG unified tool.

Python

toolkit = c3.Genai.Agent.Dynamic.Toolkit(
    id="rag_unified_toolkit",
    name="rag_unified_toolkit",
    descriptionForUser="Toolkit for RAG Unified on Unstructured Data",
    tools={
        "rag_unified" : tool
    },
).create()

With this toolkit in place, the agent can rewrite queries, rerank passages, and build custom messages.
