Structured Database Query assistant
Overview
StructuredDataQueryPy, also known as the structured database assistant, generates and executes eval specs and code for structured data queries. It offers advanced capabilities like table manipulations, mathematical calculations, and data wrangling on data fetched from multiple sources.
Enabling the feature
To use the StructuredDataQueryPy tool with Genai.Agent.QueryOrchestrator, add the tool Genai.Agent.StructuredDataQueryPy to your toolkit and list it in Genai.Agent.Config#uiSelectableTools so that it is visible in the tool selection dropdown in the UI.
To update the config from the static console, run the following snippet:
var queryOrchestrator = Genai.Agent.QueryOrchestrator.forConfigKey('QueryOrchestrator_default');
// Call config() on the existing variable; a second `var` declaration here is a syntax error
queryOrchestrator.config().setConfigValues({
  uiSelectableTools: ['StructuredDataQueryPy_default'], // Add this tool in addition to your other tools in the toolkit
});
Configure the feature
Large language models
StructuredDataQueryPy is compatible with all large language models supported in GenAI. You can check the existing models using c3.Genai.UnstructuredQuery.Engine.ModelConfig.listConfigs().toData().
The recommended models for each provider are as follows:
- Claude: Claude Sonnet 3.5 is the top choice for accuracy and is the preferred large language model for use with StructuredDataQueryPy.
- OpenAI: gpt-4o provides the highest accuracy.
- Gemini: gemini-1.5-flash-experimental delivers the best accuracy.
Data model graph
Keep the following points in mind when creating the data model graph:
- Include the ID field for all relevant types
- Make sure all relevant types, fields, and metrics are allow-listed
- Don't include unnecessary types or fields, since they introduce noise
- Create fields and stored calculations to expand functionality
For a structured database assistant, set includeCollectionFields to false. This encourages the large language model to create multiple retrieval specifications for multi-hop questions, rather than relying on traversing the type system, which is limited to performing left joins.
data_model_graph = (
    c3.Genai.StructuredData.DataModelGraph(
        id="<id>",
        name="<name>",
        whitelistedDataModel=white_list,
        includeCollectionFields=False,
        # The fields have basic descriptions coming from the type declarations.
        # You can either improve the descriptions in the type declarations or
        # add documentation to fields and types on the fly.
        overrideDocumentation=override_doc_on_data_model_graph,
    )
    .upsert()
    .get()
)
For more information, see Setting the allow-listed data model.
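As a rough illustration of why several independent retrievals can help with multi-hop questions, the following sketch uses plain Python lists of dictionaries in place of real retrieval results. The table contents, the field names, and the helpers `inner_join` and `total_by` are hypothetical and are not part of the GenAI API. The sketch only assumes that each hop has been fetched separately, so the generated code is free to combine the results with a join type other than a left join.

```python
# Hypothetical results of two separate retrievals (illustrative data only).
orders = [
    {"id": "o1", "customerId": "c1", "amount": 120.0},
    {"id": "o2", "customerId": "c2", "amount": 80.0},
    {"id": "o3", "customerId": "c9", "amount": 45.0},  # no matching customer
]
customers = [
    {"id": "c1", "region": "EMEA"},
    {"id": "c2", "region": "APAC"},
    {"id": "c3", "region": "EMEA"},  # customer with no orders
]


def inner_join(left, right, left_key, right_key):
    """Join two lists of dicts, keeping only rows that match on both sides."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[left_key], []):
            # Prefix right-hand fields so they do not overwrite the left-hand "id".
            merged = dict(row)
            merged.update({f"customer_{k}": v for k, v in match.items()})
            joined.append(merged)
    return joined


def total_by(rows, group_key, value_key):
    """Sum value_key for each distinct value of group_key."""
    totals = {}
    for row in rows:
        totals[row[group_key]] = totals.get(row[group_key], 0.0) + row[value_key]
    return totals


joined_rows = inner_join(orders, customers, "customerId", "id")
amount_by_region = total_by(joined_rows, "customer_region", "amount")
print(amount_by_region)
```

A left join from orders would keep the unmatched order `o3` with empty customer fields; the inner join above drops it, which is the kind of choice that becomes available once the hops are retrieved separately.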
To make the tool use a new data model graph, large language model, or fuzzy matcher, update the tool config as follows:
# Create the initialization spec for the fuzzy matcher
fuzzy_matcher_spec = c3.Genai.Agent.Tool.Util.StringFuzzyMatcher.InitializationSpec(
    dataModelGraph=data_model_graph,
    name=my_name,
).withDefaults()
# specialization is used as the filter for the few shot examples in the vector store
# associated with the assistant. It does not carry any specific semantic meaning,
# so you can set it to an arbitrary string.
specialization = "DB Retrieval"
spec = c3.Genai.Agent.Tool.StructuredDataQueryPy.InitializationSpec(
    llmConfigName="<llm>",
    dataModelGraph=data_model_graph,
    fuzzyMatcherSpec=fuzzy_matcher_spec,
    specialization=specialization,
    maxRetries=5,
    nExampleStringValues=3,
).withDefaults()
db_query_config = c3.Genai.Agent.Tool.StructuredDataQueryPy(
    id="StructuredDataQueryPy_default"  # Update this with the tool name for `StructuredDataQueryPy`
).config()
db_query_config.setConfigValue("initializationSpec", spec)
Adding few shot examples
To add few shot examples to the assistant memory, you can use the following two methods:
- writeFewShotFromLastUserQuery: Uses a large language model to rewrite the last user conversation into a few shot example.
- Genai.Agent.Tool.StructuredDataQueryPy#addFewShot: Creates a few-shot example from query and code strings and adds it to the vector store. Example usage:
structured_agent = c3.Genai.Agent.Tool.StructuredDataQueryPy(
    id="StructuredDataQueryPy_default"
).doInitialize(spec)
structured_agent.addFewShot(query, code)
Limitations
The StructuredDataQueryPy assistant is not compatible with some UI features, such as:
- Editing the generated eval spec
- Adding few shot examples
- Syncing the data model from self-service DI (unless the default data model is used for the structured db assistant tool as well)