Structured Database Query assistant

Overview

StructuredDataQueryPy, also known as the structured database assistant, generates and executes eval specs and code for structured data queries. It offers advanced capabilities like table manipulations, mathematical calculations, and data wrangling on data fetched from multiple sources.

Enabling the feature

To use the StructuredDataQueryPy tool with Genai.Agent.QueryOrchestrator, add Genai.Agent.StructuredDataQueryPy to your toolkit. Also list the tool in Genai.Agent.Config#uiSelectableTools so that it appears in the tool selection dropdown in the UI.

To update the config from the static console, run the following snippet:

JavaScript

var queryOrchestrator = Genai.Agent.QueryOrchestrator.forConfigKey('QueryOrchestrator_default');
queryOrchestrator.config().setConfigValues({
  // Add this tool in addition to the other tools in your toolkit
  uiSelectableTools: ['StructuredDataQueryPy_default'],
});
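
The snippet passes a complete list for uiSelectableTools, so it can help to build that list from the tools you already expose rather than typing it by hand. The following is a minimal Python sketch of that merge, assuming the setter replaces the existing list (not confirmed here). The helper merge_selectable_tools and the tool name 'OtherTool_default' are hypothetical and are not part of the GenAI API.

```python
def merge_selectable_tools(existing, additions):
    """Return existing tool names followed by any additions not already present."""
    merged = list(existing)
    for name in additions:
        if name not in merged:
            merged.append(name)
    return merged


# 'OtherTool_default' is a placeholder for a tool already in your toolkit.
current_tools = ["OtherTool_default"]
ui_selectable_tools = merge_selectable_tools(
    current_tools, ["StructuredDataQueryPy_default"]
)
print(ui_selectable_tools)
```

The resulting list is what you would pass as uiSelectableTools.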

Configure the feature

Large language models

StructuredDataQueryPy is compatible with all large language models supported in GenAI. You can check the existing models using c3.Genai.UnstructuredQuery.Engine.ModelConfig.listConfigs().toData().

The recommended models for each provider are as follows:

Claude: Claude Sonnet 3.5 is the top choice for accuracy and is the preferred large language model for use with StructuredDataQueryPy.
OpenAI: gpt-4o provides the highest accuracy.
Gemini: gemini-1.5-flash-experimental delivers the best accuracy.

Data model graph

Keep the following points in mind when creating the data model graph:

Include the ID field for all relevant types
Make sure all relevant types, fields, and metrics are allow-listed
Don't include unnecessary types or fields, since they introduce noise
Create fields and stored calculations to expand functionality
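
The first two points in the checklist above can be checked mechanically before the graph is created. The sketch below assumes, purely for illustration, that a draft allow-list is a plain dictionary mapping type names to lists of field names. The real allow-list format may differ, and check_allow_list and the type names are hypothetical.

```python
def check_allow_list(allow_list, id_field="id"):
    """Return a list of human-readable problems found in a draft allow-list."""
    problems = []
    for type_name, fields in allow_list.items():
        if not fields:
            problems.append(f"{type_name}: no fields allow-listed")
        elif id_field not in fields:
            problems.append(f"{type_name}: missing '{id_field}' field")
    return problems


# Hypothetical types and fields, for illustration only.
draft = {
    "Customer": ["id", "name", "region"],
    "Order": ["amount", "customer"],
}
print(check_allow_list(draft))
```

An empty result means every listed type exposes its ID field. Whether a field is unnecessary noise still needs human judgment.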

For a structured database assistant, set includeCollectionFields to false. This encourages the large language model to create multiple retrieval specifications for multi-hop questions rather than traversing the type system, which is limited to left joins.
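
To illustrate what multiple retrieval specifications means for a multi-hop question, the sketch below answers "what is the total order amount for customers in the West region?" with two flat retrievals that are combined in Python, instead of one traversal through a collection field. The rows are made up and stand in for fetch results. The Customer and Order types are hypothetical, and no C3 API is called.

```python
# Hop 1: rows that a retrieval on a hypothetical Customer type might return.
customers = [
    {"id": "c1", "region": "West"},
    {"id": "c2", "region": "East"},
    {"id": "c3", "region": "West"},
]

# Hop 2: rows that a separate retrieval on a hypothetical Order type might return.
orders = [
    {"id": "o1", "customer": "c1", "amount": 120.0},
    {"id": "o2", "customer": "c2", "amount": 80.0},
    {"id": "o3", "customer": "c3", "amount": 50.0},
    {"id": "o4", "customer": "c1", "amount": 30.0},
]

# Combine the two result sets in code: keep only orders whose customer is in the West.
west_ids = {c["id"] for c in customers if c["region"] == "West"}
west_total = sum(o["amount"] for o in orders if o["customer"] in west_ids)
print(west_total)
```

Because the combination happens in generated code, it can apply whatever filter, join, or aggregation the question needs.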

Python

data_model_graph = (
    c3.Genai.StructuredData.DataModelGraph(
        id="<id>",
        name="<name>",
        whitelistedDataModel=white_list,
        includeCollectionFields=False,
        # Fields carry basic descriptions from the type declarations. Either improve those
        # descriptions in the type declarations or add documentation to fields and types here.
        overrideDocumentation=override_doc_on_data_model_graph,
    )
    .upsert()
    .get()
)

For more information, see Setting the allow-listed data model.

To make the tool use a new data model graph, large language model, or fuzzy matcher, update the tool config as follows:

Python

# Create the initialization spec for the fuzzy matcher
fuzzy_matcher_spec = c3.Genai.Agent.Tool.Util.StringFuzzyMatcher.InitializationSpec(
    dataModelGraph=data_model_graph,
    name=my_name,
).withDefaults()

# specialization filters the few-shot examples in the vector store associated with the
# assistant. It carries no semantic meaning, so any string works.
specialization = "DB Retrieval"

spec = c3.Genai.Agent.Tool.StructuredDataQueryPy.InitializationSpec(
    llmConfigName="<llm>",
    dataModelGraph=data_model_graph,
    fuzzyMatcherSpec=fuzzy_matcher_spec,
    specialization=specialization,
    maxRetries=5,
    nExampleStringValues=3,
).withDefaults()

db_query_config = c3.Genai.Agent.Tool.StructuredDataQueryPy(
    id="StructuredDataQueryPy_default" # Update this with the tool name for `StructuredDataQueryPy`
).config()
db_query_config.setConfigValue("initializationSpec", spec)

Adding few shot examples

To add few-shot examples to the assistant's memory, use either of the following methods:

writeFewShotFromLastUserQuery: Uses a large language model to rewrite the last user conversation into a few-shot example.
Genai.Agent.Tool.StructuredDataQueryPy#addFewShot: Creates a few-shot example from query and code strings and adds it to the vector store. Example usage:
Python
```
structured_agent = c3.Genai.Agent.Tool.StructuredDataQueryPy(
  id="StructuredDataQueryPy_default"
).doInitialize(spec)
structured_agent.addFewShot(query, code)
```
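
When several examples need to be added, the addFewShot call can be wrapped in a small loop. The following is a minimal sketch rather than part of the GenAI API: add_few_shots and RecordingAgent are hypothetical names, the example strings are placeholders, and the stand-in agent exists only so the snippet runs without a C3 environment.

```python
def add_few_shots(agent, examples):
    """Add (query, code) pairs to the agent and return how many were added."""
    added = 0
    for query, code in examples:
        if not query.strip() or not code.strip():
            raise ValueError("query and code must both be non-empty strings")
        agent.addFewShot(query, code)
        added += 1
    return added


class RecordingAgent:
    """Stand-in for the initialized tool; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def addFewShot(self, query, code):
        self.calls.append((query, code))


# Placeholder strings; real examples pair a user question with the code that answers it.
examples = [
    ("How many orders were placed last month?", "<code that answers the question>"),
    ("Which region has the highest revenue?", "<code that answers the question>"),
]
agent = RecordingAgent()
print(add_few_shots(agent, examples))
```

With the real tool, pass structured_agent from the snippet above in place of RecordingAgent().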

Limitations

The StructuredDataQueryPy assistant is not compatible with some UI features, such as:

Editing the generated eval spec
Adding few-shot examples
Syncing the data model from self-service DI (unless the default data model is also used for the structured database assistant tool)
