Configuring Structured Data Retrieval

Enabling structured data tools in C3 Generative AI lets you answer questions that rely on data from connected or integrated databases. Before users can query this data, an admin must configure which structured sources are available to the application. After setup, you can fine-tune several settings to improve retrieval accuracy and increase the chances of returning a useful answer.

This document covers configuration for CanvasAgent_default and CanvasAgent_deep_research. Their project IDs are default_canvas and deep_research, and you can use these IDs to initialize the agents. All other agents are instances of Dynamic Agent Canvas or Dynamic Agent Deep Research, so you can follow the same steps for any agent instance.

Prerequisites

Before you begin, ensure the following:

You can access Jupyter Notebook and the Application C3 AI Console.

Open Jupyter Notebook

Open Jupyter Notebook from the C3 Generative AI Application card:

Navigate to your application in C3 AI Studio.
Select Jupyter Notebook.
Set the py-query_orchestrator_312 runtime as the notebook kernel.

Fetching the data model

This tutorial assumes that you have uploaded structured data into your application.

The following example uses a table (C3 Type) called SamplePublicFurnaceLocation, whose columns describe the location of assets (furnaces). The structure of this data determines the data model.

id	functionalLocation	country	city	latitude	longitude
CA-SF	CA-SF	US	SF.	25.5050	45.8484
FL-Houston	FL-Houston	US	Houston	29.7601	-95.0758
FL-Hamburg	FL-Hamburg	Germany	Hamburg	53.5488	9.9872

Set up structured retrieval

The whitelist defines which Types, fields, and metrics of your data model the agent is allowed to use. Be precise when configuring it: if you omit a required Type, field, or metric, it won't be available to answer questions, and if you include too many fields, the platform and language model may retrieve irrelevant information, reducing the quality of results. To set up structured retrieval, complete the following steps:

Define the whitelist as a JSON-style Python dictionary that maps each Type in your data model to the fields the agent can use:

Python

white_list = {
    "SamplePublicFurnaceLocation": {
        "fields": [
            "id",
            "functionalLocation",
            "country",
            "city",
            "latitude",
            "longitude"
        ],
    }
}
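Because omitted fields won't be available to answer questions and extra fields can add noise, it can help to sanity-check the whitelist before upserting the graph. The sketch below uses a hypothetical helper, validate_white_list (not part of the C3 API), that compares the whitelist with the column names from the sample table above. In practice you would replace available_columns with your own Types and fields.

```python
from typing import Dict, List

# Column names of the SamplePublicFurnaceLocation sample table (assumed input).
available_columns: Dict[str, List[str]] = {
    "SamplePublicFurnaceLocation": [
        "id", "functionalLocation", "country", "city", "latitude", "longitude",
    ],
}

white_list = {
    "SamplePublicFurnaceLocation": {
        "fields": ["id", "functionalLocation", "country", "city", "latitude", "longitude"],
    }
}


def validate_white_list(white_list: Dict, available_columns: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems found in the whitelist (empty if none)."""
    problems: List[str] = []
    for type_name, spec in white_list.items():
        # Every whitelisted Type must exist in the data model.
        if type_name not in available_columns:
            problems.append(f"Unknown Type: {type_name}")
            continue
        known = set(available_columns[type_name])
        fields = spec.get("fields", [])
        # Every whitelisted field must be a real column of that Type.
        for field in fields:
            if field not in known:
                problems.append(f"{type_name}.{field} is not a known field")
        # Duplicates add nothing and usually indicate a copy-paste slip.
        if len(set(fields)) != len(fields):
            problems.append(f"{type_name} lists a field more than once")
    return problems


print(validate_white_list(white_list, available_columns))  # prints []
```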

Optionally, use the overrideDocumentation field on your graph instance to override the existing Type documentation.

You must persist (upsert or merge) the graph after updating this field for the change to take effect. Clear descriptions of each field in overrideDocumentation help improve retrieval performance.

Python

override_doc_on_data_model_graph = {
    "SamplePublicFurnaceLocation": {
        # Adjacent string literals are joined without a separator,
        # so every fragment except the last ends with a space.
        "doc": (
            "Furnaces work on a simple principle: instead of using electricity, "
            "furnaces use heat to make electricity."
        ),
        "fields": {
            "id": "The id of the furnace",
            "functionalLocation": "The region that the furnace is made in",
            "country": "The country the furnace is made in. Use this field when the query asks about where the furnace is made. DO NOT use this field if the question asks about the city.",
            "city": "The city the furnace is made in. Use this field when the query asks about where the furnace is made.",
            "latitude": "Use this field if the question asks about the city the furnace is located in.",
            "longitude": "Use this field if the question asks about the city the furnace is located in.",
        },
    },
}
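Two easy mistakes here are describing fields that aren't in the whitelist and splitting a "doc" string across adjacent literals without a trailing space (Python joins adjacent string literals with no separator). The hypothetical check_override_doc helper below, which is not part of the C3 API, flags both, plus whitelisted fields that have no description. The punctuation check is a rough heuristic and can produce false positives.

```python
import re
from typing import Dict, List

white_list = {
    "SamplePublicFurnaceLocation": {
        "fields": ["id", "functionalLocation", "country", "city", "latitude", "longitude"],
    }
}


def check_override_doc(white_list: Dict, override_doc: Dict) -> List[str]:
    """Return problems found when comparing override documentation to the whitelist."""
    problems: List[str] = []
    for type_name, spec in override_doc.items():
        if type_name not in white_list:
            problems.append(f"{type_name} is documented but not whitelisted")
            continue
        allowed = set(white_list[type_name].get("fields", []))
        described = set(spec.get("fields", {}))
        # Descriptions for fields the agent cannot use.
        for field in sorted(described - allowed):
            problems.append(f"{type_name}.{field} is described but not whitelisted")
        # Whitelisted fields that have no description.
        for field in sorted(allowed - described):
            problems.append(f"{type_name}.{field} has no description")
        # Rough heuristic: punctuation directly followed by a letter often means
        # two adjacent string literals were joined without a space.
        if re.search(r"[,;:][A-Za-z]", spec.get("doc", "")):
            problems.append(f"{type_name} doc may be missing a space between joined strings")
    return problems


# Illustrative input: these fragments join into "First sentence,second sentence."
override_doc = {
    "SamplePublicFurnaceLocation": {
        "doc": ("First sentence," "second sentence."),
        "fields": {"id": "The id of the furnace", "city": "The city the furnace is made in"},
    }
}

for problem in check_override_doc(white_list, override_doc):
    print(problem)
```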

Choose a name for your data model graph (for example, furnaceDataModel) and use it as the id and name of your Genai.StructuredData.DataModelGraph instance. Then refresh the graph with the whitelisted data model using the following code:

Python

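data_model_graph_name = "furnaceDataModel"

# Look up an existing graph with this name; the result is falsy if none exists.
DATA_MODEL_GRAPH = c3.Genai.StructuredData.DataModelGraph.forName(data_model_graph_name)

# Set to False to reuse an existing graph without re-upserting it.
refreshGraph = True

if not DATA_MODEL_GRAPH or refreshGraph:
    # Create or update the graph with the whitelist and documentation overrides,
    # then fetch the persisted instance.
    DATA_MODEL_GRAPH = (
        c3.Genai.StructuredData.DataModelGraph(
            id=data_model_graph_name,
            name=data_model_graph_name,
            whitelistedDataModel=white_list,
            includeCollectionFields=True,
            overrideDocumentation=override_doc_on_data_model_graph,
        )
        .upsert()
        .get()
    )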

Next, set up the dynamic agent to use the refreshed data model. You must specify the execute_retrieval_spec_tool tool for retrieving structured data. The following example also upserts a fuzzy_match_string_values_tool that references the data model graph, sets n_gram_size to 4, and sets nValuesToRetrieve to 20. The n_gram_size value is passed to the fuzzy matcher through techniqueKwargs, so it configures the matcher rather than the LLM: it sets the length, four characters, of the string chunks compared when matching values in the query to values in the data model's fields.

Python

# Tool that executes retrieval specs against the data model graph.
execute_retrieval_spec_tool = c3.Genai.Agent.Dynamic.Tool.make(
    {
        "id": "execute_retrieval_spec_new",
        "name": "execute_retrieval_spec_new",
        "pySrc": {"url": "meta://genAiBase/resource/tool/execute_retrieval_spec.py"},
        "toolConfigurationParams": {"dataModelName": data_model_graph_name},
        # descriptionForLlm belongs inside the spec dictionary.
        "descriptionForLlm": "a tool to retrieve structured data",
    }
).upsert(returnInclude="this")

# Tool that fuzzy matches string values in the query against the data model graph.
fuzzy_match_string_values_tool = c3.Genai.Agent.Dynamic.Tool.make(
    {
        "id": "fuzzy_match_string_values_new",
        "name": "fuzzy_match_string_values_new",
        "pySrc": {"url": "meta://genAiBase/resource/tool/fuzzy_match_string_values.py"},
        "toolConfigurationParams": {
            "fuzzyMatcherInitializationSpec": {
                "dataModelGraph": data_model_graph_name,
                "techniqueKwargs": {"n_gram_size": 4},
            },
            "nValuesToRetrieve": 20,
        },
        "descriptionForLlm": "a tool to fuzzy match queries",
    }
).upsert(returnInclude="this")

# Initialize the fuzzy match tool before use.
fuzzy_match_string_values_tool.initialize()
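This document doesn't describe how fuzzy_match_string_values.py uses n_gram_size internally. As an illustration only, the sketch below shows one common form of character n-gram matching: each string is lowercased and split into overlapping 4-character chunks, candidates are scored by Jaccard similarity (shared chunks divided by total distinct chunks), and the best results are kept, analogous to nValuesToRetrieve. The functions char_ngrams, jaccard, and fuzzy_match are hypothetical and may differ from the platform's technique.

```python
from typing import List, Set, Tuple


def char_ngrams(text: str, n: int = 4) -> Set[str]:
    """Return the set of overlapping character n-grams of a lowercased string."""
    text = text.lower()
    if len(text) < n:
        # Strings shorter than n fall back to a single chunk (a choice of this sketch).
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def fuzzy_match(query: str, values: List[str], n_gram_size: int = 4,
                n_values_to_retrieve: int = 20) -> List[Tuple[str, float]]:
    """Score each stored value against the query and return the best non-zero matches."""
    query_grams = char_ngrams(query, n_gram_size)
    scored = [(value, jaccard(query_grams, char_ngrams(value, n_gram_size))) for value in values]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [pair for pair in scored if pair[1] > 0][:n_values_to_retrieve]


# City values from the sample table.
print(fuzzy_match("hamburg germany", ["Hamburg", "Houston", "SF."]))
```

With n_gram_size=4, the query "hamburg germany" still matches the stored value "Hamburg" because all four chunks of "hamburg" appear in the query. Larger n-grams make matching stricter; smaller ones tolerate more typos but also produce more spurious matches.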

Before adding the new tools to a toolkit, confirm that they were created:

Python

c3.Genai.Agent.Dynamic.Tool.fetch()

Next, add the tools to a toolkit and confirm that the toolkit was created:

Python

toolkit = c3.Genai.Agent.Dynamic.Toolkit(
    id="structured_retrieval_toolkit",
    name="structured_retrieval_toolkit",
    descriptionForUser="Toolkit for structured retrieval.",
    tools=[
        execute_retrieval_spec_tool,
        fuzzy_match_string_values_tool,
    ],
)
toolkit = toolkit.upsert(returnInclude="this")

c3.Genai.Agent.Dynamic.Toolkit.fetch(excludeMeta=True)

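Next, make the data model graph visible to the agent through its system prompt. The prompt template below includes a {{documentation}} placeholder for the data model documentation, which buildDataModelDocumentation generates from the data model graph you created earlier.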

Python

NEW_SYSTEM_PROMPT = """
# C3 AI Agent

You are an AI agent built by C3 AI, serving as the interface between users and their enterprise data and applications that can be accessed through tools.

## Tag Structure

Wrap all responses in these tags:

- `<thought> ... </thought>` - for internal reasoning, planning, or decision-making
- `<execute> ... </execute>` - for executing Python code
- `<solution> ... </solution>` - for delivering the final answer or asking the user for clarification

...

## Python Libraries Available

You are allowed to use these libraries to address the user's query:

- datetime
- dateutil
- time
- numpy
- pandas
- matplotlib

## Toolkit

{{toolkit}}

## Data Model Documentation

{{documentation}}

## Additional Instructions

{{instructions}}

## More Examples

{{FEWSHOT_EXAMPLES}}
"""

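# Build documentation text from the data model graph; nExampleStringValues
# controls how many example string values are included.
DATA_MODEL_GRAPH_DOCS = DATA_MODEL_GRAPH.buildDataModelDocumentation(
    nExampleStringValues=3
)

# Create the prompt, give it a stable id, remove any previous version with
# that id, and persist the new one.
SYSTEM_PROMPT = c3.Genai.Prompt.fromString(NEW_SYSTEM_PROMPT)
SYSTEM_PROMPT = SYSTEM_PROMPT.withField("id", "structured_retrieval_agent_demo")
SYSTEM_PROMPT.remove()
SYSTEM_PROMPT = SYSTEM_PROMPT.upsert(returnInclude="this")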
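The prompt template uses double-brace placeholders ({{toolkit}}, {{documentation}}, {{instructions}}, {{FEWSHOT_EXAMPLES}}). The platform fills these itself, and this document doesn't show how. To preview what a prompt looks like with values substituted, or to spot placeholders you haven't supplied, you could use hypothetical helpers like render_prompt and missing_placeholders below, which rely only on Python's standard re module and are not part of the C3 API.

```python
import re
from typing import Dict, Set

# Matches {{name}} placeholders such as {{toolkit}} or {{FEWSHOT_EXAMPLES}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, values: Dict[str, str]) -> str:
    """Substitute known placeholders and leave unknown ones untouched."""
    def substitute(match: re.Match) -> str:
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, template)


def missing_placeholders(template: str, values: Dict[str, str]) -> Set[str]:
    """Return placeholder names in the template that have no value."""
    return set(PLACEHOLDER.findall(template)) - set(values)


template = "## Toolkit\n\n{{toolkit}}\n\n## Data Model Documentation\n\n{{documentation}}\n"
preview = render_prompt(template, {"documentation": "<data model documentation here>"})
print(preview)
print(missing_placeholders(template, {"documentation": "..."}))
```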

After updating your system prompt:

Initialize the agent and associate the toolkit with it.
Update the chatManagerSpec with the new SYSTEM_PROMPT.
Terminate all engines running PyUtil processes.

Python

# Create a ChatManagerSpec with the Genai.Prompt object
chat_manager_spec = c3.Genai.Agent.Dynamic.ChatManagerSpec(systemPrompt=SYSTEM_PROMPT)

# Configure your agent.
# You can also specify the name as "default_canvas".
agent = c3.Genai.Agent.Dynamic(name="deep_research")

agent_config = agent.config()

agent_config.setConfigValues({"chatManagerSpec": chat_manager_spec})

Run a test query to confirm that the agent uses the updated data model:

Python


# Build an initial result from the query, then run the agent on the standalone query.
query_string = {"query": "What do you know about furnaces"}
query_result = c3.Genai.ChatBot.createInitialGenAiResult(query_string)
query_result = agent.run(query_result.searchQuery.standaloneQuery, query_result)
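The system prompt asks the agent to wrap its reasoning, code, and final answer in <thought>, <execute>, and <solution> tags. If you inspect the agent's raw output while testing, a hypothetical split_tagged_response helper like the one below can pull out each section. This document doesn't say where the raw text is stored on query_result, so the helper takes a plain string, and the sample response is invented for illustration.

```python
import re
from typing import Dict, List

# Matches one tag from the system prompt's tag structure and captures its body.
TAG_PATTERN = re.compile(r"<(thought|execute|solution)>(.*?)</\1>", re.DOTALL)


def split_tagged_response(text: str) -> Dict[str, List[str]]:
    """Group the contents of <thought>, <execute>, and <solution> tags by tag name."""
    sections: Dict[str, List[str]] = {"thought": [], "execute": [], "solution": []}
    for tag, body in TAG_PATTERN.findall(text):
        sections[tag].append(body.strip())
    return sections


raw = (
    "<thought>Find furnaces located in Germany.</thought>"
    "<execute>print('querying SamplePublicFurnaceLocation')</execute>"
    "<solution>FL-Hamburg is located in Hamburg, Germany.</solution>"
)
print(split_tagged_response(raw)["solution"])
```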
