Configuring Structured Data Retrieval

Enabling structured data tools in C3 Generative AI lets you answer questions that rely on data from connected or integrated databases. Before users can query this data, an admin must configure which structured sources are available to the application. After setup, you can fine-tune several settings to improve retrieval accuracy and increase the chances of returning a useful answer.

Prerequisites

Before you begin, ensure the following:

  • You can access Jupyter Notebook and the Application C3 AI Console.

Open Jupyter Notebook

Open Jupyter Notebook from the C3 Generative AI Application card:

  1. Navigate to your application in C3 AI Studio.
  2. Select Jupyter Notebook.
  3. Set the py-query_orchestrator_312 runtime as the notebook kernel.

Fetching the data model

This tutorial assumes that you have uploaded structured data into your application.

The following example uses a database called SamplePublicFurnaceLocation, with a series of columns that specify the location of assets or furnaces. The data structure determines the data model.

id          functionalLocation  country  city     latitude  longitude
CA-SF       CA-SF               US       SF.      25.5050   45.8484
FL-Houston  FL-Houston          US       Houston  29.7601   -95.0758
FL-Hamburg  FL-Hamburg          Germany  Hamburg  53.5488   9.9872
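
For local experimentation, sample rows like these can be written as plain Python dictionaries, and the column names can be read back from them. This is only an illustrative sketch: SAMPLE_ROWS and column_names are hypothetical names, not part of the C3 AI API, and only the Houston and Hamburg rows are copied here.

```python
# Hypothetical local copy of two sample rows; this is not a C3 AI API call.
SAMPLE_ROWS = [
    {"id": "FL-Houston", "functionalLocation": "FL-Houston", "country": "US",
     "city": "Houston", "latitude": 29.7601, "longitude": -95.0758},
    {"id": "FL-Hamburg", "functionalLocation": "FL-Hamburg", "country": "Germany",
     "city": "Hamburg", "latitude": 53.5488, "longitude": 9.9872},
]


def column_names(rows):
    """Return the column names in first-seen order across all rows."""
    seen = {}
    for row in rows:
        for key in row:
            seen.setdefault(key, None)  # dicts preserve insertion order
    return list(seen)


print(column_names(SAMPLE_ROWS))
```

The resulting names are the same field names that the whitelist in the next section lists for SamplePublicFurnaceLocation.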

Set up structured retrieval

In a C3 Type, a whitelist is a defined set of allowed values for a field or method. When configuring the data model, be precise: if you omit a required Type, field, or metric, it won't be available to answer questions. If you include too many fields, the platform and language model may retrieve irrelevant information, reducing the quality of results. To set up structured retrieval, complete the following steps:

  1. Define white_list as a JSON-style dictionary that matches your data model:

    Python
    white_list = {
        "SamplePublicFurnaceLocation": {
            "fields": [
                "id",
                "functionalLocation",
                "country",
                "city",
                "latitude",
                "longitude"
            ],
        }
    }
  2. As an optional step, use the overrideDocumentation field on your graph instance if you need to override the existing Type documentation.

    Python
    override_doc_on_data_model_graph = {
        "SamplePublicFurnaceLocation": {
            "doc": (
                "Furnaces work on a simple principle: instead of using electricity, "
                "furnaces use heat to make electricity. "
            ),
            "fields": {
                "id": "The id of the furnace",
                "functionalLocation": "The region that the furnace is made in",
                "country": "The country the furnace is made in. Use this field when the query asks about where the furnace is made. DO NOT use this field if the question asks about the city.",
                "city": "The city the furnace is made in. Use this field when the query asks about where the furnace is made.",
                "latitude": "Use this field if the question asks about the city the furnace is located in.",
                "longitude": "Use this field if the question asks about the city the furnace is located in.",
            },
        },
    }
  3. Name your data model graph (for example, furnaceDataModel) and use that name as the id and name of the Genai.StructuredData.DataModelGraph instance. Create or refresh the graph with the whitelisted data model using the following code:

    Python
    data_model_graph_name = "furnaceDataModel"
    
    DATA_MODEL_GRAPH = c3.Genai.StructuredData.DataModelGraph.forName(data_model_graph_name)
    
    refreshGraph = True
    
    if not DATA_MODEL_GRAPH or refreshGraph:
    
        DATA_MODEL_GRAPH = (
            c3.Genai.StructuredData.DataModelGraph(
                id=data_model_graph_name,
                name=data_model_graph_name,
                whitelistedDataModel=white_list,
                includeCollectionFields=True,
                overrideDocumentation=override_doc_on_data_model_graph,
            )
            .upsert()
            .get()
        )
  4. Next, set up the dynamic agent to use the refreshed data model. You must specify the execute_retrieval_spec_tool tool for retrieving structured data. The following example also upserts a fuzzy_match_string_values_tool that references the data model graph and sets n_gram_size to four. The n_gram_size field tells the fuzzy matcher to compare four-character sequences from the query against the values in the requested data model's fields.

    Python
    execute_retrieval_spec_tool = c3.Genai.Agent.Dynamic.Tool.make(
        {
            "id": "execute_retrieval_spec_new",
            "name": "execute_retrieval_spec_new",
            "pySrc": {"url": "meta://genAiBase/resource/tool/execute_retrieval_spec.py"},
            "toolConfigurationParams": {"dataModelName": data_model_graph_name},
            # descriptionForLlm belongs inside the spec dictionary
            "descriptionForLlm": "a tool to retrieve structured data",
        }
    ).upsert(returnInclude="this")
    fuzzy_match_string_values_tool = c3.Genai.Agent.Dynamic.Tool.make(
        {
            "id": "fuzzy_match_string_values_new",
            "name": "fuzzy_match_string_values_new",
            "pySrc": {"url": "meta://genAiBase/resource/tool/fuzzy_match_string_values.py"},
            "toolConfigurationParams": {
                "fuzzyMatcherInitializationSpec": {
                    "dataModelGraph": data_model_graph_name,
                    "techniqueKwargs": {"n_gram_size": 4}
                },
                "nValuesToRetrieve": 20,
            },
            "descriptionForLlm":"a tool to fuzzy match queries"
        }
    ).upsert(returnInclude="this")
    fuzzy_match_string_values_tool.initialize()
  5. Add the new tools to your agent's toolkit. Start by checking that the new tools exist:

    Python
    c3.Genai.Agent.Dynamic.Tool.fetch()

    Now add those tools to the agent's toolkit and confirm that the toolkit was saved:

    Python
    toolkit = c3.Genai.Agent.Dynamic.Toolkit(
        id="structured_retrieval_toolkit",
        name="structured_retrieval_toolkit",
        descriptionForUser="Toolkit for structured retrieval.",
        tools=[
            execute_retrieval_spec_tool,
            fuzzy_match_string_values_tool,
        ],
    )
    toolkit = toolkit.upsert(returnInclude="this")
    
    c3.Genai.Agent.Dynamic.Toolkit.fetch(excludeMeta=True)
  6. The agent must now see the data model graph in its system prompt. Add a documentation field to the prompt; it holds the documentation of the data model graph, which is built from the data model whitelisted in Step 1.

    Python
    NEW_SYSTEM_PROMPT = """
    # C3 AI Agent
    
    You are an AI agent built by C3 AI, serving as the interface between users and their enterprise data and applications that can be accessed through tools.
    
    ## Tag Structure
    
    Wrap all responses in these tags:
    
    - `<thought> ... </thought>` - for internal reasoning, planning, or decision-making
    - `<execute> ... </execute>` - for executing Python code
    - `<solution> ... </solution>` - for delivering the final answer or asking the user for clarification
    
    ...
    
    ## Python Libraries Available
    
    You are allowed to use these libraries to address the user's query:
    
    - datetime
    - dateutil
    - time
    - numpy
    - pandas
    - matplotlib
    
    ## Toolkit
    
    {{toolkit}}
    
    ## Data Model Documentation
    
    {{documentation}}
    
    ## Additional Instructions
    
    {{instructions}}
    
    ## More Examples
    
    {{FEWSHOT_EXAMPLES}}
    """
    
    DATA_MODEL_GRAPH_DOCS = DATA_MODEL_GRAPH.buildDataModelDocumentation(
        nExampleStringValues=3
    )
    
    SYSTEM_PROMPT = c3.Genai.Prompt.fromString(NEW_SYSTEM_PROMPT)
    SYSTEM_PROMPT = SYSTEM_PROMPT.withField("id", "structured_retrieval_agent_demo")
    SYSTEM_PROMPT.remove()
    SYSTEM_PROMPT = SYSTEM_PROMPT.upsert(returnInclude="this")
  7. After updating your system prompt:

    • Initialize the agent and associate the toolkit with that agent.
    • Update the chatManagerSpec with the updated SYSTEM_PROMPT.
    • Finally, terminate all engines running PyUtil processes.

    Python
    # Create ChatManagerSpec with the Genai.Prompt object
    chat_manager_spec = c3.Genai.Agent.Dynamic.ChatManagerSpec(systemPrompt=SYSTEM_PROMPT)
    
    # Configure your agent
    # You can specify the name as "default_canvas" as well.
    
    agent = c3.Genai.Agent.Dynamic(name="deep_research")
    
    agent_config = agent.config()
    
    agent_config.setConfigValues({"chatManagerSpec": chat_manager_spec})
    
  8. Run a test query to confirm that the agent uses the updated data model:

    Python
    query_string = {"query": "What do you know about furnaces"}
    query_result = c3.Genai.ChatBot.createInitialGenAiResult(query_string)
    query_result = agent.run(query_result.searchQuery.standaloneQuery, query_result)
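
Step 4 sets n_gram_size to 4 for the fuzzy matching tool. As a rough illustration of what character n-gram matching does in general, the sketch below scores candidate values by the overlap of their four-character sequences with the query. It is not the implementation behind fuzzy_match_string_values.py, which this page does not show: the function names, the Jaccard score, and the limit parameter (loosely mirroring nValuesToRetrieve) are all assumptions made for the example.

```python
def char_ngrams(text, n=4):
    """Return the set of lowercase character n-grams in text."""
    text = text.lower()
    if len(text) < n:
        # Strings shorter than n are kept whole so they can still match exactly.
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a, b, n=4):
    """Jaccard overlap of the two n-gram sets, between 0.0 and 1.0."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def best_matches(query, candidates, n=4, limit=20):
    """Return up to `limit` candidates that share at least one n-gram with query."""
    scored = [(ngram_similarity(query, c, n), c) for c in candidates]
    scored = [item for item in scored if item[0] > 0]
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [candidate for _, candidate in scored[:limit]]


cities = ["Houston", "Hamburg"]
print(best_matches("Housten", cities))
```

In this sketch the misspelled query "Housten" still shares the sequences "hous" and "oust" with "Houston", so that value is returned while "Hamburg" is not. A smaller n tolerates more typos but also admits more unrelated values.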

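To check more than one question, the test query from step 8 can be wrapped in a small loop. The sketch below is a hypothetical helper, not a C3 AI API: run_smoke_queries takes any callable, so in a notebook that callable would wrap the two c3 calls from step 8, while here a stand-in function is used so the example runs without the platform.

```python
def run_smoke_queries(run_query, queries):
    """Run each query through run_query and record the result or the error."""
    report = {}
    for query in queries:
        try:
            report[query] = {"ok": True, "result": run_query(query)}
        except Exception as exc:
            # Keep going so that one failing query does not hide the others.
            report[query] = {"ok": False, "error": str(exc)}
    return report


def fake_run_query(query):
    """Stand-in for the real agent call; rejects blank queries."""
    if not query.strip():
        raise ValueError("empty query")
    return f"answer to: {query}"


report = run_smoke_queries(fake_run_query, ["What do you know about furnaces", " "])
for query, outcome in report.items():
    print(repr(query), "ok" if outcome["ok"] else outcome["error"])
```

Collecting errors instead of raising on the first one makes it easier to see which questions the configured data model cannot answer yet.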