Understanding Hierarchy Denormalization
Hierarchy denormalization flattens an acyclic graph data structure and stores the flattened information in a table for later analysis.
In practice, this means taking a complex, branching structure (like a family tree or an organizational chart) and simplifying it into a flat list of rows. The process calculates the distance between each pair of connected nodes (or positions) and stores these distances for quick reference.
Hierarchy denormalization using an organizational structure
Imagine a company with an organizational chart that looks like this:
- CEO
  - VP of Sales
    - Sales Manager 1
    - Sales Manager 2
  - VP of Engineering
    - Engineering Manager 1
      - Engineer A
      - Engineer B
    - Engineering Manager 2
      - Engineer C
      - Engineer D
In this chart, each person has a specific role and reports to someone else, creating a hierarchy.
Hierarchy denormalization can convert the branching structure into a simple table.
It involves:
- Computing the distances of each position to every other position it can reach.
- Storing these distances in a table format.
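Here is how a sample of the resulting rows might look. Each row pairs a position with one of its ancestors (its direct manager or someone higher up), and Distance is the number of reporting steps between them:
| Position | Ancestor | Distance |
|---|---|---|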
| CEO | NULL | 0 |
| VP of Sales | CEO | 1 |
| Sales Manager 1 | VP of Sales | 1 |
| Sales Manager 2 | VP of Sales | 1 |
| VP of Engineering | CEO | 1 |
| Engineering Manager 1 | CEO | 2 |
| Engineer A | CEO | 3 |
| Engineer B | Engineering Manager 1 | 1 |
| Engineering Manager 2 | VP of Engineering | 1 |
| Engineer C | Engineering Manager 2 | 1 |
| Engineer D | CEO | 3 |
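The two steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the platform's implementation: the `REPORTS_TO` mapping and the `ancestor_rows` helper are hypothetical names, and the reporting lines are an assumption read from the example chart.

```python
# Direct manager of each position in the example org chart (None marks the root).
# These reporting lines are assumed from the example chart.
REPORTS_TO = {
    "CEO": None,
    "VP of Sales": "CEO",
    "Sales Manager 1": "VP of Sales",
    "Sales Manager 2": "VP of Sales",
    "VP of Engineering": "CEO",
    "Engineering Manager 1": "VP of Engineering",
    "Engineer A": "Engineering Manager 1",
    "Engineer B": "Engineering Manager 1",
    "Engineering Manager 2": "VP of Engineering",
    "Engineer C": "Engineering Manager 2",
    "Engineer D": "Engineering Manager 2",
}


def ancestor_rows(reports_to):
    """Return (position, ancestor, distance) for every ancestor of every position."""
    rows = []
    for position in reports_to:
        distance = 0
        current = reports_to[position]
        # Walk up the chain of managers until the root is passed.
        while current is not None:
            distance += 1
            rows.append((position, current, distance))
            current = reports_to[current]
    return rows


rows = ancestor_rows(REPORTS_TO)
```

Walking up from each position works here because every position has a single manager. A graph in which a vertex can have several parents needs a general traversal instead.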
Benefits of denormalization
With this flattened table, you can quickly answer questions like:
- Who reports to the VP of Sales?
  - Sales Manager 1 and Sales Manager 2
- What is the distance between the CEO and Engineer A?
  - 3 steps (CEO -> VP of Engineering -> Engineering Manager 1 -> Engineer A)
- Who are the ancestors of Engineer B?
  - Engineering Manager 1, VP of Engineering, CEO
Without denormalization, you would need to traverse the entire organizational chart every time you wanted to answer these questions, which can be time-consuming. By denormalizing the graph, you can query the flattened table, making it quick and efficient to retrieve this information.
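To make the contrast concrete, the sketch below loads a few flattened rows into an in-memory SQLite table and answers the three questions with plain lookups. The table name `denorm` and its column names are assumptions chosen for illustration. They are not the schema the platform produces.

```python
import sqlite3

# A few flattened rows (position, ancestor, distance) from the example org chart.
ROWS = [
    ("Sales Manager 1", "VP of Sales", 1),
    ("Sales Manager 2", "VP of Sales", 1),
    ("Engineer A", "Engineering Manager 1", 1),
    ("Engineer A", "VP of Engineering", 2),
    ("Engineer A", "CEO", 3),
    ("Engineer B", "Engineering Manager 1", 1),
    ("Engineer B", "VP of Engineering", 2),
    ("Engineer B", "CEO", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE denorm (position TEXT, ancestor TEXT, distance INTEGER)")
conn.executemany("INSERT INTO denorm VALUES (?, ?, ?)", ROWS)

# Who reports directly to the VP of Sales?
direct_reports = [r[0] for r in conn.execute(
    "SELECT position FROM denorm WHERE ancestor = ? AND distance = 1 ORDER BY position",
    ("VP of Sales",))]

# What is the distance between the CEO and Engineer A?
ceo_to_engineer_a = conn.execute(
    "SELECT distance FROM denorm WHERE position = ? AND ancestor = ?",
    ("Engineer A", "CEO")).fetchone()[0]

# Who are the ancestors of Engineer B, nearest first?
ancestors_of_b = [r[0] for r in conn.execute(
    "SELECT ancestor FROM denorm WHERE position = ? ORDER BY distance",
    ("Engineer B",))]
```

Each question becomes a single filtered lookup, with no recursive walk over the chart at query time.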
Hierarchy denormalization API
You can use the Hierarchy Denormalization (denorm) API in the C3 Agentic AI Platform to flatten hierarchical data structures and store the results for easy querying. These APIs are available on Types that include the HierarchyDenorm or TimedIntervalHierarchyDenorm mixins using the syntax: mixes HierarchyDenorm or mixes TimedIntervalHierarchyDenorm.
Key concepts
Every graph consists of two main components:
- Vertex: An entity or node in the graph.
- Edge: A connection between two vertices.
To use the Hierarchy Denorm API, you need to:
- Identify the vertex and edge Types.
- Create a hierarchy denormalization Type using this information to specify how the graph should be denormalized, or flattened.
Types of hierarchies
Hierarchy denormalization supports two types of hierarchies:
- Strict Hierarchy: Each vertex other than the root has exactly one parent, similar to a tree structure.
- Non-strict Hierarchy: A vertex can have multiple parents, resembling a general graph structure.
You can specify the type of hierarchy by setting the strictHierarchy field in the @denorm annotation when defining the hierarchy denormalization Type.
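The difference between the two kinds of hierarchy can be checked mechanically. The following is a minimal sketch, assuming the edges are available as (parent, child) pairs. The helper name `is_strict_hierarchy` is hypothetical, and the code does not show the @denorm annotation syntax.

```python
from collections import defaultdict


def is_strict_hierarchy(edges):
    """Return True when no vertex has more than one parent.

    `edges` is an iterable of (parent, child) pairs.
    """
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    # Roots never appear as a child, so only vertices with parents are checked.
    return all(len(p) <= 1 for p in parents.values())


tree_edges = [("CEO", "VP of Sales"), ("CEO", "VP of Engineering")]
shared_child_edges = [("A", "C"), ("K", "C")]  # C has two parents
```

Here `is_strict_hierarchy(tree_edges)` is true, while `is_strict_hierarchy(shared_child_edges)` is false because vertex C has two parents.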
Processing modes
Hierarchy denorm uses the C3 Agentic AI Platform's distributed batch processing engine to distribute the jobs.
Hierarchy denorm can be run in synchronous mode or asynchronous mode.
Note: The default is the synchronous mode.
In synchronous mode, the call to the HierarchyDenorm#denormalizeHierarchy function returns only after the underlying batch job finishes.
In asynchronous mode, the call starts the batch jobs and returns immediately with a pointer to those jobs so that you can track them.
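The difference between the two modes is the usual one between blocking on a job and holding a handle to it. The sketch below illustrates that general pattern with Python's standard `concurrent.futures` module. It is an analogy only: `run_batch_job` is a hypothetical stand-in, and none of this reflects the platform's batch engine or the actual arguments of denormalizeHierarchy.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch_job(name):
    """Hypothetical stand-in for a long-running batch job."""
    return f"{name}: done"


executor = ThreadPoolExecutor(max_workers=2)

# Synchronous style: block until the job finishes, then continue.
sync_result = executor.submit(run_batch_job, "full-denorm").result()

# Asynchronous style: get a handle back immediately and track it later.
job_handle = executor.submit(run_batch_job, "full-denorm")
# ... other work can happen here while the job runs ...
async_result = job_handle.result()  # wait only when the result is needed

executor.shutdown()
```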
Denorm modes
You can denormalize hierarchies in one of two modes:
- Full denorm
- Incremental denorm
Full hierarchy denorm
A full denormalization traverses and flattens the entire graph, then writes the results to the resultant Type. This can be a time-consuming operation if the graph is large.
Denormalize only the Type
Perform a full denormalization synchronously by running the following:
HierarchyDenormType.denormalizeHierarchy()
The code snippet above denormalizes the Type on which the function denormalizeHierarchy is called.
Denormalize all the hierarchies
Alternatively, you can fully denormalize hierarchies during App#rebuild by setting the hierarchies field to true. This denormalizes all the hierarchies available in the system. Refer to the RefreshDepsSpec for more information.
Important: App#rebuild requires a minimum of 2 compute threads to proceed.
Incremental denorm
If the graph is large and its state changes often, recomputing the entire hierarchy every time the topology changes is wasteful. Incremental denorm refreshes only the portion of the graph that changed and updates the hierarchy denorm table accordingly.
Incremental denorm is always turned on. As new data is loaded into the system, entries are created in the InvalidationQueue, and the affected part of the hierarchy is recomputed automatically without user intervention.
To turn off all incremental invalidations, set the global environment config AsyncProcessingDisabled to true.
Note: AsyncProcessingDisabled applies to all invalidations, not just hierarchy denorm. Setting it to true stops the C3 Agentic AI Platform from performing any data invalidations.
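The idea behind incremental denormalization described in this section can be sketched as follows. This is a simplified illustration, not the platform's algorithm. It assumes the stored distance is the shortest path length, handles only the addition of a single edge to an acyclic graph, and uses a hypothetical helper named `add_edge_incrementally`.

```python
def add_edge_incrementally(rows, parent, child):
    """Update `rows` in place after adding the edge parent -> child.

    `rows` maps (from_vertex, to_vertex) to the shortest distance and
    already contains the (v, v) -> 0 self rows for both endpoints.
    Only pairs whose path runs through the new edge are touched.
    """
    # Vertices that can reach `parent`, with their distance to it.
    upstream = {f: d for (f, t), d in rows.items() if t == parent}
    # Vertices reachable from `child`, with their distance from it.
    downstream = {t: d for (f, t), d in rows.items() if f == child}
    for up, d_up in upstream.items():
        for down, d_down in downstream.items():
            candidate = d_up + 1 + d_down
            if candidate < rows.get((up, down), float("inf")):
                rows[(up, down)] = candidate
    return rows


# Existing denormalized rows for A -> C, C -> F, C -> G, plus an isolated K.
rows = {
    ("A", "A"): 0, ("A", "C"): 1, ("A", "F"): 2, ("A", "G"): 2,
    ("C", "C"): 0, ("C", "F"): 1, ("C", "G"): 1,
    ("F", "F"): 0, ("G", "G"): 0, ("K", "K"): 0,
}
add_edge_incrementally(rows, "K", "C")
```

After the call, only the three rows from K to C, F, and G are new. The rows that start at A are left untouched, which is the saving that an incremental refresh aims for.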
Environment config
There is only one environment config associated with hierarchy denorm. Use the environment config to set the batch size of processing or to disable the hierarchy denorm for specific Types.
Set the batch size
See the following:
key : "HierarchyDenormDisabled"
value format : "disableType=<comma_separated_list_of_hierarchy_denorm_types>;invalidationBatchSize=<int_value>"
Replace each <variable> with the necessary value. For example, invalidationBatchSize=1000.
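To show how the value format breaks down, here is a small, hypothetical parser for a string in that shape. It is not part of the platform. The function name `parse_denorm_config` and the Type names `OrgDenorm` and `AssetDenorm` are made up for illustration.

```python
def parse_denorm_config(value):
    """Split 'disableType=A,B;invalidationBatchSize=1000' into a dict."""
    settings = {}
    for part in value.split(";"):
        if not part.strip():
            continue
        key, _, raw = part.partition("=")
        settings[key.strip()] = raw.strip()
    # An absent or empty disableType yields an empty list.
    disabled = [t for t in settings.get("disableType", "").split(",") if t]
    batch = settings.get("invalidationBatchSize")
    return {
        "disableType": disabled,
        "invalidationBatchSize": int(batch) if batch else None,
    }


# OrgDenorm and AssetDenorm are hypothetical Type names.
parsed = parse_denorm_config(
    "disableType=OrgDenorm,AssetDenorm;invalidationBatchSize=1000"
)
```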
Turn off hierarchy denorm
Hierarchy denorm can be disabled for a single Type or multiple Types using HierarchyDenormConfig. You can also use the HierarchyDenormConfig Type to tune other properties that influence hierarchy denormalization.
Example
Consider the following graph where every edge represents a distance of one.
       A     K
      / \   /
     /   \ /
    B     C
   / \   / \
  /   \ /   \
 D    E F    G

A fully denormalized version of this graph would look like the following:
| From | To | Distance |
|---|---|---|
| A | A | 0 |
| A | B | 1 |
| A | C | 1 |
| A | D | 2 |
| A | E | 2 |
| A | F | 2 |
| A | G | 2 |
| B | B | 0 |
| B | D | 1 |
| B | E | 1 |
| C | C | 0 |
| C | F | 1 |
| C | G | 1 |
| D | D | 0 |
| E | E | 0 |
| F | F | 0 |
| G | G | 0 |
| K | K | 0 |
| K | C | 1 |
| K | F | 2 |
| K | G | 2 |
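The table above can be reproduced with a breadth-first traversal from every vertex. The following is a minimal sketch, assuming the edges are A-B, A-C, K-C, B-D, B-E, C-F, and C-G, directed from parent to child. The `denormalize` function is a hypothetical illustration, not the platform's API.

```python
from collections import deque

# Parent -> child edges of the example graph; every edge has distance one.
EDGES = [("A", "B"), ("A", "C"), ("K", "C"),
         ("B", "D"), ("B", "E"), ("C", "F"), ("C", "G")]


def denormalize(edges):
    """Return sorted (from, to, distance) rows for every reachable pair."""
    children = {}
    vertices = set()
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        vertices.update((parent, child))

    rows = []
    for start in sorted(vertices):
        # Breadth-first search gives the shortest distance to each descendant.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            vertex = queue.popleft()
            for child in children.get(vertex, []):
                if child not in distance:
                    distance[child] = distance[vertex] + 1
                    queue.append(child)
        rows.extend((start, to, d) for to, d in distance.items())
    return sorted(rows)


rows = denormalize(EDGES)
```

Because C has two parents (A and K), this graph is a non-strict hierarchy, and the rows for F and G appear under both A and K.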
See also
- Hierarchy Denorm Tutorial
- Configure and Tune Batch Jobs
- HierarchyDenorm: Mixing in this Type exposes all the denormalization APIs.
- Ann.Denorm: The main annotation to be specified on the Type that mixes in the HierarchyDenorm Type.
- Relation: The Type that the Edge Type must mix in so that the engine can identify edge relations between the vertices.