
Page Rank

Use Visual Notebooks to calculate the relative importance of a vertex in a network, using a page rank algorithm.

Configuration

Field | Description
Name (Optional), default=none | A user-specified node name displayed in the workspace.
Page Rank Score, default=page_rank_score | Name of the page rank score column. Accept the default name for the page rank score column, or optionally enter a custom name.
Page Rank, default=page_rank | Name of the page rank column. Accept the default name for the page rank column, or optionally enter a custom name.
Reset Probability, default=default | Probability of stopping at a given vertex. This is the probability that the process of moving from vertex to vertex stops at any particular point.
Limit maximum iterations (default), default=20 | Number of iterations to perform. Run the algorithm through a user-defined number of iterations.
Iterate until convergence within set tolerance, default=none | Run until tolerance is reached. Run the algorithm until the difference between successive runs is less than or equal to a user-specified tolerance.
Maximum iterations, default=20 | User-supplied number of iterations. Specify a fixed number of iterations for the PageRank algorithm to run.

Node Inputs/Outputs

Input: A Visual Notebooks Assemble Graph node
Output: Page-ranked vertices and edges, and corresponding page rank scores


Figure 1: Example output

Examples

The term PageRank refers to an algorithm used to rank the importance of different webpages by assigning each webpage a score.

Two factors raise the PageRank score of a webpage:

  1. Many webpages link to the webpage.
  2. One or more webpages that link to that webpage have high scores themselves.

Although the PageRank algorithm was developed for webpages, it can be applied to any network containing vertices and edges. A vertex is a place in a network, while an edge is a path between two vertices. If a suburban neighborhood were modeled as a network, the vertices could be houses, and the segments of street that connect the houses could be edges.
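The two factors above can be illustrated with a minimal power-iteration sketch. This is not the Page Rank node's implementation; the function name `pagerank`, the four-page web, and the choice to spread the score of vertices without outgoing edges evenly across all vertices are assumptions made for illustration.

```python
def pagerank(vertices, edges, reset_probability=0.15, iterations=20):
    """Return a score per vertex; edges are (source, destination) pairs."""
    n = len(vertices)
    out_links = {v: [] for v in vertices}
    for source, destination in edges:
        out_links[source].append(destination)

    scores = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Every vertex receives the reset share, plus an even share of the
        # score held by vertices that have no outgoing edges (an assumption).
        dangling = sum(scores[v] for v in vertices if not out_links[v])
        base = reset_probability / n + (1 - reset_probability) * dangling / n
        new_scores = {v: base for v in vertices}
        # Each vertex passes the rest of its score evenly along its edges.
        for v in vertices:
            for destination in out_links[v]:
                share = scores[v] / len(out_links[v])
                new_scores[destination] += (1 - reset_probability) * share
        scores = new_scores
    return scores


# Hypothetical four-page web: A, B, and C all link to D; D links back to A.
pages = ["A", "B", "C", "D"]
links = [("A", "D"), ("B", "D"), ("C", "D"), ("D", "A")]
scores = pagerank(pages, links)
ranking = sorted(pages, key=scores.get, reverse=True)
print(ranking)
```

In this sketch, D ranks first because many pages link to it, and A outranks B and C because its only inbound link comes from the high-scoring D.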

In the examples that follow, we present a simplified, fictitious ecosystem of animals where each animal in the ecosystem consumes one or more other animals in the ecosystem. Then we analyze this data with a Page Rank node to determine which animals are most important to the ecosystem. The data is split into two files:

  1. A list of animals in the ecosystem, each of which functions as a vertex in the network (Figure 2).
  2. A set of pairs indicating which animals consume which other animals in the ecosystem, where each pair functions as an edge in the network (Figure 3).


Figure 2: Animal input data


Figure 3: Food input data
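As a rough picture of how the two files map to vertices and edges, the sketch below parses two small CSV strings. The column names AnimalList, Animal, and Food are the ones selected later in the Assemble Graph node; the rows themselves (including the shark) and the exact file layout are hypothetical stand-ins, not the contents of the real data files.

```python
import csv
import io

# Hypothetical stand-ins for the two files; these rows are illustrative only.
ANIMALS_CSV = """AnimalList
shark
tuna
crab
"""

FOOD_CSV = """Animal,Food
shark,tuna
shark,crab
tuna,crab
"""

# Each row of the animals file becomes one vertex.
vertices = [row["AnimalList"] for row in csv.DictReader(io.StringIO(ANIMALS_CSV))]

# Each row of the food file becomes one directed edge: (eater, food).
edges = [
    (row["Animal"], row["Food"])
    for row in csv.DictReader(io.StringIO(FOOD_CSV))
]

print(vertices)
print(edges)
```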

For the first example, we determine the most important foods in the ecosystem by performing the following steps:

  1. Drag two CSV nodes onto the canvas. These are used to import the data.
  2. Drag an Assemble Graph node onto the canvas. This node transforms the data into vertices and edges.
  3. Drag a Graph node onto the canvas. The graph node enables us to visualize the vertices and edges of the ecosystem.
  4. Drag a Page Rank node onto the canvas.
  5. Download the animals data file, connect it to one of the two CSV nodes, and name the node Animals.
  6. Download the food data file, connect it to the other CSV node, and name the node Food.
  7. Connect the Animals CSV node to the Dataset Vertices input on the Assemble Graph node.
  8. Connect the Food CSV node to the Dataset Edges input on the Assemble Graph node.
  9. Select the Assemble Graph node.
  10. For Select Column with Vertex ids, select AnimalList. This specifies the vertices.
  11. Also within that node, for Select Source Columns, select Animal, and for Select Destination Column, select Food. This specifies the edges.
  12. Select Run to generate the vertices and edges within the Assemble Graph node.
  13. Connect the Page Rank node to the Assemble Graph node.
  14. Connect the Graph node to the Assemble Graph node.
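Conceptually, steps 7 through 12 combine a list of vertex ids with a list of source and destination pairs. The sketch below shows one way to do that in plain Python. The function `assemble_graph`, the sample rows, and the check that every edge endpoint is a known vertex are assumptions for illustration; the source does not describe how the Assemble Graph node validates its inputs.

```python
def assemble_graph(vertex_ids, edge_pairs):
    """Build an adjacency mapping from vertex ids and (source, destination) pairs."""
    known = set(vertex_ids)
    adjacency = {v: [] for v in vertex_ids}
    for source, destination in edge_pairs:
        # Assumed behavior: reject edges that point at vertices not in the list.
        if source not in known or destination not in known:
            raise ValueError(
                f"edge ({source!r}, {destination!r}) references an unknown vertex"
            )
        adjacency[source].append(destination)
    return adjacency


# Hypothetical rows: vertex ids from AnimalList, edges from (Animal, Food).
vertex_ids = ["coyote", "tuna", "crab"]
edge_pairs = [("coyote", "tuna"), ("coyote", "crab"), ("tuna", "crab"), ("crab", "tuna")]
graph = assemble_graph(vertex_ids, edge_pairs)
print(graph)
```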

While the Graph node is not required to generate results, it allows us to visualize the ecosystem. At this point, the canvas should contain five nodes, as in the figure that follows.


Figure 4: The Visual Notebooks canvas with nodes needed for examples

To start, we visualize our ecosystem as follows:

  1. Select the Graph node.
  2. Enable Show edge directionality. This lets you see which animals consume which other animals, since each arrow points from the eating animal to the animal being consumed.
  3. Select Run.

The Graph node allows us to see how each vertex is connected to the other vertices. In our example, this means that we can see which animals consume which other animals. Our graph shows that eight animals consume tuna, while nothing consumes coyote. (An arrow pointing from animal A to animal B indicates that A consumes B.)


Figure 5: The ecosystem graphed
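Counting how many animals consume each animal is the same as counting the arrows that point at each vertex. A minimal sketch follows, using a hypothetical four-animal food web rather than the data in the figures:

```python
from collections import Counter

# Hypothetical (eater, food) pairs; not the rows from the food data file.
animals = ["coyote", "tuna", "octopus", "crab"]
edges = [
    ("coyote", "tuna"), ("coyote", "crab"),
    ("tuna", "crab"), ("tuna", "octopus"),
    ("octopus", "crab"), ("octopus", "tuna"),
    ("crab", "octopus"),
]

# Count incoming arrows: how many animals consume each animal.
consumers = Counter(food for _eater, food in edges)

for animal in animals:
    # A Counter returns 0 for animals that nothing consumes.
    print(f"{animal}: consumed by {consumers[animal]} animal(s)")
```

This count reflects only the first factor behind a PageRank score. The score also depends on how highly ranked the consuming animals are, so the two orderings can differ.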

Now determine the most important food in the ecosystem by running the Page Rank node with the default settings, as follows:

  1. Select the Page Rank node.
  2. Select Run.

The results indicate that the top three most important foods in the ecosystem, along with their PageRank scores, are as follows:

  1. Tuna, 2.01
  2. Octopus, 1.97
  3. Crab, 1.85


Figure 6: Ranking of most important foods in the ecosystem

The PageRank algorithm uses a reset probability, which is the likelihood that someone navigating from page to page looks at a particular webpage without clicking any links. In our ecosystem, the reset probability represents the probability that an animal at a given time does not consume another animal ever again. For our next example, we change the reset probability from its default level of .15 to a user-specified level of .85, by doing the following:

  1. For Reset Probability, enter .85.
  2. Select Run.

The results change so that the top three most important foods in the ecosystem, along with their PageRank scores, are as follows:

  1. Tuna, 1.28
  2. Crab, 1.11
  3. Octopus, 1.10


Figure 7: Ranking with reset probability of .85
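One way to see why a higher reset probability pulls the scores closer together: a larger part of every score comes from the reset share, which is the same for all vertices. The sketch below compares the gap between the highest and lowest score at both settings. The `pagerank` function, the four-animal food web, and the scaling (scores sum to 1, so they are not comparable to the values in the figures) are assumptions for illustration.

```python
def pagerank(vertices, edges, reset_probability, iterations=20):
    """Power iteration; assumes every vertex has at least one outgoing edge."""
    n = len(vertices)
    out_degree = {v: 0 for v in vertices}
    for source, _ in edges:
        out_degree[source] += 1
    scores = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Reset share: identical for every vertex.
        new_scores = {v: reset_probability / n for v in vertices}
        # Link share: passed along edges from eater to food.
        for source, destination in edges:
            share = scores[source] / out_degree[source]
            new_scores[destination] += (1 - reset_probability) * share
        scores = new_scores
    return scores


# Hypothetical food web; each animal consumes at least one other animal.
animals = ["coyote", "tuna", "octopus", "crab"]
eats = [
    ("coyote", "tuna"), ("coyote", "crab"),
    ("tuna", "crab"), ("tuna", "octopus"),
    ("octopus", "crab"), ("octopus", "tuna"),
    ("crab", "octopus"),
]

spreads = {}
for reset_probability in (0.15, 0.85):
    scores = pagerank(animals, eats, reset_probability)
    spreads[reset_probability] = max(scores.values()) - min(scores.values())
    print(f"reset={reset_probability}: top-to-bottom gap = {spreads[reset_probability]:.3f}")
```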

For the final example, configure the Page Rank node to run 50 iterations instead of the default value of 20, by performing these steps:

  1. Change Maximum iterations from 20 to 50.
  2. Select Run.

The results show no change in rank or score, which indicates that the scores had already converged within 20 iterations.


Figure 8: Page rank algorithm run through 50 iterations
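The two stopping options, a fixed iteration limit and a convergence tolerance, can be sketched as follows. This is an illustration under stated assumptions, not the node's implementation: the function `pagerank`, the food web, and the choice to measure convergence as the largest per-vertex change between successive iterations are all hypothetical.

```python
def pagerank(vertices, edges, reset_probability=0.15, max_iterations=20, tolerance=None):
    """Return (scores, iterations_run); assumes every vertex has an outgoing edge."""
    n = len(vertices)
    out_degree = {v: 0 for v in vertices}
    for source, _ in edges:
        out_degree[source] += 1
    scores = {v: 1.0 / n for v in vertices}
    iterations_run = 0
    for _ in range(max_iterations):
        new_scores = {v: reset_probability / n for v in vertices}
        for source, destination in edges:
            share = scores[source] / out_degree[source]
            new_scores[destination] += (1 - reset_probability) * share
        # Assumed convergence measure: largest change in any single score.
        change = max(abs(new_scores[v] - scores[v]) for v in vertices)
        scores = new_scores
        iterations_run += 1
        if tolerance is not None and change <= tolerance:
            break
    return scores, iterations_run


# Hypothetical food web; each animal consumes at least one other animal.
animals = ["coyote", "tuna", "octopus", "crab"]
eats = [
    ("coyote", "tuna"), ("coyote", "crab"),
    ("tuna", "crab"), ("tuna", "octopus"),
    ("octopus", "crab"), ("octopus", "tuna"),
    ("crab", "octopus"),
]

scores_20, _ = pagerank(animals, eats, max_iterations=20)
scores_50, _ = pagerank(animals, eats, max_iterations=50)
largest_gap = max(abs(scores_20[a] - scores_50[a]) for a in animals)
print(f"largest score difference between 20 and 50 iterations: {largest_gap:.2e}")

_, iterations_run = pagerank(animals, eats, max_iterations=50, tolerance=1e-6)
print(f"stopped after {iterations_run} iterations with tolerance 1e-6")
```

On a small graph like this one, the scores settle well before the iteration limit, which is why raising the limit leaves the ranking unchanged.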
