
Search / Replace

The Search / Replace node in Visual Notebooks finds and replaces words and values. This node can help in cases where your dataset needs to be cleaned up for machine learning or further analysis.

Configuration

Configuration sidebar

| Field | Description |
| --- | --- |
| Name | Field to name the node. An optional user-specified node name displayed in the workspace, both on the node and in the dataframe as a tab. |
| Search Replace | Add From and To information. Select From, To, and the new Value. Both numbers and words can be replaced. |
| Add | Add additional search and replace values. Creates another From, To, and Value entry. |
| Replace Whole Word Only | Search for whole words. Toggle this on to match whole words only, or off to also match partial words. |
| Case Sensitive | Search for case-sensitive words. Toggle this on to match case exactly, or off to match words regardless of case. |
| Error Margin for Numeric Comparisons | Enter the margin of error. When searching and replacing numeric values, enter a margin for numeric comparisons. |
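The following Python sketch shows one plausible way an error margin could work in a numeric comparison. It is not the Visual Notebooks implementation: the function name `replace_within_margin` is hypothetical, and treating the margin as an absolute tolerance (a value matches when it differs from the From value by at most the margin) is an assumption.

```python
def replace_within_margin(values, target, replacement, margin=0.0):
    """Replace every value whose distance from `target` is at most `margin`.

    Assumption: the margin is an absolute tolerance, not a percentage.
    """
    return [
        replacement if abs(value - target) <= margin else value
        for value in values
    ]


# Made-up data: values close to 10.0 are normalized to exactly 10.0.
prices = [9.99, 10.0, 10.01, 12.5]
print(replace_within_margin(prices, target=10.0, replacement=10.0, margin=0.02))
# -> [10.0, 10.0, 10.0, 12.5]
```

With the default margin of 0, only exact matches are replaced.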

Node Inputs/Outputs

Input: A Visual Notebooks dataframe.
Output: A dataframe with words and/or values replaced.


Figure 1: Example dataframe output

Examples

To introduce you to the Search / Replace node, we are using a small dataset with source and destination shipping information. The following examples illustrate how search and replace works in Visual Notebooks.


Figure 2: Example input dataframe

  1. Connect an existing node to the Search / Replace node. In Figure 3, the Search / Replace node is connected to the CSV node with the search_replace.csv data for shipping routes.
  2. Select the Search / Replace node to configure it. Optionally, name the node String Corrections.
  3. Select columns in the Columns dropdown menu. In this case, the source (String), destination (String), source_currency (String), and destination_currency (String) are selected.
  4. Enter the Search Replace words. Use Custom to replace a word with a word entered in Value.
    • From Pounds, To Custom, Value Pound Sterling
    • From EPound, To Custom, Value Egyptian Pound
    • From Bombay, To Custom, Value Mumbai
    • From Shaghai, To Custom, Value Shanghai
  5. In this example, we kept Replace Whole Word Only and Case Sensitive on. For your dataset, it might make sense to match partial words instead; decide on a case-by-case basis. If some replacements need whole-word matching and others need partial matching, you'll need a second Search / Replace node.
  6. Select Run.
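For readers who think in code, the steps above can be approximated with a short Python sketch. It only illustrates whole-word, case-sensitive replacement using the From and Value pairs from step 4. The function `search_replace` and the sample rows are hypothetical, and the node's internal matching rules may differ.

```python
import re

# From -> Value pairs taken from step 4 above.
REPLACEMENTS = {
    "Pounds": "Pound Sterling",
    "EPound": "Egyptian Pound",
    "Bombay": "Mumbai",
    "Shaghai": "Shanghai",
}


def search_replace(text, replacements, whole_word=True, case_sensitive=True):
    """Apply each From -> Value pair to `text` in order."""
    flags = 0 if case_sensitive else re.IGNORECASE
    for old, new in replacements.items():
        pattern = re.escape(old)
        if whole_word:
            # \b restricts the match to whole words.
            pattern = r"\b" + pattern + r"\b"
        # A function replacement inserts `new` literally (no escape handling).
        text = re.sub(pattern, lambda _match, new=new: new, text, flags=flags)
    return text


# Made-up rows standing in for the selected string columns.
rows = [
    {"source": "Bombay", "destination": "Shaghai", "source_currency": "Pounds"},
    {"source": "Cairo", "destination": "Mumbai", "source_currency": "EPound"},
]
cleaned = [
    {column: search_replace(value, REPLACEMENTS) for column, value in row.items()}
    for row in rows
]
print(cleaned)
```

Passing `whole_word=False` or `case_sensitive=False` mimics toggling the corresponding option off.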

Since we are not making any numeric comparisons in Figure 3, we don't need the Error Margin for Numeric Comparisons.

Note: Search and replace works on one data type at a time. In this case, we are replacing words for string columns. Later, we will add a second Search / Replace node for integers.

To understand how to use the To field, it is helpful to know that it offers four options: Mean, Median, Mode, and Custom. Which options apply depends on the data type of the selected columns.

  • String columns can be used with Mode and Custom only. Use caution when selecting Mode for strings to prevent unintended replacements.
  • Numeric columns can be used with Mean, Median, Mode, and Custom.
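As a rough illustration of these options, the sketch below resolves a To option into a concrete replacement value. The helper `resolve_to_value` is hypothetical, and computing the statistic over the whole column is an assumption rather than documented behavior.

```python
from statistics import mean, median, mode


def resolve_to_value(column, to, custom=None):
    """Return the replacement value for a To option.

    Assumption: Mean, Median, and Mode are computed over the entire column.
    """
    if to == "Custom":
        return custom
    if to == "Mode":
        # Mode works for strings and numbers alike.
        return mode(column)
    if to in ("Mean", "Median"):
        if not all(isinstance(value, (int, float)) for value in column):
            raise TypeError(f"{to} requires a numeric column")
        return mean(column) if to == "Mean" else median(column)
    raise ValueError(f"unknown To option: {to}")


print(resolve_to_value([1, 2, 2, 7], "Mode"))                      # -> 2
print(resolve_to_value(["Mumbai", "Mumbai", "Cairo"], "Mode"))     # -> Mumbai
print(resolve_to_value(["Mumbai", "Cairo"], "Custom", "Unknown"))  # -> Unknown
```

Requesting Mean or Median for a string column raises an error in this sketch, mirroring the restriction described above.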

Notice that the word changes have been made in the dataset in Figure 3.


Figure 3: Example dataframe with words replaced

Notice that Figure 2 included Mumbai and Bombay with different source and destination codes. Figure 3 corrects all instances of Bombay to Mumbai, which is the correct city name. However, the Bombay source and destination codes still need to be updated to the Mumbai source and destination codes.

  1. Add a second Search / Replace node to your workspace. Optionally, name it Integer Corrections.
  2. Connect the String Corrections node to the Integer Corrections node. Figure 4b shows what your workspace should look like with two nodes.
  3. Select source_code (Integer) and destination_code (Integer) for the Columns.
  4. Add From 42077, To Custom, Value 42078.
  5. Select Run.

Notice that all Mumbai source and destination codes are now 42078.
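The integer correction can be sketched in Python as an exact-match replacement limited to the selected columns. The function `replace_codes` and the sample rows are hypothetical; only the column names and the codes 42077 and 42078 come from the steps above.

```python
def replace_codes(rows, columns, old, new):
    """Replace `old` with `new`, but only in the selected columns."""
    return [
        {
            column: (new if column in columns and value == old else value)
            for column, value in row.items()
        }
        for row in rows
    ]


# Made-up rows; the codes mirror step 4 above.
rows = [
    {"source": "Mumbai", "source_code": 42077, "destination_code": 10001},
    {"source": "Mumbai", "source_code": 42078, "destination_code": 42077},
]
fixed = replace_codes(rows, {"source_code", "destination_code"}, 42077, 42078)
print(fixed)
```

Columns that are not selected, such as `source` here, pass through unchanged.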

Caution:

Toggling on Replace Whole Word Only replaces the entire number that you've entered at every occurrence. Toggling the button off creates a partial replacement. Whole word or partial word replacements can both be used, but use caution with partial replacements.

For example, if you want to update 40277 to 40278:

  • Toggled on: Replacing 77 with 78 while Replace Whole Word Only is on affects only whole instances of 77 (the 77 in 40277 is not replaced).
  • Toggled off: Replacing 77 with 78 while Replace Whole Word Only is off changes 77 wherever it appears inside a value, even in unintended places (40277, 77042, and 24775 would become 40278, 78042, and 24785).
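The difference between the two settings can be reproduced with a small Python sketch. The helper `replace_digits` is hypothetical, and it assumes that partial matching operates on the decimal digits of each value.

```python
def replace_digits(value, old, new, whole_word_only=True):
    """Replace `old` with `new` in an integer value.

    Assumption: partial replacement works on the value's decimal digits.
    """
    text = str(value)
    if whole_word_only:
        # Only a value that is exactly `old` is replaced.
        return new if text == str(old) else value
    # Every occurrence of the digits of `old` is replaced.
    return int(text.replace(str(old), str(new)))


codes = [77, 40277, 77042, 24775]
print([replace_digits(code, 77, 78, whole_word_only=True) for code in codes])
# -> [78, 40277, 77042, 24775]
print([replace_digits(code, 77, 78, whole_word_only=False) for code in codes])
# -> [78, 40278, 78042, 24785]
```

The second result shows why partial replacement on numeric codes calls for caution: every value containing 77 changes, not only the intended one.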


Figure 4a: Example dataframe with numbers replaced


Figure 4b: Example workspace
