Drop Duplicates

Drop duplicate rows in Visual Notebooks.

Configuration

Field	Description
Name default=none	A user-specified node name displayed in the canvas
Columns Required	Columns to search for duplicates: Select columns from the dropdown menu. These columns are searched for duplicate values. A row is removed only if it matches another row in every selected column.
Rows to keep default=`First`	Which duplicates to keep: Select First to keep only the first row in each set of duplicates. Select Last to keep only the last row in each set of duplicates. Select None to drop every row that has a duplicate.
Case Sensitive default=`Off`	Case sensitivity: Toggle the switch on to find case-sensitive duplicates. Toggle the switch off to ignore case when finding duplicates.
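
For readers who think in code, the settings above can be approximated in pandas. This is a minimal sketch, not the node's implementation, which is not described here. The helper name `drop_duplicates_sketch`, its parameters, and the sample data are all hypothetical.

```python
import pandas as pd

def drop_duplicates_sketch(df, columns, keep="first", case_sensitive=False):
    """Hypothetical pandas analogue of the node's settings."""
    # Build the comparison key from the selected columns only.
    key = df[columns].copy()
    if not case_sensitive:
        # Lowercase string values so "Lola" and "lola" compare equal.
        for col in columns:
            key[col] = key[col].map(
                lambda v: v.lower() if isinstance(v, str) else v
            )
    # The "None" option corresponds to pandas' keep=False.
    pandas_keep = False if keep == "none" else keep
    # Mark the rows to drop, then return the remaining original rows.
    mask = key.duplicated(keep=pandas_keep)
    return df[~mask]

# Invented sample data for illustration only.
df = pd.DataFrame({"Name": ["Lola", "lola", "Rex"], "Age": [3, 3, 5]})

first = drop_duplicates_sketch(df, ["Name", "Age"])                 # keeps index 0, 2
last = drop_duplicates_sketch(df, ["Name", "Age"], keep="last")     # keeps index 1, 2
none = drop_duplicates_sketch(df, ["Name", "Age"], keep="none")     # keeps index 2
strict = drop_duplicates_sketch(df, ["Name", "Age"], case_sensitive=True)  # keeps all rows
```

The comparison key is built separately so that the rows returned keep their original casing.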

Node Inputs/Outputs

Input	A Visual Notebooks dataframe
Output	A dataframe without duplicate values in the selected columns

Figure 1: Example dataframe output

Examples

Connect a Drop Duplicates node to an existing node.

The data below is used in this example. Notice that rows four and five contain duplicates in the "Name" and "Age" columns, but have unique values in the remaining columns. In contrast, rows eleven and twelve contain duplicates in every column.

Figure 2: Example data with duplicates

Select the "Name" and "Age" columns from the dropdown menu.
Select Run to create a dataframe without duplicates in the selected columns.

Notice that the resulting dataframe no longer contains the second "Lola" and "Dalphine" rows.

Figure 3: Example dataframe without duplicates found in the "Name" and "Age" columns
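
The same result can be sketched in pandas with `DataFrame.drop_duplicates`. This is an analogy only. The `pets` dataframe is a hypothetical stand-in for the example data, and its values are invented.

```python
import pandas as pd

# Hypothetical stand-in for the example data; values are invented.
pets = pd.DataFrame({
    "Name":  ["Milo", "Lola", "Lola", "Dalphine", "Dalphine"],
    "Age":   [2, 4, 4, 7, 7],
    "Breed": ["Pug", "Poodle", "Beagle", "Collie", "Collie"],
})

# Compare rows on "Name" and "Age" only, keeping the first of each set.
by_name_age = pets.drop_duplicates(subset=["Name", "Age"], keep="first")
print(by_name_age)  # the second "Lola" and "Dalphine" rows are gone
```

Because "Breed" is not part of the comparison, the two "Lola" rows count as duplicates even though their breeds differ.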

Add the "Breed" column to the Columns field and select Run.

Notice that the resulting dataframe no longer contains the second "Dalphine" row, but does contain the second "Lola" row.

Figure 4: Drop all but the first duplicate found in the "Name", "Age", and "Breed" columns
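
A pandas sketch of the same effect follows, again as an analogy with invented data. Adding a column to the comparison narrows what counts as a duplicate.

```python
import pandas as pd

# Hypothetical stand-in for the example data; values are invented.
pets = pd.DataFrame({
    "Name":  ["Milo", "Lola", "Lola", "Dalphine", "Dalphine"],
    "Age":   [2, 4, 4, 7, 7],
    "Breed": ["Pug", "Poodle", "Beagle", "Collie", "Collie"],
})

# Rows must now match on all three columns to count as duplicates.
by_three = pets.drop_duplicates(subset=["Name", "Age", "Breed"], keep="first")
print(by_three)  # both "Lola" rows remain; the second "Dalphine" row is gone
```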

Change Rows to keep to None and select Run.

Notice that the resulting dataframe no longer contains either of the "Dalphine" rows.

Figure 5: Drop all duplicates found in the "Name", "Age", and "Breed" columns
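
In pandas, the closest counterpart to the None option is `keep=False`, which removes every member of a duplicate set. The sketch below uses the same kind of invented stand-in data and is an analogy, not the node's implementation.

```python
import pandas as pd

# Hypothetical stand-in for the example data; values are invented.
pets = pd.DataFrame({
    "Name":  ["Milo", "Lola", "Lola", "Dalphine", "Dalphine"],
    "Age":   [2, 4, 4, 7, 7],
    "Breed": ["Pug", "Poodle", "Beagle", "Collie", "Collie"],
})

# keep=False drops every row that has a duplicate in the selected columns.
no_dupes = pets.drop_duplicates(subset=["Name", "Age", "Breed"], keep=False)
print(no_dupes)  # neither "Dalphine" row remains
```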
