Standard Scaler
Standardize data in Visual Notebooks by scaling each selected column to unit standard deviation and, optionally, centering it around a mean of zero.
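In formula form, for a selected column with values $x_1, \dots, x_n$, both switches together produce the standard score below. This assumes the population standard deviation (dividing by $n$). This page does not say whether the node divides by $n$ or $n - 1$.

$$
x_i' = \frac{x_i - \mu}{\sigma}, \qquad
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}
$$

With only the standard deviation switch on, $\mu$ is replaced by $0$, so $x_i' = x_i / \sigma$. With only the centering switch on, $\sigma$ is replaced by $1$, so $x_i' = x_i - \mu$.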
Configuration
| Field | Description |
|---|---|
| Name | A user-specified node name displayed in the workspace. |
| Select columns to scale | Numeric columns to scale. Select the columns from the auto-populated dropdown menu. |
| Scales data to unit standard deviation | Standard deviation scaling. This switch is on by default. Leave it on to divide each value by the standard deviation of its column. |
| Center data to zero mean | Center around zero. Toggle this switch on to subtract the column mean from each value in the column, so that the column has a mean of zero. This is useful when comparing columns whose values lie in different ranges. It is often used together with the "Scales data to unit standard deviation" switch. |
| Keep Original Columns | Keep columns with unscaled data. Toggle this switch on to keep the unscaled columns and add new columns containing the scaled data. Leave the switch off to replace the unscaled columns with the scaled data. |
| Output column suffix | Column suffix. Enter a suffix to append to the names of the new scaled columns. The suffix can contain only alphanumeric characters and underscores. |
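The table above lists the options but not how they combine. Below is a minimal plain-Python sketch of the documented behavior. It makes the following assumptions:

- The table is column-oriented (a dict of lists) rather than a Visual Notebooks dataframe.
- The standard deviation is the population form.
- The suffix is applied only when originals are kept.
- The default suffix is `_scaled`, inferred from the example later on this page.

The function `standard_scale` and its parameter names are hypothetical and are not part of the product.

```python
import re
from statistics import fmean, pstdev


def standard_scale(
    data: dict[str, list[float]],
    columns: list[str],
    with_std: bool = True,        # "Scales data to unit standard deviation" (on by default)
    with_mean: bool = False,      # "Center data to zero mean" (off by default)
    keep_original: bool = False,  # "Keep Original Columns" (off by default)
    suffix: str = "_scaled",      # "Output column suffix" (default assumed from the example)
) -> dict[str, list[float]]:
    """Return a copy of a column-oriented table with the selected columns standardized.

    Hypothetical helper that mimics the node's documented options;
    it is not the Visual Notebooks implementation.
    """
    # Suffix may contain only alphanumeric characters and underscores (ASCII assumed).
    if not re.fullmatch(r"[A-Za-z0-9_]+", suffix):
        raise ValueError("suffix may contain only alphanumeric characters and underscores")

    out = {name: list(values) for name, values in data.items()}
    for col in columns:
        values = data[col]
        mean = fmean(values) if with_mean else 0.0
        # Population standard deviation (divide by n): an assumption, not documented.
        std = pstdev(values) if with_std else 1.0
        if std == 0:
            std = 1.0  # leave constant columns unscaled instead of dividing by zero
        scaled = [(v - mean) / std for v in values]
        # Either add a suffixed column or overwrite the original column.
        target = col + suffix if keep_original else col
        out[target] = scaled
    return out


# Usage: center and scale column "a", keeping the original column.
table = {"a": [1.0, 2.0, 3.0], "label": [0.0, 1.0, 0.0]}
result = standard_scale(table, ["a"], with_mean=True, keep_original=True)
print(result)
```

Returning a new dict instead of mutating `data` mirrors how the node produces a new output dataframe from its input.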
Node Inputs/Outputs
| Input/Output | Description |
|---|---|
| Input | A Visual Notebooks dataframe |
| Output | A dataframe with scaled data |

Figure 1: Example dataframe output
Examples
- Connect a Standard Scaler node to an existing node.
- The dataframe below is used in this example.
- Notice that the "ant_length_inches" column has values between 0 and 0.5, while the "whaleshark_length_inches" column has values between 200 and 400.
- Imagine that you want to use these ant and whale shark lengths to predict water quality. Because the values in the "whaleshark_length_inches" column are much larger than those in the "ant_length_inches" column, the whale shark data can overshadow the ant data when training a scale-sensitive machine learning model. To make the model weigh both inputs comparably, rescale the data so that both columns share the same scale and a similar range.

Figure 2: Example input data
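To see why the larger column dominates, consider a distance-based model such as k-nearest neighbors. This is an assumption, since this page does not name a model. The sketch uses two made-up rows within the ranges described above, not the data in Figure 2. It measures how much of the squared Euclidean distance between the rows comes from each column.

```python
import math

# Hypothetical rows: (ant_length_inches, whaleshark_length_inches).
# Values are illustrative only, chosen within the ranges described above.
row_a = (0.1, 250.0)
row_b = (0.4, 260.0)

ant_term = (row_a[0] - row_b[0]) ** 2    # squared difference in the ant column
whale_term = (row_a[1] - row_b[1]) ** 2  # squared difference in the whale shark column
distance = math.sqrt(ant_term + whale_term)

# Fraction of the squared distance contributed by the ant column.
ant_share = ant_term / (ant_term + whale_term)
print(f"distance={distance:.4f}, ant share of squared distance={ant_share:.2%}")
```

Here the ant column contributes less than 0.1% of the squared distance. Unless the columns are rescaled, a distance-based model would effectively ignore it.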
- Select both columns from the auto-populated dropdown menu in the "Select columns to scale" field.
- Select "Run" to create a dataframe with the default settings.
- Notice that the original data is replaced with scaled data. Each entry has been divided by the standard deviation of its column, so both columns now have a standard deviation of one. Both columns now use the same scale. However, because they were not centered, their means still differ and their values still fall in different ranges.

Figure 3: Example dataframe with default settings
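The default settings (scaling only, no centering) can be reproduced in plain Python as a sketch. The values are hypothetical and fall within the ranges described earlier; they are not the Figure 2 data. The population standard deviation is an assumption.

```python
from statistics import fmean, pstdev

# Hypothetical column values (not the data shown in Figure 2).
ant = [0.1, 0.25, 0.4]
whale = [200.0, 300.0, 400.0]


def scale_only(values):
    """Divide each value by the column's population standard deviation (no centering)."""
    std = pstdev(values)
    return [v / std for v in values]


ant_scaled = scale_only(ant)
whale_scaled = scale_only(whale)

for name, col in (("ant", ant_scaled), ("whale", whale_scaled)):
    print(name, [round(v, 3) for v in col],
          "std:", round(pstdev(col), 3), "mean:", round(fmean(col), 3))
```

Both scaled columns have a standard deviation of 1, but their means (about 2.04 and 3.67 here) still differ. That is why the two columns still cover different ranges.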
- Toggle the "Center data to zero mean" and "Keep Original Columns" switches on.
- Select "Run" to create a dataframe.
- Notice that the scaled results appear in new columns with the "_scaled" suffix, alongside the original columns.
- The scaled columns are now centered around zero and share the same scale, so they can be compared and used to train a model.

Figure 4: Example dataframe with scaled and zero-centered data
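With "Center data to zero mean" and "Keep Original Columns" both on, each selected column gains a companion column holding (x - mean) / std. The sketch below makes the following assumptions:

- The table is a dict of lists.
- The values are made up; only the column names and the "_scaled" suffix come from the example.
- The standard deviation is the population form.

It shows the resulting column names and the zero-mean, unit-standard-deviation result.

```python
from statistics import fmean, pstdev

# Hypothetical table: column names from the example, values made up.
table = {
    "ant_length_inches": [0.1, 0.25, 0.4],
    "whaleshark_length_inches": [200.0, 300.0, 400.0],
}

suffix = "_scaled"  # suffix seen in the example output
for name in list(table):  # snapshot the keys because columns are added in the loop
    values = table[name]
    mean, std = fmean(values), pstdev(values)
    # Center and scale, keeping the original column untouched.
    table[name + suffix] = [(v - mean) / std for v in values]

print(list(table))
```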