Standard Scaler

Standardize data in Visual Notebooks by scaling each selected column to unit standard deviation and, optionally, centering it around zero.

Configuration

Field	Description
Name	A user-specified node name displayed in the workspace
Select columns to scale	Numeric columns to scale. Select columns from the auto-populated dropdown menu.
Scales data to unit standard deviation	Standard deviation scaling. Leave this toggle switch on to divide each value by the standard deviation of that column.
Center data to zero mean	Center around zero. Toggle this switch on to center the data around zero by subtracting the column mean from each value in the column. This is useful when comparing columns with different ranges of data. This switch is often used together with the "Scales data to unit standard deviation" toggle switch.
Keep Original Columns	Keep columns with unscaled data. Toggle this switch on to keep the unscaled columns and create new columns with the scaled data. Leave the switch off to replace the unscaled columns with scaled data.
Output column suffix	Column suffix. Enter a suffix to append to the names of the scaled columns. The suffix can contain only alphanumeric characters and underscores.

Input	A Visual Notebooks dataframe
Output	A dataframe with scaled data
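
The configuration fields above can be read as parameters of a single transformation. The following is a minimal sketch of that reading, not the node's actual implementation. It represents a dataframe as a plain dictionary of column lists, and the function name `standard_scale` and its parameter names are hypothetical. It also assumes the population standard deviation, assumes the suffix is applied only when the original columns are kept, and leaves constant columns undivided; the source does not specify any of these details.

```python
from statistics import fmean, pstdev


def standard_scale(frame, columns, with_std=True, with_mean=False,
                   keep_original=False, suffix="_scaled"):
    """Return a new dict-of-lists "dataframe" with the chosen columns scaled.

    with_std      ~ "Scales data to unit standard deviation"
    with_mean     ~ "Center data to zero mean"
    keep_original ~ "Keep Original Columns"
    suffix        ~ "Output column suffix"
    """
    result = {name: list(values) for name, values in frame.items()}
    for name in columns:
        values = frame[name]
        mean = fmean(values) if with_mean else 0.0
        # Assumption: population standard deviation; the node may use the
        # sample standard deviation instead.
        std = pstdev(values) if with_std else 1.0
        if std == 0:
            std = 1.0  # assumption: a constant column is left undivided
        scaled = [(v - mean) / std for v in values]
        # Assumption: the suffix is only used when originals are kept.
        target = name + suffix if keep_original else name
        result[target] = scaled
    return result


# Usage: scale and center one numeric column, leaving other columns untouched.
frame = {"a": [1.0, 2.0, 3.0], "label": ["x", "y", "z"]}
scaled = standard_scale(frame, ["a"], with_mean=True)
```

With both switches on, each value becomes a z-score: the number of standard deviations it lies from its column mean.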

Figure 1: Example dataframe output

Connect a Standard Scaler node to an existing node.
- The dataframe below is used in this example.
- Notice that the "ant_length_inches" column has values between 0 and 0.5, while the "whaleshark_length_inches" column has values between 200 and 400.
- Imagine that you want to use this data about the length of ants and whale sharks to predict water quality. Because the values in the "whaleshark_length_inches" column are much larger than the values in the "ant_length_inches" column, the whale shark data can overshadow the ant data when training a machine learning model. To ensure that the model considers both inputs equally, adjust the data so both columns have the same scale and a similar range.

Figure 2: Example input data

Select both columns from the auto-populated dropdown menu in the "Select columns to scale" field.
Select "Run" to create a dataframe with the default settings.
- Notice that the original data is replaced with scaled data. Each entry has been divided by the standard deviation of its column, resulting in two columns that each have a standard deviation of one. Both sets of data now use the same scale, but they still have different ranges because the column means have not been subtracted.

Figure 3: Example dataframe with default settings
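
The default behavior described in this step can be checked with a few lines of Python. This is an illustrative sketch only: the values below are made up to match the described ranges and are not the data shown in the figures, the helper name `scale_to_unit_std` is hypothetical, and the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev

# Hypothetical sample values in the ranges described above.
ant = [0.10, 0.20, 0.30, 0.40]
whaleshark = [220.0, 280.0, 340.0, 400.0]


def scale_to_unit_std(values):
    """Divide each value by the column's standard deviation (no centering)."""
    std = pstdev(values)  # assumption: population standard deviation
    return [v / std for v in values]


ant_scaled = scale_to_unit_std(ant)
whaleshark_scaled = scale_to_unit_std(whaleshark)

# Both columns now have a standard deviation of one, but their means
# (and therefore their ranges) still differ.
for name, column in [("ant", ant_scaled), ("whaleshark", whaleshark_scaled)]:
    print(name, "std:", pstdev(column), "mean:", fmean(column),
          "range:", (min(column), max(column)))
```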

Toggle the "Center data to zero mean" and "Keep Original Columns" switches on.
Select "Run" to create a dataframe.
- Notice that the results of the scaling are presented in new columns with the "_scaled" suffix.
- All data is now centered around zero on the same scale, so it can be compared and used to train a model.

Figure 4: Example dataframe with scaled and zero-centered data
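
The result of this last step, with centering enabled and the original columns kept, can be sketched the same way. The values are again hypothetical, the helper `z_scores` is an illustrative name, and the population standard deviation is an assumption; the "_scaled" suffix follows the example above.

```python
from statistics import fmean, pstdev

# Hypothetical input data, represented as a dictionary of column lists.
frame = {
    "ant_length_inches": [0.10, 0.20, 0.30, 0.40],
    "whaleshark_length_inches": [220.0, 280.0, 340.0, 400.0],
}


def z_scores(values):
    """Subtract the column mean, then divide by the standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / std for v in values]


# Keep the original columns and add a "_scaled" column for each one.
output = dict(frame)
for name in list(frame):
    output[name + "_scaled"] = z_scores(frame[name])

for name, column in output.items():
    print(name, column)
```

Each scaled column has a mean of zero and a standard deviation of one, so the two columns can be compared directly even though the original measurements differ by several orders of magnitude.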