Python Estimator
Configure and train a machine learning model using the scikit-learn Python library in Visual Notebooks. This node is an estimator node: it outputs a fitted model that can be used in an ML Pipeline to generate predictions on test data. The Python Estimator node is useful when you need the breadth of models offered by scikit-learn or finer control over model parameters.
See the scikit-learn documentation to learn more about estimators.
Configuration
| Field | Description |
|---|---|
| Name (default: none) | A user-specified node name, displayed in the canvas and in the dataframe as a tab. |
| Columns (required) | The feature columns to use in model training. The selected column names are stored in a list, accessible through the feature_columns variable in the notebook. |
| Training Label (required) | The label column to use in model training. Its name is accessible through the label_column variable in the notebook. |
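Because feature_columns is a list of names and label_column is a single name, df[feature_columns] yields a two-dimensional feature table and df[label_column] a one-dimensional Series, which is the X/y shape scikit-learn's fit() expects. The minimal sketch below assumes a pandas DataFrame with hypothetical column names (date, temp, load, target) and sets the two variables by hand; it only illustrates the shapes, not how Visual Notebooks populates them.

```python
import pandas as pd

# Hypothetical data: 'date' is excluded from the features, 'target' is the label.
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=4, freq="15min"),
    "temp": [1.0, 2.0, 3.0, 4.0],
    "load": [10.0, 12.0, 11.0, 13.0],
    "target": [0.1, 0.2, 0.3, 0.4],
})

# Stand-ins for the variables filled from the Columns and Training Label fields.
label_column = "target"
feature_columns = [c for c in df.columns if c not in ("date", label_column)]

X = df[feature_columns]  # DataFrame of shape (n_rows, n_features)
y = df[label_column]     # Series of shape (n_rows,)

print(feature_columns)   # ['temp', 'load']
print(X.shape, y.shape)  # (4, 2) (4,)
```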
Function Definitions
| Function | Description |
|---|---|
| result_schema(input_schema) | Declares the columns added to the input dataframe. Specify the name and data type of any column(s) that process() adds, typically a column of predictions. Append them to input_schema, a list containing the schema of the input dataframe, and return it. |
| train(df, feature_columns, label_column) | Fits a model to the training data. Configure an estimator (machine learning model) and call its fit() method on the training data; the fitted model must be returned. |
| process(trained_model, df, feature_columns, prediction) | Adds predictions to the input dataframe. Generate predictions by calling the trained model's predict() method, append them to the input dataframe, and return the dataframe. |
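The three functions form a contract: result_schema declares the added column, train returns a fitted estimator, and process applies it to a dataframe. The sketch below wires them together with a hypothetical run_estimator_node helper so the flow can be tried locally with pandas and scikit-learn. It assumes the schema is a list of [name, dtype] pairs, as in the example on this page, and uses LinearRegression purely for illustration. How Visual Notebooks actually invokes these functions, and what it passes as the prediction argument, is not described here, so the helper simply passes None.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def result_schema(input_schema):
    # Declare the column that process() adds.
    input_schema.extend([["Predictions", "float64"]])
    return input_schema


def train(df, feature_columns, label_column):
    # Configure an estimator and fit it; fit() returns the fitted estimator.
    return LinearRegression().fit(df[feature_columns], df[label_column])


def process(trained_model, df, feature_columns, prediction):
    # Append predictions to the input dataframe and return it.
    df["Predictions"] = trained_model.predict(df[feature_columns])
    return df


def run_estimator_node(train_df, score_df, feature_columns, label_column):
    """Hypothetical local driver; not part of the Visual Notebooks API."""
    input_schema = [[name, str(dtype)] for name, dtype in score_df.dtypes.items()]
    schema = result_schema(input_schema)
    model = train(train_df, feature_columns, label_column)
    # Score a copy so the caller's dataframe is left unchanged.
    out = process(model, score_df.copy(), feature_columns, None)
    return schema, out


train_df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 3.0, 5.0, 7.0]})
score_df = pd.DataFrame({"x": [4.0, 5.0]})
schema, out = run_estimator_node(train_df, score_df, ["x"], "y")
print(schema)                                 # [['x', 'float64'], ['Predictions', 'float64']]
print(out["Predictions"].round(6).tolist())  # [9.0, 11.0]
```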
Node Inputs/Outputs
| Port | Description |
|---|---|
| Input | A Visual Notebooks dataframe |
| Output | A dataframe, typically with a column of predictions added |

Figure 1: Example dataframe output
Examples
The data shown in Figure 2 is used in this example. It contains electricity consumption data from DAEWOO Steel Co., Ltd., a steel producer in Gwangyang, South Korea [1]. The goal is to train a model that predicts energy usage one timestep in advance.

Figure 2: Example input data
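Predicting usage one timestep in advance means each row's label is the following row's reading. This page does not say how lead_1_Usage_kWh_scaled was produced; the sketch below assumes it is a Usage_kWh column shifted back by one row with pandas shift(-1), dropping the final row because it has no next reading. The scaling step is omitted, and the timestamps and values are invented for illustration.

```python
import pandas as pd

# Hypothetical readings at 15-minute intervals (values invented for illustration).
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01 00:15", periods=5, freq="15min"),
    "Usage_kWh": [3.2, 4.0, 3.6, 3.8, 4.4],
})

# shift(-1) moves each value up one row, so row t holds the reading at t + 1.
df["lead_1_Usage_kWh"] = df["Usage_kWh"].shift(-1)

# The last row has no future reading (NaN), so it cannot serve as a training example.
df = df.dropna(subset=["lead_1_Usage_kWh"])

print(df[["Usage_kWh", "lead_1_Usage_kWh"]])
```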
- Connect a Python Estimator node to an existing node. Here, it is connected to a CSV node containing the example data.
- In Columns, select all columns except date (Timestamp) and lead_1_Usage_kWh_scaled (Double).
- In Training Label, select lead_1_Usage_kWh_scaled (Double).
- Copy the code below and paste it into the Notebook tab, as shown in Figure 3.

```python
from sklearn.ensemble import GradientBoostingRegressor


def result_schema(input_schema):
    # Declare the column of predictions that process() adds
    input_schema.extend([['Predictions', 'float64']])
    return input_schema


def train(df, feature_columns, label_column):
    # Configure the estimator and fit it to the training data
    t_m = GradientBoostingRegressor(random_state=0)
    model = t_m.fit(df[feature_columns], df[label_column])
    return model


def process(trained_model, df, feature_columns, prediction):
    # Predict with the fitted model and append the results as a new column
    prediction = trained_model.predict(df[feature_columns])
    df['Predictions'] = prediction
    return df
```

- Click Run.

Figure 3: Notebook with filled in functions
The output dataframe is identical to the one in Figure 1: the gradient boosting regressor is fitted to the training data and generates a column of predictions that can be used in subsequent analysis.
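To experiment with the example outside Visual Notebooks, its train and process functions can be called directly on a pandas DataFrame. The sketch below assumes synthetic data in place of the steel dataset (the column names f1, f2 and label are hypothetical), holds out the last rows as a test set as one would for time-ordered data, and scores the Predictions column with scikit-learn's mean_absolute_error as one example of subsequent analysis.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error


def train(df, feature_columns, label_column):
    # Same estimator as the example above.
    t_m = GradientBoostingRegressor(random_state=0)
    return t_m.fit(df[feature_columns], df[label_column])


def process(trained_model, df, feature_columns, prediction):
    prediction = trained_model.predict(df[feature_columns])
    df["Predictions"] = prediction
    return df


# Synthetic stand-in for the example data: two features and a noisy label.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({"f1": rng.uniform(0, 1, n), "f2": rng.uniform(0, 1, n)})
data["label"] = 2 * data["f1"] - data["f2"] + rng.normal(0, 0.05, n)

feature_columns = ["f1", "f2"]
label_column = "label"

# Train on the first 150 rows and score the remaining 50 (copied so process() can add a column).
train_df, test_df = data.iloc[:150], data.iloc[150:].copy()

model = train(train_df, feature_columns, label_column)
scored = process(model, test_df, feature_columns, None)

mae = mean_absolute_error(scored[label_column], scored["Predictions"])
print(f"MAE on held-out rows: {mae:.3f}")
```

Holding out the most recent rows, rather than a random sample, mirrors how a one-step-ahead forecaster would be used on unseen future data.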
[1] Dua, D. and Graff, C. (2019). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.