Python Estimator

Configure and train a machine learning model with the scikit-learn Python library in Visual Notebooks. This is an estimator node: it outputs a fitted model that can be used in an ML Pipeline to generate predictions on test data. Use the Python Estimator node when you need the breadth of models offered by scikit-learn or finer control over model parameters.

See the scikit-learn documentation to learn more about scikit-learn estimators.
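Every scikit-learn estimator exposes the same `fit()`/`predict()` interface, so swapping in a different model is usually a change to the constructor line. The sketch below is illustrative only and runs outside the node: it uses `RandomForestRegressor` with a few explicitly set hyperparameters to show the kind of parameter control available, and the column names and values are made up.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy training data (hypothetical columns, for illustration only)
df = pd.DataFrame({
    "x1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    "y":  [0.1, 1.1, 2.0, 3.2, 3.9, 5.1],
})

# Hyperparameters are passed to the estimator's constructor
model = RandomForestRegressor(n_estimators=50, max_depth=3, random_state=0)

# fit() trains the model in place and returns the estimator itself
fitted = model.fit(df[["x1", "x2"]], df["y"])

# predict() returns a NumPy array with one prediction per input row
preds = fitted.predict(df[["x1", "x2"]])
print(preds.shape)  # (6,)
```

Because `fit()` returns the estimator, the pattern `model = estimator.fit(X, y)` used in the example below returns a model that is ready for `predict()`.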

Configuration

Field	Description
Name (default: none)	Node name. A user-specified name, displayed on the canvas and as a tab in the dataframe view.
Columns (required)	Feature columns. Select the feature columns used in model training. The selected column names are stored in a list and can be accessed through the `feature_columns` variable in the notebook.
Training Label (required)	Label column. Select the column of labels used in model training. The column name can be accessed through the `label_column` variable in the notebook.
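Because `feature_columns` is a list and `label_column` is a single name, indexing a pandas DataFrame with them yields the 2-D feature matrix and 1-D target that scikit-learn's `fit()` expects. A minimal sketch, assuming the notebook dataframe behaves like a pandas DataFrame; the column names and values are hypothetical stand-ins for what the node supplies.

```python
import pandas as pd

# Hypothetical stand-ins for the values supplied by the node
feature_columns = ["temp", "load"]  # list of names (Columns field)
label_column = "usage"              # single name (Training Label field)

df = pd.DataFrame({
    "date": ["2018-01-01 00:15", "2018-01-01 00:30"],
    "temp": [20.1, 21.3],
    "load": [0.5, 0.7],
    "usage": [3.2, 3.9],
})

X = df[feature_columns]  # indexing with a list -> 2-D DataFrame
y = df[label_column]     # indexing with a string -> 1-D Series
print(X.shape, y.shape)  # (2, 2) (2,)
```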

Function Definitions

Function	Description
result_schema(input_schema)	Schema of columns added to the input dataframe. Specify the schema (name and data type) of any column added to the input dataframe, typically a column of predictions. Append each new column to `input_schema`, a list containing the schema of the input dataframe, and return the updated list.
train(df, feature_columns, label_column)	Fits a model to the training data. Configure an estimator (machine learning model) and call its `fit()` method on the training data to generate the fitted model, which the function returns.
process(trained_model, df, feature_columns, prediction)	Adds predictions to the input dataframe. Generate predictions by calling the trained model's `predict()` method, append them to the input dataframe as a new column, and return the dataframe.
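How the node actually invokes these three functions is not documented here, so the driver below is a hypothetical local harness, not the node's implementation. It assumes `input_schema` is a list of `[name, dtype]` pairs (the shape used in the example below), that `train()` runs before `process()`, and that passing `None` for `prediction` is acceptable. It is useful for checking that the schema declared in `result_schema()` matches the columns `process()` produces.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def result_schema(input_schema):
    # Declare the new column as a [name, dtype] pair
    input_schema.extend([["Predictions", "float64"]])
    return input_schema

def train(df, feature_columns, label_column):
    # Configure the estimator and return the fitted model
    return LinearRegression().fit(df[feature_columns], df[label_column])

def process(trained_model, df, feature_columns, prediction):
    # The incoming `prediction` argument is overwritten here
    df["Predictions"] = trained_model.predict(df[feature_columns])
    return df

# --- Hypothetical local driver (assumption, not the node's code) ---
def run_locally(train_df, test_df, feature_columns, label_column):
    # Build a [name, dtype] schema from the test dataframe (assumed format)
    schema = [[name, str(dtype)] for name, dtype in test_df.dtypes.items()]
    schema = result_schema(schema)
    model = train(train_df, feature_columns, label_column)
    out = process(model, test_df.copy(), feature_columns, None)
    # The declared column names should match the produced columns
    assert [name for name, _ in schema] == list(out.columns)
    return out, schema

train_df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 3.0, 5.0, 7.0]})
test_df = pd.DataFrame({"x": [4.0, 5.0]})
out, schema = run_locally(train_df, test_df, ["x"], "y")
print(out["Predictions"].round(6).tolist())  # [9.0, 11.0]
```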

Node Inputs/Outputs

Input	A Visual Notebooks dataframe
Output	A dataframe, typically with a column of predictions included
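When the output dataframe still contains the label column next to the added predictions, downstream code can score the predictions directly. A minimal sketch using scikit-learn's regression metrics; the dataframe and its values are hypothetical, and the label column is assumed to be present in the data being scored.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical output dataframe: label column plus the added Predictions column
out = pd.DataFrame({
    "lead_1_Usage_kWh_scaled": [0.20, 0.35, 0.50, 0.40],
    "Predictions":             [0.25, 0.30, 0.55, 0.40],
})

mae = mean_absolute_error(out["lead_1_Usage_kWh_scaled"], out["Predictions"])
# RMSE as the square root of MSE (avoids version-specific keyword arguments)
rmse = mean_squared_error(out["lead_1_Usage_kWh_scaled"], out["Predictions"]) ** 0.5
print(f"MAE={mae:.4f}, RMSE={rmse:.4f}")
```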


Figure 1: Example dataframe output

Examples

This example uses the data shown in Figure 2, which contains electricity consumption data from DAEWOO Steel Co., Ltd, a steel producer in Gwangyang, South Korea [1]. The goal is to train a model that predicts energy usage one timestep in advance.


Figure 2: Example input data
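Predicting one timestep ahead means each row's label is the usage recorded in the next row. The source does not say how the `lead_1_Usage_kWh_scaled` column was built; assuming it was derived this way (scaling omitted), a pandas sketch with hypothetical readings and a hypothetical `Usage_kWh` column looks like this:

```python
import pandas as pd

# Hypothetical readings at a fixed interval (values made up)
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01 00:15", periods=5, freq="15min"),
    "Usage_kWh": [3.2, 4.0, 3.3, 3.6, 3.1],
})

# shift(-1) moves each value up one row, so row t holds the usage at t+1
df["lead_1_Usage_kWh"] = df["Usage_kWh"].shift(-1)

# The last row has no next value (NaN) and cannot be used for training
df = df.dropna(subset=["lead_1_Usage_kWh"])
print(df["lead_1_Usage_kWh"].tolist())  # [4.0, 3.3, 3.6, 3.1]
```

Note that `date` is excluded from the features in the steps below; a timestamp column cannot be passed to the regressor as-is.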

1. Connect a Python Estimator node to an existing node. In this example, it is connected to a CSV node containing the example data.
2. In Columns, select all columns except date (Timestamp) and lead_1_Usage_kWh_scaled (Double).
3. In Training Label, select lead_1_Usage_kWh_scaled (Double).
4. Copy the code below and paste it into the Notebook tab, as shown in Figure 3.


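from sklearn.ensemble import GradientBoostingRegressor

def result_schema(input_schema):
    # Declare the new Predictions column as a [name, dtype] pair
    input_schema.extend([['Predictions', 'float64']])
    return input_schema

def train(df, feature_columns, label_column):
    # Configure the estimator; random_state=0 makes results reproducible
    t_m = GradientBoostingRegressor(random_state=0)
    # fit() returns the fitted estimator
    model = t_m.fit(df[feature_columns], df[label_column])
    return model

def process(trained_model, df, feature_columns, prediction):
    # Predict from the feature columns and store the result in the
    # column declared in result_schema()
    prediction = trained_model.predict(df[feature_columns])
    df['Predictions'] = prediction
    return df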

Click Run.


Figure 3: Notebook with filled in functions

The output dataframe is identical to the one shown in Figure 1. The gradient boosting regressor is fit to the training data and generates a Predictions column that can be used in subsequent analysis.
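To try the example code outside Visual Notebooks, you can call `train()` and `process()` directly on a pandas DataFrame. This sketch uses synthetic data with hypothetical column names in place of the example CSV, builds `feature_columns` by excluding the timestamp and the label (mirroring the Columns selection above), and passes `None` for `prediction`, which is an assumption about how the argument may be filled. It also fits on earlier rows and predicts on later ones, a chronological split that is one reasonable choice for time-ordered data, not something the node itself requires.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Equivalent to train() and process() in the notebook code above
def train(df, feature_columns, label_column):
    t_m = GradientBoostingRegressor(random_state=0)
    return t_m.fit(df[feature_columns], df[label_column])

def process(trained_model, df, feature_columns, prediction):
    df["Predictions"] = trained_model.predict(df[feature_columns])
    return df

# Synthetic stand-in for the example CSV (hypothetical columns and values)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=n, freq="15min"),
    "Usage_kWh_scaled": rng.random(n),
    "CO2_scaled": rng.random(n),
})
df["lead_1_Usage_kWh_scaled"] = df["Usage_kWh_scaled"].shift(-1)
df = df.dropna()  # the last row has no next value

label_column = "lead_1_Usage_kWh_scaled"
# Every column except the timestamp and the label
feature_columns = [c for c in df.columns if c not in ("date", label_column)]

# Chronological split: fit on the first 80% of rows, predict on the rest
cut = int(len(df) * 0.8)
train_df, test_df = df.iloc[:cut], df.iloc[cut:].copy()

model = train(train_df, feature_columns, label_column)
out = process(model, test_df, feature_columns, None)
print(feature_columns)  # ['Usage_kWh_scaled', 'CO2_scaled']
print(out.shape)        # (40, 5)
```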

[1] Dua, D. and Graff, C. (2019). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.
