Working with Data in Visual Notebooks
The majority of nodes in Visual Notebooks give you tools to analyze, clean, wrangle, and engineer data.
Analyzing data
Some of the most useful basic analysis tools are built directly into the dataframe. From any node, you can view analytics in the column headers and create visualizations. For more information about built-in dataframe analytics, see the working with nodes section.
General analysis
If you want to dig deeper into data analytics, Visual Notebooks offers the following nodes:
- Describe Columns provides basic analytic information across all columns in your dataframe.
- Group By and Aggregate aggregates data so you can see summary information across different groupings.
- Top N By Group aggregates certain segments of your data based on user-defined sorting and grouping.
- Pivot creates pivot tables that aggregate, filter, and reorganize your data.
- Unpivot reorganizes your data by turning columns into rows (the reverse of a pivot) so you can gain new insights.
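Visual Notebooks nodes are configured visually rather than in code, so the following is only a conceptual sketch of what grouping, aggregating, and selecting the top N rows per group mean. It uses plain Python; the sample rows, column names, and function names are hypothetical and are not part of the product.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"region": "East", "product": "A", "sales": 100},
    {"region": "East", "product": "B", "sales": 250},
    {"region": "East", "product": "C", "sales": 175},
    {"region": "West", "product": "A", "sales": 300},
    {"region": "West", "product": "B", "sales": 50},
]

def group_by_and_aggregate(rows, key, value):
    """Return {group: {"count", "sum", "mean"}} for one numeric column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        group: {"count": len(vals), "sum": sum(vals), "mean": mean(vals)}
        for group, vals in groups.items()
    }

def top_n_by_group(rows, key, sort_by, n):
    """Keep the n rows with the largest sort_by value within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        group: sorted(members, key=lambda r: r[sort_by], reverse=True)[:n]
        for group, members in groups.items()
    }

summary = group_by_and_aggregate(rows, "region", "sales")
top_sellers = top_n_by_group(rows, "region", "sales", n=2)
```

Here `summary` holds one summary record per region, and `top_sellers` holds the two highest-selling rows in each region.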
Statistical analysis
Statistical analysis is an important tool in many data science projects. Visual Notebooks offers nodes for single-sample and multiple-sample hypothesis tests, including:
- 1-Sample Mean
- 1-Sample Variance/Standard Deviation
- 1-Sample Proportion
- 2-Sample Mean
- 2-Sample Paired Mean
- 2-Sample Variance
- 2-Sample Proportion
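To make the idea of a single-sample hypothesis test concrete, here is a minimal sketch of a two-sided one-sample proportion z-test in plain Python. It relies on the normal approximation and is not necessarily the method the 1-Sample Proportion node uses; the function name and the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_ztest(successes, n, p0):
    """Two-sided z-test of H0: true proportion == p0 (normal approximation)."""
    p_hat = successes / n
    # Standard error is computed under the null hypothesis.
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, tested against p0 = 0.5.
z, p_value = one_sample_proportion_ztest(successes=60, n=100, p0=0.5)
reject_null = p_value < 0.05
```

A small p-value suggests the observed proportion would be unlikely if the hypothesized proportion were true. The 0.05 threshold is a common convention, not a requirement.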
Cleaning data
If your dataset contains significantly inaccurate or inconsistent data, you may need to use nodes to improve its quality. The following nodes are typically used for data cleaning:
- Drop Duplicates removes duplicate data in your dataset.
- Numeric Missing Data imputes or removes numeric missing data.
- Timeseries - Resample and Interpolate aligns irregularly spaced timeseries data.
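As a rough illustration of two of these cleaning steps, the sketch below removes exact duplicate rows and fills missing numeric values with the column mean. It is plain Python written under assumed data; the function names and the `sensor` and `temp` columns are hypothetical, and mean imputation is only one of several possible imputation strategies.

```python
from statistics import mean

def drop_duplicates(rows):
    """Remove rows that exactly repeat an earlier row, keeping the first."""
    seen = set()
    unique = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique

def impute_mean(rows, column):
    """Replace None in a numeric column with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [
        {**r, column: fill if r[column] is None else r[column]}
        for r in rows
    ]

# Hypothetical sensor readings.
readings = [
    {"sensor": "s1", "temp": 20.0},
    {"sensor": "s1", "temp": 20.0},  # exact duplicate
    {"sensor": "s2", "temp": None},  # missing value
    {"sensor": "s3", "temp": 24.0},
]

cleaned = impute_mean(drop_duplicates(readings), "temp")
```

Dropping duplicates before imputing keeps repeated rows from skewing the mean used as the fill value.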
Wrangling data
Data is not always in the right format for analysis and machine learning. Visual Notebooks offers a multitude of nodes to prepare your data. Find nodes for wrangling data in all of the following categories:
- Column Operators contains nodes to do simple transformations on your data such as renaming, reordering, converting, joining, and sorting.
- Filter and Split contains nodes that let you filter your dataset or split your dataset into two parts.
- Math Operators contains the Arithmetic node, which allows you to make mathematical calculations on your data.
- Arrays and Objects contains nodes that allow you to work with complex data formats such as arrays and objects.
- Expression contains nodes that enable you to write custom code.
- Categorical Feature Prep contains nodes to convert categorical columns into numeric values that can more easily be used as features for machine learning models.
- Continuous Feature Prep contains nodes to scale numeric data and separate data into bins.
- Timeseries Preparation contains nodes to split and combine timeseries data.
- Transformation and Feature Engineering contains nodes that allow you to manipulate timeseries data and create new features.
- Text Wrangling contains the Text Split node, which allows you to split text into an array of strings.
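The feature preparation categories above cover common transformations such as encoding categories as numbers, scaling numeric columns, and binning. The sketch below shows one simple version of each in plain Python. The function names and sample values are hypothetical, and the specific techniques (one-hot encoding, min-max scaling, equal-width bins) are assumptions chosen for illustration rather than a description of how the nodes work.

```python
def one_hot_encode(values):
    """Turn each category into a dict of 0/1 indicator columns."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

def min_max_scale(values):
    """Rescale numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Assign each number a bin index from 0 to n_bins - 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical columns.
colors = ["red", "blue", "red"]
ages = [20, 30, 40, 60]

encoded = one_hot_encode(colors)
scaled = min_max_scale(ages)
binned = equal_width_bins(ages, n_bins=2)
```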