S3 Node

Import data from CSV files stored in the Amazon Web Services (AWS) S3 object store.

Prerequisites

You must add an AWS access key and secret key to use the S3 node. Follow the steps below to generate these keys and add them as a credential. For more information, see the AWS documentation.

  1. Log in to AWS
  2. Select your account name or number in the top right corner
  3. Select Security credentials
  4. Select the Access keys (access key ID and secret access key) section
  5. Select Create New Access Key
  6. Select Download Key File
  7. Drag an S3 node onto the Visual Notebooks workspace
  8. Select the gear icon beside the Credential field
  9. Select the plus sign in the upper right corner
  10. Enter a name for the credential
  11. Enter the access key and AWS secret key found in the downloaded file
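
The downloaded key file is a small CSV containing the two values entered in the final step. A minimal sketch of pulling them out with Python's standard csv module; the column headers ("Access key ID", "Secret access key") and sample values are assumptions about the file's layout, not guaranteed by AWS:

```python
import csv
import io

def read_aws_key_file(text):
    """Parse the text of a downloaded AWS key file into
    (access_key_id, secret_key). The header names below are an
    assumption about the file's layout."""
    row = next(csv.DictReader(io.StringIO(text)))
    return row["Access key ID"], row["Secret access key"]

# Hypothetical file contents for illustration only -- not real credentials.
sample = "Access key ID,Secret access key\nAKIAEXAMPLE,wJalrEXAMPLEsecret\n"
access_key, secret_key = read_aws_key_file(sample)
```

Whatever the exact layout, the two values it contains are what the credential dialog expects.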

Configuration

Name (Default: S3)
  A user-specified node name displayed in the workspace.

Credential (Required)
  The information needed to access S3 data. Select a saved credential from the dropdown menu. Select the gear icon to add a new credential or delete existing credentials.

Path (Optional)
  The S3 file to upload. Select the bucket and the desired file from the pop-up menu.

Nullify Entries with Incompatible Format (Default: Off)
  Discards bad values. Toggle this switch on to find values whose data type (string, integer, decimal, Boolean, etc.) differs from the data type found in the rest of the column. Entries with a different data type are changed to null values.

Delimiter (Default: Comma)
  The character that separates values. Set the delimiter to comma, pipe, colon, semicolon, tab, or space. Only change this field if the uploaded file uses nonstandard formatting.

Quote (Default: ")
  The character that encloses values so that delimiters inside them are ignored. Set the quote to any character. Only change this field if the uploaded file uses nonstandard formatting.

Number of rows to use in schema inference (Default: 100)
  The number of rows used to determine each column's data type. Set this value to any valid whole number. Visual Notebooks reads the specified number of rows, starting with the first row of the file, and uses them to determine each column's data type.

Has Header (Default: On)
  Whether header data is used as column names. Toggle this switch on if the uploaded file has an initial header row of column names. Toggle it off to use numerical column names ("_c0", "_c1", etc.) instead.

Leave all columns as StringType (no schema inference) (Default: Off)
  Reads every column as the string data type. Toggle this switch on to read all columns as strings and upload all values.
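
The parsing settings above can be sketched in plain Python. The function below is an illustration, not Visual Notebooks' actual implementation: it splits on a configurable delimiter and quote character, infers each column's type from the first N rows, and optionally nullifies cells whose type disagrees with the column. The majority-vote inference rule is an assumption:

```python
import csv
import io
from collections import Counter

def infer_type(value):
    """Classify one CSV cell as 'boolean', 'integer', 'decimal', or 'string'."""
    if value.lower() in ("true", "false"):
        return "boolean"
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        return "string"

def load_csv(text, delimiter=",", quotechar='"', inference_rows=100,
             nullify_incompatible=False):
    """Parse CSV text: split on `delimiter`, honor `quotechar`, infer each
    column's type from the first `inference_rows` data rows (majority vote,
    an assumed rule), and optionally null out incompatible cells.
    Values are kept as text; only typing and nullification are shown."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter,
                           quotechar=quotechar))
    header, data = rows[0], rows[1:]
    schema = []
    for col in range(len(header)):
        votes = Counter(infer_type(row[col]) for row in data[:inference_rows])
        schema.append(votes.most_common(1)[0][0])
    if nullify_incompatible:
        # Mirror "Nullify Entries with Incompatible Format": a cell whose
        # type differs from its column's inferred type becomes null.
        data = [[cell if infer_type(cell) == ctype else None
                 for cell, ctype in zip(row, schema)] for row in data]
    return header, schema, data

header, schema, data = load_csv("a,b\n1,x\n2,y\noops,z\n",
                                nullify_incompatible=True)
```

In this run, column "a" is inferred as integer from its first rows, so the stray "oops" cell is replaced with a null value, while column "b" stays string.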

Node Inputs/Outputs

Input: None
Output: Visual Notebooks returns a table, called a dataframe, that contains all uploaded data. Columns are labeled and include a symbol that specifies the data type of that column.

Figure 1: Example dataframe output

Examples

  • Select the file to upload using the "Path" field.
  • Select "Run" to create a dataframe with the default settings.
    • Notice that columns are labeled and include a symbol that specifies the data type of that column. Various data types are present in the data and are accurately represented in the dataframe.

Figure 2: Example dataframe with default settings
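
The default settings assume comma delimiters and double-quote characters; a file that uses a nonstandard separator, such as pipes, needs the Delimiter field changed before it parses correctly. A sketch of the difference using Python's standard csv module (an illustration, not what Visual Notebooks runs internally):

```python
import csv
import io

# A hypothetical pipe-delimited file.
pipe_text = "name|score\nAda|90\nLin|85\n"

# With the default comma delimiter, each line stays one unsplit field.
default_rows = list(csv.reader(io.StringIO(pipe_text)))

# Declaring the pipe delimiter splits fields correctly, just as setting
# the node's Delimiter field to "pipe" would.
pipe_rows = list(csv.reader(io.StringIO(pipe_text), delimiter="|"))
```

Misparsed files usually show up as a single column whose values still contain the real delimiter, which is the cue to adjust the field.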

  • Toggle the "Leave all columns as StringType (no schema inference)" switch on to upload all data as strings.
    • The "A" symbol beside each column indicates the string data type.

Figure 3: Example dataframe with all data uploaded as strings
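
With schema inference skipped, no cell can be judged incompatible, so every value is kept verbatim as text. A minimal stand-alone sketch of that behavior (an illustration; the real node's internals are not public):

```python
import csv
import io

def load_as_strings(text):
    """Read CSV keeping every column as a string: no schema inference,
    so no value is ever nullified as incompatible."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    # Every column gets the string type (the "A" symbol in the UI).
    schema = ["string"] * len(header)
    return header, schema, data

header, schema, data = load_as_strings("id,flag\n1,true\noops,false\n")
```

Here "oops" survives as the literal string, where a run with inference and the nullify switch on could have turned it into a null value.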
