
Google Cloud Storage Node

Use the Google Cloud Storage node to import data from Google Cloud Storage into Visual Notebooks. You must have an existing Cloud Storage account to use this node. For more information about Cloud Storage, see the Google Cloud Storage documentation.

Prerequisites

You must add a Google Cloud "service account JSON key" to Visual Notebooks before using the Google Cloud Storage node. Follow the steps below to generate this key and add it to Visual Notebooks. For more information about service account JSON keys, see the Google Cloud documentation.

  1. Log in to Google Cloud Platform.
  2. Select the hamburger menu in the top left corner, then select IAM & Admin > Service Accounts.
  3. Select Create Service Account near the top of the page.
  4. Enter a service account name. A service account ID is generated automatically.
  5. Select Create and Continue.
  6. Add the following roles:
    1. Cloud Storage > Storage Object Admin
    2. Cloud Storage > Storage Object Viewer
  7. Select Done.
  8. Select the three vertical dots under Actions, then select Manage keys.
  9. Select Add Key > Create New Key.
  10. Check that JSON is selected, then select Create.
  11. Open the file that automatically downloads and copy its contents.
  12. Drag a Google Cloud Storage node onto the Visual Notebooks workspace.
  13. Select the gear icon beside the Credential field.
  14. Select the plus sign in the upper right corner.
  15. Enter a name for the credentials.
  16. Paste the contents of the downloaded file into the Service Account JSON Key field.
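Before pasting the key into Visual Notebooks, you can sanity-check the downloaded file. The sketch below is plain Python, not part of Visual Notebooks, and the required-field list is an assumption based on the standard Google service-account key format; it simply confirms the file parses as JSON and carries the expected fields.

```python
import json

# Fields a standard Google service-account JSON key normally contains
# (assumption: based on the common key format, not a Visual Notebooks API).
REQUIRED_FIELDS = {
    "type", "project_id", "private_key_id", "private_key",
    "client_email", "client_id", "token_uri",
}

def check_service_account_key(raw_json: str) -> list:
    """Return a list of problems found in the key JSON (empty list = looks valid)."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    problems = []
    if key.get("type") != "service_account":
        problems.append("'type' should be 'service_account'")
    missing = REQUIRED_FIELDS - key.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    return problems
```

If the returned list is empty, the file is at least structurally complete and safe to paste into the Service Account JSON Key field.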

Configuration

Initial configuration sidebar

Configuration sidebar after selecting a file

Name (Optional)
  An optional user-specified node name displayed in the workspace, both on the node and in the dataframe as a tab.

Credential (Required)
  The information needed to access Cloud Storage data. Select a saved credential from the dropdown menu. Select the gear icon to add a new credential or delete existing credentials.

Path (Required)
  The file to upload. Select the bucket and the desired file from the pop-up menu. The rest of the fields appear after a file is selected.

Nullify Entries with Incompatible Format (Default: Off)
  Toggle this switch on to find values whose data type (string, integer, decimal, Boolean, etc.) differs from the data type of the rest of the column. Entries with mismatched data types are changed to null values.

Delimiter (Default: Comma)
  The character that separates values. Set the delimiter to comma, pipe, colon, semicolon, tab, or space. Only change this field if the uploaded file uses nonstandard formatting.

Quote (Default: ")
  The character that surrounds values in which delimiters should be ignored. Set the quote to any character; delimiters inside quotes are ignored. Only change this field if the uploaded file uses nonstandard formatting.

Number of rows to use in schema inference (Default: 100)
  Set this value to any whole number. Visual Notebooks reads the specified number of rows, starting with the first row of the file, and uses them to determine each column's data type.

Has Header (Default: On)
  Toggle this switch on if the uploaded file has an initial header row of column names. Toggle it off to use numerical column names ("_c0", "_c1", etc.) instead.

Leave all columns as StringType (no schema inference) (Default: Off)
  Toggle this switch on to read all columns as strings and upload all values.
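The schema-related fields above can be illustrated with a short sketch. This is plain Python, not the Visual Notebooks implementation; the function name, the type-compatibility rules, and the mixed-type fallback to string are all assumptions made for illustration.

```python
import csv
import io

def infer_type(value: str) -> str:
    """Classify a single CSV cell as bool, int, float, or string."""
    if value.lower() in ("true", "false"):
        return "bool"
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "string"

def read_with_inference(text, delimiter=",", quotechar='"', has_header=True,
                        inference_rows=100, nullify_incompatible=False):
    """Read CSV text, infer each column's type from the first
    `inference_rows` data rows, and optionally null out entries whose
    type does not match the inferred column type."""
    rows = list(csv.reader(io.StringIO(text),
                           delimiter=delimiter, quotechar=quotechar))
    if has_header:
        header = rows.pop(0)
    else:
        header = [f"_c{i}" for i in range(len(rows[0]))]
    sample = rows[:inference_rows]
    types = []
    for col in range(len(header)):
        seen = {infer_type(row[col]) for row in sample}
        # Assumption: a column that mixes types in the sample falls back to string.
        types.append(seen.pop() if len(seen) == 1 else "string")
    if nullify_incompatible:
        def keep(value, col_type):
            cell_type = infer_type(value)
            # Strings accept anything; ints are compatible with a float column.
            return (col_type == "string" or cell_type == col_type
                    or (col_type == "float" and cell_type == "int"))
        rows = [[v if keep(v, t) else None for v, t in zip(row, types)]
                for row in rows]
    return header, types, rows
```

For example, with `inference_rows=1`, a column whose first row holds `3.5` is typed as float, and a later non-numeric entry such as `oops` becomes null when the nullify option is on.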

Node Inputs/Outputs

Input: None
Output: Visual Notebooks returns a table, called a dataframe, that contains all uploaded data. Columns are labeled and include a symbol that specifies each column's data type.


Figure 1: Example output

Examples

Follow the steps below to import data from Cloud Storage.

  1. Select the saved credentials used to access Cloud Storage. If you have not yet saved credentials, follow the steps in the Prerequisites section above.
  2. Select the file to upload using the Path field.
  3. Select Run to create a dataframe with the default settings.

Notice that columns are labeled and include a symbol indicating each column's data type. The various data types present in the data are accurately represented in the dataframe.


Figure 2: Example dataframe with default settings

If you want to upload all data as strings, toggle the Leave all columns as StringType (no schema inference) switch on and select Run. Notice that the "A" symbol beside each column indicates the string data type.


Figure 3: Example dataframe with all data uploaded as strings
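The two runs above can be approximated outside Visual Notebooks with pandas. This is an analogy for readers familiar with pandas, not the node's implementation, and the sample data is invented.

```python
import io
import pandas as pd

# Invented sample data standing in for a file read from Cloud Storage.
csv_text = "name,age,active\nAda,36,True\nGrace,45,False\n"

# Default run: column types are inferred, as in Figure 2.
inferred = pd.read_csv(io.StringIO(csv_text))

# String run: every column is read as a string, as in Figure 3.
as_strings = pd.read_csv(io.StringIO(csv_text), dtype=str)
```

Here `inferred` carries integer and Boolean columns, while every column of `as_strings` has pandas' object (string) dtype, mirroring the "A" symbol shown beside each column.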
