Delta Lake Input Node

Load data from Delta Lake into Visual Notebooks.

Prerequisites

Follow the steps below to add credentials for Delta Lake. You must have a valid access key or service principal.

  1. Drag a Delta Lake Input node onto the Visual Notebooks workspace
  2. Select the gear icon beside the Credential field
  3. Select the plus sign in the upper right corner
  4. Enter a name for the credential
  5. Select Azure, then select ADLS Gen 2
  6. Select Access Key or Service Principal, then follow the instructions in the corresponding section below

Access Key

  1. Search for the ADLS storage account with Delta Lake tables in Azure Portal
  2. Select Access keys in the left-hand navigation menu
  3. Copy the storage account name to the Visual Notebooks credentials modal
  4. Select Show keys
  5. Copy the contents of one of the Key fields to the Visual Notebooks credentials modal
  6. Save the credentials in Visual Notebooks

Service Principal

  1. Search for App registrations in Azure Portal
  2. Select New registration
  3. Register the new application
  4. Copy the Application (client) ID field to the Visual Notebooks credentials modal
  5. Copy the Directory (tenant) ID field to the Visual Notebooks credentials modal
  6. Select Certificates & secrets in the left-hand navigation menu
  7. Select New client secret
  8. Copy the client secret value to the Visual Notebooks credentials modal
  9. Search for the ADLS storage account with Delta Lake tables in Azure Portal
  10. Copy the storage account name to the Visual Notebooks credentials modal
  11. Select Access Control (IAM) in the left-hand navigation menu
  12. Select Add Role Assignment to open the "Add role assignment" page
  13. Choose the Storage Blob Data Contributor role assignment and select Next
  14. Click Select members
  15. Search for the name of the application you created and select it
  16. Click Select in the bottom right corner
  17. Select Review + assign and wait for the role assignment to complete
  18. Save the credentials in Visual Notebooks
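
As a recap of the two procedures above, the sketch below lists the values each credential type collects and reports any that are still empty. It is only an illustrative checklist: the class names, field names, and the `missing_fields` helper are hypothetical and are not part of Visual Notebooks or any Azure library.

```python
from dataclasses import dataclass, fields


@dataclass
class AccessKeyCredential:
    """Values gathered in the Access Key steps (hypothetical field names)."""
    name: str                   # credential name entered in Visual Notebooks
    storage_account_name: str   # ADLS storage account that holds the Delta tables
    access_key: str             # contents of one of the Key fields


@dataclass
class ServicePrincipalCredential:
    """Values gathered in the Service Principal steps (hypothetical field names)."""
    name: str                   # credential name entered in Visual Notebooks
    client_id: str              # Application (client) ID
    tenant_id: str              # Directory (tenant) ID
    client_secret: str          # client secret value
    storage_account_name: str   # ADLS storage account that holds the Delta tables


def missing_fields(credential) -> list:
    """Return the names of fields that are empty or contain only whitespace."""
    return [
        f.name
        for f in fields(credential)
        if not str(getattr(credential, f.name)).strip()
    ]
```

Running `missing_fields` on a partially filled credential before saving shows which values still need to be copied from Azure Portal. The role assignment in steps 11 through 17 is not a credential field, so this checklist does not cover it.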

Configuration

Field: Name
Default: Snowflake
Description: A user-specified node name displayed in the workspace.

Field: Credential
Required
Description: The information needed to access Delta Lake data. Select a saved credential from the dropdown menu. Select the gear icon to add a new credential or delete existing credentials.

Field: Path
Required
Description: The Delta Lake table to upload. Select a container and Delta table using the pop-up menu. Note that Delta tables contain a folder called _delta_log.

Field: Always load most recent table version when visual notebook is run
Default: Off
Description: Table version. Toggle this switch on to always load the most recent version of the selected table. Leave this toggle switch off to always load the same version of the selected table, regardless of whether a newer version is available.

Field: Specify how to select past tables
Default: Select by version
Description: Past table selection. If "Always load most recent table version when visual notebook is run" is toggled off, specify whether to select the desired version of the table by version number or by date.

Field: Select Version
Optional
Description: Previous table version. If "Select by version" is selected in the "Specify how to select past tables" field above, select a table version from the auto-populated dropdown menu.

Field: Select Date
Optional
Description: Previous table date. If "Select by date" is selected in the "Specify how to select past tables" field above, select a date and time from the calendar pop-up menu. The version of the table that was most recent on that particular date is used.

Field: Query
Optional
Description: The portion of the table to upload. Enter a SQL query that returns the desired data. If a query is not provided, the entire table is loaded into Visual Notebooks.

Field: Cache output
Default: On
Description: Table caching. Toggle this switch off to load the selected table onto disk. The table will load quickly, but computations will take longer. Leave this toggle switch on to cache the selected table in memory. Running computations on the data will be fast, but the table will take longer to load.
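
The three version-related fields interact, so a small sketch may help. The code below models how a version could be chosen from the "Always load most recent table version" toggle, the "Select by version" or "Select by date" choice, and the selected value. It is not the Visual Notebooks implementation: the function names and parameters are hypothetical, and it assumes versions are numbered from 0 with no gaps and that a commit made exactly at the requested time counts as available.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional


def version_as_of(commit_times: List[datetime], as_of: datetime) -> int:
    """Return the latest version committed at or before `as_of`.

    commit_times[i] is the commit time of version i, in ascending order.
    """
    index = bisect_right(commit_times, as_of)
    if index == 0:
        raise ValueError("no table version exists at or before the requested date")
    return index - 1


def resolve_version(
    commit_times: List[datetime],
    always_latest: bool = False,
    select_by: str = "version",        # "version" or "date"
    version: Optional[int] = None,     # used when select_by == "version"
    as_of: Optional[datetime] = None,  # used when select_by == "date"
) -> int:
    """Pick the table version to load from the node's version settings."""
    if not commit_times:
        raise ValueError("the table has no versions")
    latest = len(commit_times) - 1
    if always_latest:
        # Toggle on: ignore the other fields and use the newest version.
        return latest
    if select_by == "version":
        if version is None or not 0 <= version <= latest:
            raise ValueError("select a version between 0 and %d" % latest)
        return version
    if select_by == "date":
        if as_of is None:
            raise ValueError("select a date and time")
        return version_as_of(commit_times, as_of)
    raise ValueError("select_by must be 'version' or 'date'")
```

For example, with versions committed on January 1, 5, and 10, selecting January 7 by date resolves to version 1, the version that was most recent on that date.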

Node Inputs/Outputs

Input: None
Output: Visual Notebooks returns a table, called a dataframe, that contains all uploaded data. Columns are labeled and include a symbol that specifies the data type of that column.

Figure 1: Example dataframe output

Examples

  • Select the "Choose Path" button. Select a container and Delta Lake table. Notice that Delta Lake tables contain a subfolder called _delta_log. Select a Delta Lake table to highlight both the parent folder and the _delta_log subfolder. Do not select only the _delta_log folder.

Figure 2: Selecting a Delta Lake table using the pop-up menu
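
To illustrate why the parent folder, and not the _delta_log folder alone, identifies the table, the sketch below checks a folder for a _delta_log subfolder and lists the versions recorded in it. It works on a local copy of a table only; reading directly from ADLS would need a storage client that is not shown here. The function names are hypothetical. The sketch relies on the Delta Lake convention that each commit is stored in _delta_log as a JSON file named with a 20-digit, zero-padded version number.

```python
import re
import tempfile
from pathlib import Path

# Commit files look like 00000000000000000003.json (20 digits, zero-padded).
COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def is_delta_table(folder: Path) -> bool:
    """Return True if the folder contains a _delta_log subfolder."""
    return (folder / "_delta_log").is_dir()


def list_versions(folder: Path) -> list:
    """Return the sorted version numbers found in the table's _delta_log."""
    versions = []
    for entry in (folder / "_delta_log").iterdir():
        match = COMMIT_FILE.match(entry.name)
        if match:
            versions.append(int(match.group(1)))
    return sorted(versions)


if __name__ == "__main__":
    # Build a throwaway folder layout that mimics a Delta table with 3 commits.
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp) / "people"
        log_dir = table / "_delta_log"
        log_dir.mkdir(parents=True)
        for version in range(3):
            (log_dir / f"{version:020d}.json").write_text("{}")
        print(is_delta_table(table))    # the parent folder is the table
        print(is_delta_table(log_dir))  # the _delta_log folder alone is not
        print(list_versions(table))
```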

  • In the "Select Version" field, select the version of the table to upload.
  • Select "Run" to create a dataframe.

Figure 3: Example dataframe created from a Delta Lake table

  • In the "Query" field, write a SQL query that returns the desired data. Use the name table in the query to refer to the table selected in the "Path" field.
    • In the example below, the query returns the "firstName", "middleName", and "lastName" columns for the first 100 rows of the selected table.
  • Select "Run" to create a dataframe with only the selected data.

Figure 4: Example dataframe created from a SQL query
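
The query described in this example can be sketched as follows. The helper builds the SQL text from a list of column names and a row limit, using table as the name of the selected Delta Lake table. The `build_query` function is hypothetical, and the use of a LIMIT clause is an assumption, since the SQL dialect accepted by the "Query" field is not stated here.

```python
def build_query(columns: list, limit: int) -> str:
    """Build a SELECT statement over the table chosen in the "Path" field."""
    if not columns:
        raise ValueError("at least one column is required")
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    # "table" is the name the Query field uses for the selected table.
    return "SELECT {} FROM table LIMIT {}".format(", ".join(columns), limit)


QUERY = build_query(["firstName", "middleName", "lastName"], 100)
```

The resulting text, `SELECT firstName, middleName, lastName FROM table LIMIT 100`, is what would be entered in the "Query" field.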