DLT-META Demos

  1. DAIS 2023 Demo: Showcases DLT-META's ability to create Bronze and Silver DLT pipelines automatically, in both initial and incremental modes.
  2. Databricks Tech Summit Demo: Ingests hundreds of data sources into Bronze and Silver DLT pipelines automatically.
  3. Append Flow Autoloader Demo: Writes to the same target from multiple sources using dlt.append_flow and adds file metadata columns.
  4. Append Flow Eventhub Demo: Writes to the same target from multiple Eventhub topics using dlt.append_flow.
  5. Silver Fanout Demo: Showcases the implementation of a fanout architecture in the silver layer.

DAIS 2023 Demo

This demo launches Bronze and Silver DLT pipelines and performs the following activities:

  • Runs an initial load for the Customer and Transactions feeds
  • Adds the new Product and Stores feeds to the existing Bronze and Silver DLT pipelines through metadata changes
  • Runs the Bronze and Silver DLT pipelines in incremental mode to process CDC events

Steps:

  1. Launch Terminal/Command prompt

  2. Install Databricks CLI

  3.  git clone https://github.com/databrickslabs/dlt-meta.git 
    
  4.  cd dlt-meta
    
  5. Set the Python environment variables in your terminal:

    dlt_meta_home=$(pwd)
    
    export PYTHONPATH=$dlt_meta_home
    
  6. Run the command:

    python demo/launch_dais_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-demo-automated

    • cloud_provider_name : aws, azure, or gcp

    • dbr_version : Databricks Runtime version

    • uc_catalog_name : Unity Catalog name

    • dbfs_path : Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines

    • You can provide --profile=<databricks_profile_name> if you have already configured the Databricks CLI; otherwise the command prompt will ask for a host and token (a one-time profile setup is sketched after these steps).

      • 6a. Databricks Workspace URL:
        • Enter your workspace URL, in the format https://<workspace-instance>.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs.
      • 6b. Token:
        • In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down.

        • On the Access tokens tab, click Generate new token.

        • (Optional) Enter a comment that helps you identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty.

        • Click Generate.

        • Copy the displayed token.

        • Paste it into the command prompt.
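
If you prefer not to enter the host and token interactively on every run, you can configure a named CLI profile once and pass it to the launcher. This is a minimal sketch; the profile name dlt-meta-demo is just an example:

    # one-time setup: prompts for the workspace URL and token from steps 6a/6b
    databricks configure --profile dlt-meta-demo

    # later runs can then skip the interactive prompts
    python demo/launch_dais_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-demo-automated --profile=dlt-meta-demo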

Databricks Tech Summit FY2024 Demo

This demo launches hundreds of auto-generated tables inside a single Bronze and Silver DLT pipeline using DLT-META.

  1. Launch Terminal/Command prompt

  2. Install Databricks CLI (a typical install is sketched after these steps)

  3.  git clone https://github.com/databrickslabs/dlt-meta.git 
    
  4.  cd dlt-meta
    
  5. Set the Python environment variables in your terminal:

    dlt_meta_home=$(pwd)
    
    export PYTHONPATH=$dlt_meta_home
    
  6. Run the command:

    python demo/launch_techsummit_demo.py --source=cloudfiles --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/techsummit-dlt-meta-demo-automated

    • cloud_provider_name : aws or azure

    • dbr_version : Databricks Runtime version

    • dbfs_path : Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines

    • You can provide --profile=<databricks_profile_name> if you have already configured the Databricks CLI; otherwise the command prompt will ask for a host and token.

      • 6a. Databricks Workspace URL:
        • Enter your workspace URL, in the format https://<workspace-instance>.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs.
      • 6b. Token:
        • In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down.

        • On the Access tokens tab, click Generate new token.

        • (Optional) Enter a comment that helps you identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty.

        • Click Generate.

        • Copy the displayed token.

        • Paste it into the command prompt.
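
Step 2 above assumes the Databricks CLI is available on your PATH. A typical install on macOS or Linux uses the official install script, followed by a quick version check:

    # install the Databricks CLI via the official install script
    curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

    # confirm the CLI is on your PATH
    databricks --version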

Append Flow Autoloader File Metadata Demo

This demo performs the following tasks:

  • Reads from different source paths using Autoloader and writes to the same target using the append_flow API
  • Reads from different Delta tables and writes to the same silver table using the append_flow API
  • Adds file_name and file_path columns to the target Bronze table for the Autoloader source using the file metadata column

Steps:

  1. Launch Terminal/Command prompt

  2. Install Databricks CLI

  3.  git clone https://github.com/databrickslabs/dlt-meta.git 
    
  4.  cd dlt-meta
    
  5. Set the Python environment variables in your terminal:

    dlt_meta_home=$(pwd)
    
    export PYTHONPATH=$dlt_meta_home
    
  6. Run the command:

    python demo/launch_af_cloudfiles_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=ravi_dlt_meta_uc

  • cloud_provider_name : aws, azure, or gcp
  • dbr_version : Databricks Runtime version
  • dbfs_path : Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines (a quick access check is sketched below)
  • uc_catalog_name : Unity Catalog name
  • You can provide --profile=<databricks_profile_name> if you have already configured the Databricks CLI; otherwise the command prompt will ask for a host and token.
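
Before launching, you can confirm that your CLI credentials can reach DBFS at all; the demo copies its files under the dbfs_path you pass, so listing the DBFS root is a harmless smoke test:

    # smoke test: verify CLI auth and DBFS access
    databricks fs ls dbfs:/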

(Diagram: af_am_demo.png)

Append Flow Eventhub Demo

  • Reads from different Eventhub topics and writes to the same target tables using the append_flow API

Steps:

  1. Launch Terminal/Command prompt

  2. Install Databricks CLI

  3.  git clone https://github.com/databrickslabs/dlt-meta.git 
    
  4.  cd dlt-meta
    
  5. Set the Python environment variables in your terminal:

    dlt_meta_home=$(pwd)
    
    export PYTHONPATH=$dlt_meta_home
    
  6. Eventhub setup

  • Requires a running Eventhub instance

  • Requires two Eventhub topics: one for the main feed (eventhub_name) and one for the append flow feed (eventhub_name_append_flow)

  • Create a Databricks secrets scope for the Eventhub keys (you can verify it later; see the check after the launch command):

        databricks secrets create-scope eventhubs_dltmeta_creds

        databricks secrets put-secret --json '{
            "scope": "eventhubs_dltmeta_creds",
            "key": "RootManageSharedAccessKey",
            "string_value": "<<value>>"
            }'

  • Create Databricks secrets to store the producer and consumer keys using the scope created above

  • The following arguments are mandatory for running the Eventhubs demo:

    • cloud_provider_name: Cloud provider name, e.g. aws or azure
    • dbr_version: Databricks Runtime version, e.g. 15.3.x-scala2.12
    • uc_catalog_name: Unity Catalog name, e.g. ravi_dlt_meta_uc
    • dbfs_path: Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines, e.g. dbfs:/tmp/DLT-META/demo/
    • eventhub_namespace: Eventhub namespace, e.g. dltmeta
    • eventhub_name: Primary Eventhub name, e.g. dltmeta_demo
    • eventhub_name_append_flow: Secondary Eventhub name for the append flow feed, e.g. dltmeta_demo_af
    • eventhub_producer_accesskey_name: Producer access key name, e.g. RootManageSharedAccessKey
    • eventhub_consumer_accesskey_name: Consumer access key name, e.g. RootManageSharedAccessKey
    • eventhub_secrets_scope_name: Databricks secrets scope name, e.g. eventhubs_dltmeta_creds
    • eventhub_port: Eventhub port, e.g. 9093
  7. Run the command:

    python3 demo/launch_af_eventhub_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=ravi_dlt_meta_uc --eventhub_name=dltmeta_demo --eventhub_name_append_flow=dltmeta_demo_af --eventhub_secrets_scope_name=eventhubs_dltmeta_creds --eventhub_namespace=dltmeta --eventhub_port=9093 --eventhub_producer_accesskey_name=RootManageSharedAccessKey --eventhub_consumer_accesskey_name=RootManageSharedAccessKey --eventhub_accesskey_secret_name=RootManageSharedAccessKey
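
Before launching, you can confirm the secrets scope and key created in step 6 are in place; this assumes the current Databricks CLI, and the scope name matches the one created above:

    # list available secret scopes
    databricks secrets list-scopes

    # list the keys stored in the demo scope
    databricks secrets list-secrets eventhubs_dltmeta_creds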
    

(Diagram: af_eh_demo.png)

Silver Fanout Demo

  • This demo showcases the onboarding process for the silver fanout pattern:
    • Run the onboarding process for the bronze cars table, which contains data from various countries.
    • Run the onboarding process for the silver tables, which have a where_clause based on the country condition specified in silver_transformations_cars.json.
    • Run the Bronze DLT pipeline, which produces the cars table.
    • Run the Silver DLT pipeline, which fans out from the bronze cars table to country-specific tables such as cars_usa, cars_uk, cars_germany, and cars_japan (a post-run check is sketched after the steps below).

Steps:

  1. Launch Terminal/Command prompt

  2. Install Databricks CLI

  3.  git clone https://github.com/databrickslabs/dlt-meta.git 
    
  4.  cd dlt-meta
    
  5. Set the Python environment variables in your terminal:

    dlt_meta_home=$(pwd)
    
    export PYTHONPATH=$dlt_meta_home
    
    
  6. Run the command:

    python demo/launch_silver_fanout_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-silver-fanout

    • cloud_provider_name : aws or azure

    • dbr_version : Databricks Runtime version

    • uc_catalog_name : Unity Catalog name

    • dbfs_path : Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines

    • You can provide --profile=<databricks_profile_name> if you have already configured the Databricks CLI; otherwise the command prompt will ask for a host and token.

      • 6a. Databricks Workspace URL:
        • Enter your workspace URL, in the format https://<workspace-instance>.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs.
      • 6b. Token:
        • In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down.

        • On the Access tokens tab, click Generate new token.

        • (Optional) Enter a comment that helps you identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty.

        • Click Generate.

        • Copy the displayed token.

        • Paste it into the command prompt.
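
After both pipelines finish, you can spot-check that the fanout produced the country-specific tables. This assumes the current Databricks CLI's Unity Catalog commands; the schema name is a placeholder for whatever schema the demo created in your catalog:

    # list tables in the demo schema; expect cars_usa, cars_uk, cars_germany, and cars_japan
    databricks tables list <<uc catalog name>> <<schema name>>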

    (Diagram: silver_fanout_workflow.png)

    (Diagram: silver_fanout_dlt.png)