DLT-META Demos
- DAIS 2023 Demo: showcases DLT-META's ability to create Bronze and Silver DLT pipelines automatically, with both initial and incremental loads.
- Databricks Techsummit Demo: ingests 100s of data sources into Bronze and Silver DLT pipelines automatically.
- Append FLOW Autoloader Demo: writes to the same target from multiple Auto Loader sources using dlt.append_flow and adds file metadata columns.
- Append FLOW Eventhub Demo: writes to the same target tables from multiple Eventhub topics using dlt.append_flow.
- Silver Fanout Demo: showcases a fanout architecture in the silver layer.
DAIS 2023 Demo

This demo launches Bronze and Silver DLT pipelines with the following activities:
- Customer and Transactions feeds for the initial load
- Adds new Product and Stores feeds to the existing Bronze and Silver DLT pipelines through metadata changes
- Runs the Bronze and Silver DLT pipelines in incremental mode for CDC events (a sketch of the underlying CDC API follows this list)
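DLT-META generates this CDC handling from metadata; under the hood it corresponds to DLT's `apply_changes` API. The following is a minimal hand-written sketch of the idea, with a hypothetical `customer_id` key and `event_timestamp` sequencing column (not the demo's actual generated code):

```python
import dlt

# Streaming target that receives the CDC-applied records.
dlt.create_streaming_table("silver_customers")

# Apply inserts/updates/deletes from the bronze feed in event order.
dlt.apply_changes(
    target="silver_customers",
    source="bronze_customers",      # hypothetical bronze CDC feed
    keys=["customer_id"],           # hypothetical primary key
    sequence_by="event_timestamp",  # hypothetical ordering column
    stored_as_scd_type=1,           # keep only the latest row per key
)
```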
1. Launch Terminal/Command prompt

2. Install Databricks CLI

3. ```commandline
   git clone https://github.com/databrickslabs/dlt-meta.git
   ```

4. ```commandline
   cd dlt-meta
   ```

5. Set the Python environment variable in the terminal:
   ```commandline
   dlt_meta_home=$(pwd)
   export PYTHONPATH=$dlt_meta_home
   ```

6. Run the command:
   ```commandline
   python demo/launch_dais_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-demo-automated
   ```
   - uc_catalog_name: Unity Catalog name
   - cloud_provider_name: aws, azure, or gcp
   - dbr_version: Databricks Runtime version
   - dbfs_path: path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines
   - You can provide `--profile=<databricks_profile_name>` if you already have the Databricks CLI configured; otherwise the command prompt will ask for a host and token:
     - 6a. Databricks Workspace URL: enter your workspace URL, in the format https://<databricks-instance>.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs.
     - 6b. Token:
       - In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down.
       - On the Access tokens tab, click Generate new token.
       - (Optional) Enter a comment that helps you to identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty (blank).
       - Click Generate.
       - Copy the displayed token.
       - Paste the token into the command prompt.
Databricks Techsummit Demo

This demo launches hundreds of auto-generated tables inside a single Bronze and Silver DLT pipeline using DLT-META.
1. Launch Terminal/Command prompt

2. Install Databricks CLI

3. ```commandline
   git clone https://github.com/databrickslabs/dlt-meta.git
   ```

4. ```commandline
   cd dlt-meta
   ```

5. Set the Python environment variable in the terminal:
   ```commandline
   dlt_meta_home=$(pwd)
   export PYTHONPATH=$dlt_meta_home
   ```

6. Run the command:
   ```commandline
   python demo/launch_techsummit_demo.py --source=cloudfiles --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/techsummit-dlt-meta-demo-automated
   ```
   - cloud_provider_name: aws or azure
   - dbr_version: Databricks Runtime version
   - dbfs_path: path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines
   - You can provide `--profile=<databricks_profile_name>` if you already have the Databricks CLI configured; otherwise the command prompt will ask for a host and token:
     - 6a. Databricks Workspace URL: enter your workspace URL, in the format https://<databricks-instance>.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs.
     - 6b. Token:
       - In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down.
       - On the Access tokens tab, click Generate new token.
       - (Optional) Enter a comment that helps you to identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty (blank).
       - Click Generate.
       - Copy the displayed token.
       - Paste the token into the command prompt.
Append FLOW Autoloader Demo

This demo performs the following tasks (a hand-written sketch of the pattern follows the list):
- Reads from different source paths using Auto Loader and writes to the same target using the append_flow API
- Reads from different Delta tables and writes to the same silver table using the append_flow API
- Adds file_name and file_path to the target bronze table for the Auto Loader source using the file metadata column
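For orientation, the following is a minimal hand-written sketch of the pattern this demo automates: two Auto Loader sources appended into one streaming table via `dlt.append_flow`, with file details taken from Auto Loader's built-in `_metadata` column. The table name and source paths are hypothetical, not the demo's actual onboarding config:

```python
import dlt
from pyspark.sql.functions import col

# Single streaming target shared by both append flows.
dlt.create_streaming_table("bronze_customers")

def with_file_metadata(df):
    # Auto Loader exposes source file details through the _metadata column.
    return df.select(
        "*",
        col("_metadata.file_name").alias("file_name"),
        col("_metadata.file_path").alias("file_path"),
    )

# `spark` is provided by the DLT runtime.
@dlt.append_flow(target="bronze_customers", name="customers_feed_a")
def customers_feed_a():
    return with_file_metadata(
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/demo/resources/customers_a/")  # hypothetical source path
    )

@dlt.append_flow(target="bronze_customers", name="customers_feed_b")
def customers_feed_b():
    return with_file_metadata(
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/demo/resources/customers_b/")  # hypothetical source path
    )
```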
1. Launch Terminal/Command prompt

2. Install Databricks CLI

3. ```commandline
   git clone https://github.com/databrickslabs/dlt-meta.git
   ```

4. ```commandline
   cd dlt-meta
   ```

5. Set the Python environment variable in the terminal:
   ```commandline
   dlt_meta_home=$(pwd)
   export PYTHONPATH=$dlt_meta_home
   ```

6. Run the command:
   ```commandline
   python demo/launch_af_cloudfiles_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=ravi_dlt_meta_uc
   ```
   - cloud_provider_name: aws, azure, or gcp
   - dbr_version: Databricks Runtime version
   - dbfs_path: path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines
   - uc_catalog_name: Unity Catalog name
   - You can provide `--profile=<databricks_profile_name>` if you already have the Databricks CLI configured; otherwise the command prompt will ask for a host and token.
Append FLOW Eventhub Demo

This demo reads from different Eventhub topics and writes to the same target tables using the append_flow API.
1. Launch Terminal/Command prompt

2. Install Databricks CLI

3. ```commandline
   git clone https://github.com/databrickslabs/dlt-meta.git
   ```

4. ```commandline
   cd dlt-meta
   ```

5. Set the Python environment variable in the terminal:
   ```commandline
   dlt_meta_home=$(pwd)
   export PYTHONPATH=$dlt_meta_home
   ```
6. Eventhub prerequisites (a sketch showing how a pipeline consumes these secrets follows this step):
   - A running Eventhub instance is required.
   - Two Eventhub topics are required: one for the main feed (eventhub_name) and one for the append flow feed (eventhub_name_append_flow).
   - Create a Databricks secret scope for the Eventhub keys:
     ```commandline
     databricks secrets create-scope eventhubs_dltmeta_creds
     ```
   - Create Databricks secrets to store the producer and consumer keys, using the scope created above:
     ```commandline
     databricks secrets put-secret --json '{ "scope": "eventhubs_dltmeta_creds", "key": "RootManageSharedAccessKey", "string_value": "<<value>>" }'
     ```
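For orientation, here is a rough, hand-written sketch of how a pipeline can consume these secrets to read an Eventhub over its Kafka endpoint (which is what the eventhub_port argument, typically 9093, refers to). The namespace, topic, and table names mirror the examples below but are otherwise assumptions, not the demo's generated code:

```python
import dlt

EH_NAMESPACE = "dltmeta"  # example namespace from the arguments below
EH_BOOTSTRAP = f"{EH_NAMESPACE}.servicebus.windows.net:9093"

def eventhub_options(topic):
    # Shared access key stored in the secret scope created above.
    # dbutils is provided by the Databricks runtime.
    key = dbutils.secrets.get(scope="eventhubs_dltmeta_creds",
                              key="RootManageSharedAccessKey")
    conn = (f"Endpoint=sb://{EH_NAMESPACE}.servicebus.windows.net/;"
            "SharedAccessKeyName=RootManageSharedAccessKey;"
            f"SharedAccessKey={key}")
    return {
        "kafka.bootstrap.servers": EH_BOOTSTRAP,
        "subscribe": topic,
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        # Databricks ships a shaded Kafka client, hence the prefix.
        "kafka.sasl.jaas.config": (
            "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
            f'required username="$ConnectionString" password="{conn}";'
        ),
    }

# One target table fed by the main topic and the append-flow topic.
dlt.create_streaming_table("bronze_events")

@dlt.append_flow(target="bronze_events", name="main_feed")
def main_feed():
    return spark.readStream.format("kafka").options(**eventhub_options("dltmeta_demo")).load()

@dlt.append_flow(target="bronze_events", name="append_flow_feed")
def append_flow_feed():
    return spark.readStream.format("kafka").options(**eventhub_options("dltmeta_demo_af")).load()
```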
7. The following arguments are mandatory for running the Eventhubs demo:
   - cloud_provider_name: cloud provider name, e.g. aws or azure
   - dbr_version: Databricks Runtime version, e.g. 15.3.x-scala2.12
   - uc_catalog_name: Unity Catalog name, e.g. ravi_dlt_meta_uc
   - dbfs_path: path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines, e.g. dbfs:/tmp/DLT-META/demo/
   - eventhub_namespace: Eventhub namespace, e.g. dltmeta
   - eventhub_name: primary Eventhub name, e.g. dltmeta_demo
   - eventhub_name_append_flow: secondary Eventhub name for the append flow feed, e.g. dltmeta_demo_af
   - eventhub_producer_accesskey_name: producer access key name, e.g. RootManageSharedAccessKey
   - eventhub_consumer_accesskey_name: consumer access key name, e.g. RootManageSharedAccessKey
   - eventhub_secrets_scope_name: Databricks secret scope name, e.g. eventhubs_dltmeta_creds
   - eventhub_port: Eventhub port, e.g. 9093
8. Run the command:
   ```commandline
   python3 demo/launch_af_eventhub_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=ravi_dlt_meta_uc --eventhub_name=dltmeta_demo --eventhub_name_append_flow=dltmeta_demo_af --eventhub_secrets_scope_name=eventhubs_dltmeta_creds --eventhub_namespace=dltmeta --eventhub_port=9093 --eventhub_producer_accesskey_name=RootManageSharedAccessKey --eventhub_consumer_accesskey_name=RootManageSharedAccessKey --eventhub_accesskey_secret_name=RootManageSharedAccessKey
   ```
Silver Fanout Demo

This demo showcases the onboarding process for the silver fanout pattern:
- Runs the onboarding process for the bronze cars table, which contains data from various countries.
- Runs the onboarding process for the silver tables, which have a `where_clause` based on the country condition specified in silver_transformations_cars.json.
- Runs the Bronze DLT pipeline, which produces the cars table.
- Runs the Silver DLT pipeline, fanning out from the bronze cars table to country-specific tables such as cars_usa, cars_uk, cars_germany, and cars_japan (see the sketch after this list).
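Conceptually, the fanout that the metadata drives looks like this hand-written DLT sketch: one silver table per country, each applying its own where clause to the shared bronze cars table. The country column and values are illustrative assumptions:

```python
import dlt
from pyspark.sql.functions import col

# One silver table per country, all fed by the single bronze cars table.
for country in ["usa", "uk", "germany", "japan"]:

    @dlt.table(name=f"cars_{country}")
    def build_cars_table(country=country):  # default arg binds the loop variable
        return (
            dlt.read_stream("cars")                    # bronze cars table
            .where(col("country") == country.upper())  # e.g. "USA"
        )
```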
1. Launch Terminal/Command prompt

2. Install Databricks CLI

3. ```commandline
   git clone https://github.com/databrickslabs/dlt-meta.git
   ```

4. ```commandline
   cd dlt-meta
   ```

5. Set the Python environment variable in the terminal:
   ```commandline
   dlt_meta_home=$(pwd)
   export PYTHONPATH=$dlt_meta_home
   ```

6. Run the command:
   ```commandline
   python demo/launch_silver_fanout_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-silver-fanout
   ```
   - uc_catalog_name: Unity Catalog name
   - cloud_provider_name: aws or azure
   - dbr_version: Databricks Runtime version
   - dbfs_path: path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines
   - You can provide `--profile=<databricks_profile_name>` if you already have the Databricks CLI configured; otherwise the command prompt will ask for a host and token:
     - 6a. Databricks Workspace URL: enter your workspace URL, in the format https://<databricks-instance>.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs.
     - 6b. Token:
       - In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down.
       - On the Access tokens tab, click Generate new token.
       - (Optional) Enter a comment that helps you to identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty (blank).
       - Click Generate.
       - Copy the displayed token.
       - Paste the token into the command prompt.