Push dbt lineage to Databricks Unity Catalog
> [!WARNING]
> Unity Catalog External Lineage is in Public Preview (as of January 2026). The API may change. We'll track updates and maintain compatibility.
Why This Exists
Unity Catalog shows lineage for what happens inside Databricks. But the full picture — where data comes from, where it goes — often lives somewhere else. A data catalog. A wiki page. Someone's head.
Maybe you have Atlan or Alation with complete lineage. Great. But when you're in the Catalog Explorer tracing an issue, you don't want to context-switch — different app, re-authenticate, find the same table in a different UI. The information should be here, where you're already working.
And if Unity Catalog is your data catalog? Then this is the only place that information can live.
dbt already knows your sources and exposures. This tool pushes that metadata to Unity Catalog — so the lineage is complete, in one place, where you're already looking.
Installation
pip install dbt-unity-lineage
Quick Start
1. Initialize configuration
dbt-unity-lineage init
This creates models/lineage/unity_lineage.yml:
version: 1
project:
  name: my_project
configuration:
  layers:
    bronze:
      sources:
        folders:
          - models/bronze/erp
          - models/bronze/crm
    gold:
      exposures:
        folders:
          - models/gold/dashboards
source_systems:
  sap_ecc:
    system_type: SAP
    description: SAP ECC Production
2. Tag your sources
# models/bronze/erp/_sources.yml
sources:
  - name: erp
    meta:
      uc_source: sap_ecc  # ← References source_systems key
    tables:
      - name: gl_accounts
      - name: cost_centers
3. Validate and push
# Validate configuration
dbt-unity-lineage validate
# Scan folders to see what would be pushed
dbt-unity-lineage scan
# Push to Unity Catalog
dbt-unity-lineage push
That's it. Check your lineage in Databricks Catalog Explorer.
Layer Flexibility
The configuration uses layers to organize where you scan for sources and exposures. Layer names are completely freeform — use whatever fits your architecture.
| Pattern | Example Layers |
|---|---|
| Medallion | bronze, silver, gold |
| Classic DWH | PSA, DWH, DM |
| dbt convention | staging, intermediate, marts |
| Simple | raw, analytics |
You can define 1 layer or 50 — whatever matches your project structure. Each layer can have sources, exposures, or both.
# Two layers? Fine.
configuration:
  layers:
    raw:
      sources:
        folders: [models/raw]
    analytics:
      exposures:
        folders: [models/analytics]
# Ten layers? Also fine.
Our examples use medallion architecture (bronze/silver/gold) because it's common, but the tool doesn't care — it just scans the folders you point it at.
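To make the "it just scans the folders you point it at" idea concrete, here is a minimal sketch of how the folders could be gathered from a parsed configuration, regardless of what the layers are called. The function name `collect_folders` is hypothetical and not part of the tool; the config is assumed to be already loaded into a plain dict with the same keys as the YAML above.

```python
from typing import Dict, List


def collect_folders(config: dict) -> Dict[str, List[str]]:
    """Gather the folders to scan, grouped by kind, across all layers.

    Hypothetical helper for illustration; the tool's real scanner may differ.
    """
    folders: Dict[str, List[str]] = {"sources": [], "exposures": []}
    layers = config.get("configuration", {}).get("layers", {}) or {}
    for layer in layers.values():
        for kind in folders:
            # A layer may define sources, exposures, or both.
            section = (layer or {}).get(kind) or {}
            folders[kind].extend(section.get("folders", []))
    return folders


# Same shape as the two-layer example above; layer names are never inspected.
config = {
    "configuration": {
        "layers": {
            "raw": {"sources": {"folders": ["models/raw"]}},
            "analytics": {"exposures": {"folders": ["models/analytics"]}},
        }
    }
}
print(collect_folders(config))
```

Because the layer names are only dictionary keys, renaming `raw` to `PSA` or `staging` would not change the result.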
Exposures: Zero Config
Exposures are scanned from your configured folders. No additional tagging needed.
# models/gold/dashboards/_exposures.yml
exposures:
  - name: executive_dashboard
    type: dashboard
    url: https://app.powerbi.com/groups/abc/reports/xyz
    depends_on:
      - ref('fct_orders')
The tool automatically:
- Infers system_type: POWER_BI from the URL
- Creates external metadata in Unity Catalog
- Links it to your gold tables
CLI Commands
# Initialize configuration file
dbt-unity-lineage init
# Validate configuration and folder structure
dbt-unity-lineage validate
# Scan folders and show what would be synced
dbt-unity-lineage scan
# Show current status (local vs remote)
dbt-unity-lineage status
# Push sources and exposures to Unity Catalog
dbt-unity-lineage push
# Preview changes without executing
dbt-unity-lineage push --dry-run
# Remove all project objects from Unity Catalog
dbt-unity-lineage clean
Output Formats
# Default table format
dbt-unity-lineage scan
# JSON for programmatic use
dbt-unity-lineage scan --format json
# Markdown for CI/CD summaries
dbt-unity-lineage status --format md >> $GITHUB_STEP_SUMMARY
Global Options
--project-dir PATH # Path to dbt project directory
--profiles-dir PATH # Path to profiles directory
--profile NAME # dbt profile name
--target NAME # dbt target name
--verbose # Enable verbose output
--quiet # Suppress non-essential output
--claude # Output Claude AI context (CLAUDE.md)
Configuration
Minimal
version: 1
project:
  name: my_project
configuration:
  layers:
    sources:
      sources:
        folders:
          - models/sources
Full Example (Medallion Architecture)
version: 1
project:
  name: enterprise_analytics
configuration:
  validation:
    require_owner: ${REQUIRE_OWNER:-false}
    require_source_system: ${REQUIRE_SOURCE_SYSTEM:-true}
  layers:
    bronze:
      sources:
        folders:
          - bronze/erp
          - bronze/crm
          - bronze/marketing
      exposures:
        folders:
          - bronze/data_quality
    silver:
      sources:
        folders:
          - silver/delta_shares
          - silver/partner_feeds
    gold:
      exposures:
        folders:
          - gold/dashboards
          - gold/reports
source_systems:
  sap_s4:
    system_type: SAP
    description: SAP S/4HANA Finance & Logistics
    owner: erp-team@example.com
    url: https://sap.example.com
    table_lineage: true
    meta_columns:
      - _loaded_at
      - _load_process
      - _record_source
    meta:
      environment: production
      data_classification: confidential
  salesforce_prod:
    system_type: Salesforce
    table_lineage: true
    meta_columns:
      - _extracted_at
      - _sync_id
  hubspot:
    system_type: HubSpot
    description: HubSpot Marketing Hub
    owner: marketing-ops@example.com
  customer_360_share:
    system_type: Delta Share
    description: Customer domain data product
    owner: customer-domain@example.com
Table-Level Lineage (Column Tracking)
For sources where bronze is a 1:1 copy of the upstream system, you can opt into table-level lineage with column tracking by setting table_lineage: true on the source system, as in the full example above.
This:
- Creates external table entities in Unity Catalog (e.g., sap_s4.gl_accounts)
- Maps columns 1:1 from source to bronze (assumes identical names)
- Excludes meta_columns from lineage; these are typically metadata columns added by your pipeline and not present in the source
This is optional, and according to the Databricks API documentation it is not actually supported by the API yet.
> [!NOTE]
> Column-level lineage is configured and ready, but the Unity Catalog API does not yet support it (as of January 2026). Currently this detail is only visible within the Unity Catalog UI. When the API adds support, this feature will work without config changes.
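The 1:1 column mapping described in this section can be sketched as follows. This is an illustration only, not the tool's implementation: `map_columns` and the bronze table name `main.bronze.gl_accounts` are hypothetical, and the sketch assumes source and bronze column names are identical.

```python
from typing import Dict, Iterable, List


def map_columns(
    source_table: str,
    bronze_table: str,
    bronze_columns: Iterable[str],
    meta_columns: Iterable[str] = (),
) -> List[Dict[str, str]]:
    """Pair each bronze column with the same-named source column.

    Columns listed in meta_columns are skipped because the pipeline
    added them; they do not exist in the upstream system.
    """
    excluded = set(meta_columns)
    return [
        {"source": f"{source_table}.{col}", "target": f"{bronze_table}.{col}"}
        for col in bronze_columns
        if col not in excluded
    ]


mapping = map_columns(
    source_table="sap_s4.gl_accounts",
    bronze_table="main.bronze.gl_accounts",  # hypothetical bronze table name
    bronze_columns=["account_id", "account_name", "_loaded_at"],
    meta_columns=["_loaded_at", "_load_process", "_record_source"],
)
for pair in mapping:
    print(pair["source"], "->", pair["target"])
```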
Configuration Options
| Section | Field | Required | Description |
|---|---|---|---|
| project | name | Yes | Project identifier for tagging in Unity Catalog |
| configuration.validation | require_owner | No | Require owner on sources (default: false) |
| | require_description | No | Require description on sources (default: false) |
| | require_source_system | No | Require uc_source meta on all sources (default: false) |
| configuration.layers.{name} | sources.folders | No* | Folders to scan for sources |
| | exposures.folders | No* | Folders to scan for exposures |
| source_systems.{key} | system_type | Yes | Type of system (SAP, Salesforce, etc.) |
| | description | No | Human-readable description |
| | owner | No | Contact email |
| | url | No | URL to source system |
| | table_lineage | No | Enable table-level lineage (default: false) |
| | meta_columns | No | Columns to exclude from lineage |
| | meta | No | Custom key-value properties |
*Each layer needs at least one of sources or exposures.
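The footnote rule can be expressed as a small check. The sketch below is hypothetical (the name `find_empty_layers` is not from the tool) and assumes that a layer "has" sources or exposures only when the corresponding `folders` list is non-empty; the real `validate` command may apply different or additional rules.

```python
from typing import List


def find_empty_layers(config: dict) -> List[str]:
    """Return names of layers that define neither sources nor exposures folders."""
    layers = config.get("configuration", {}).get("layers", {}) or {}
    empty = []
    for name, layer in layers.items():
        layer = layer or {}
        has_folders = any(
            (layer.get(kind) or {}).get("folders")
            for kind in ("sources", "exposures")
        )
        if not has_folders:
            empty.append(name)
    return empty


config = {
    "configuration": {
        "layers": {
            "bronze": {"sources": {"folders": ["bronze/erp"]}},
            "silver": {},  # neither sources nor exposures: flagged
            "gold": {"exposures": {"folders": ["gold/dashboards"]}},
        }
    }
}
print(find_empty_layers(config))
```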
Environment Variables
Validation settings support ENV vars for CI/CD flexibility:
configuration:
  validation:
    require_owner: ${REQUIRE_OWNER:-false}
    require_source_system: ${REQUIRE_SOURCE_SYSTEM:-true}
# Local dev - permissive
dbt-unity-lineage push
# CI/CD - strict
REQUIRE_OWNER=true REQUIRE_SOURCE_SYSTEM=true dbt-unity-lineage push
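The `${NAME:-default}` placeholders above use shell-style default syntax. As a rough sketch of how such a placeholder could be resolved, assuming shell semantics (the default applies when the variable is unset or empty), consider the following. The `expand` helper is hypothetical; the tool's actual parser is not shown in this document and may behave differently.

```python
import re

# Matches ${NAME} and ${NAME:-default}.
_PLACEHOLDER = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def expand(value: str, env: dict) -> str:
    """Replace ${NAME:-default} placeholders using the given environment mapping."""

    def _substitute(match: "re.Match[str]") -> str:
        current = env.get(match.group("name"))
        if current:  # set and non-empty: use the environment value
            return current
        default = match.group("default")
        return default if default is not None else ""

    return _PLACEHOLDER.sub(_substitute, value)


# Local dev: variable unset, so the default wins.
print(expand("${REQUIRE_OWNER:-false}", {}))
# CI/CD: variable set, so the environment wins.
print(expand("${REQUIRE_OWNER:-false}", {"REQUIRE_OWNER": "true"}))
```

In real use the mapping would typically be `os.environ`; a plain dict is passed here so the example stays deterministic.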
CI/CD Integration
- name: Validate lineage config
  run: dbt-unity-lineage validate --format md >> $GITHUB_STEP_SUMMARY
- name: Push lineage
  run: dbt-unity-lineage push --target prod
  env:
    DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
- name: Post status to PR
  run: dbt-unity-lineage status --format md >> $GITHUB_STEP_SUMMARY
How It Works
Ownership Tracking
Every object created includes ownership metadata:
{
  "properties": {
    "dbt_unity_lineage_managed": "true",
    "dbt_unity_lineage_project": "my_project"
  }
}
This ensures safe updates, multi-project support, and clean removal of orphaned objects.
Idempotent Pushes
Run push as many times as you want:
- New objects are created
- Changed objects are updated
- Removed objects are deleted
- Objects from other tools/projects are ignored
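The four outcomes listed above amount to a diff between local definitions and remote objects, filtered by the ownership properties. The following is a minimal sketch under stated assumptions, not the tool's code: `plan_push` is a hypothetical name, remote objects are modeled as dicts with a `properties` map and a hypothetical `spec` field holding the comparable definition, and the two property keys are the ones shown under Ownership Tracking.

```python
from typing import Dict, List

MANAGED_KEY = "dbt_unity_lineage_managed"
PROJECT_KEY = "dbt_unity_lineage_project"


def plan_push(
    local: Dict[str, dict], remote: Dict[str, dict], project: str
) -> Dict[str, List[str]]:
    """Classify objects into create / update / delete / ignore."""
    plan: Dict[str, List[str]] = {"create": [], "update": [], "delete": [], "ignore": []}
    for name, obj in remote.items():
        props = obj.get("properties", {})
        owned = props.get(MANAGED_KEY) == "true" and props.get(PROJECT_KEY) == project
        if not owned:
            plan["ignore"].append(name)  # belongs to another tool or project
        elif name not in local:
            plan["delete"].append(name)  # removed from the dbt project
        elif obj.get("spec") != local[name]:
            plan["update"].append(name)  # definition changed
    plan["create"] = [name for name in local if name not in remote]
    return plan


owned_props = {MANAGED_KEY: "true", PROJECT_KEY: "my_project"}
local = {"sap_ecc": {"system_type": "SAP"}, "hubspot": {"system_type": "CUSTOM"}}
remote = {
    "sap_ecc": {"properties": owned_props, "spec": {"system_type": "OTHER"}},
    "old_crm": {"properties": owned_props, "spec": {"system_type": "CUSTOM"}},
    "other_tool": {"properties": {}, "spec": {}},
}
print(plan_push(local, remote, "my_project"))
```

Running the same plan twice against an unchanged remote state yields no creates, updates, or deletes, which is what makes repeated pushes safe.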
System Type Detection
For exposures, system type is inferred from URL:
| URL Contains | System Type |
|---|---|
| powerbi.com | POWER_BI |
| tableau.com | TABLEAU |
| looker.com | LOOKER |
| salesforce.com | SALESFORCE |
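A lookup like the one in the table could be sketched as below. Two details are assumptions rather than documented behavior: the sketch matches against the URL's hostname only (the table just says "URL Contains"), and it falls back to `CUSTOM` when nothing matches. The function name `infer_system_type` is hypothetical.

```python
from urllib.parse import urlparse

# Substrings and system types taken from the table above.
URL_SYSTEM_TYPES = {
    "powerbi.com": "POWER_BI",
    "tableau.com": "TABLEAU",
    "looker.com": "LOOKER",
    "salesforce.com": "SALESFORCE",
}


def infer_system_type(url: str) -> str:
    """Infer an exposure's system type from its URL hostname."""
    host = (urlparse(url).hostname or "").lower()
    for needle, system_type in URL_SYSTEM_TYPES.items():
        if needle in host:
            return system_type
    return "CUSTOM"  # assumed fallback for unrecognized URLs


print(infer_system_type("https://app.powerbi.com/groups/abc/reports/xyz"))
```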
For sources, common aliases are normalized:
| Input | Normalized |
|---|---|
| sap, sap_ecc, sap_hana | SAP |
| salesforce, sfdc | SALESFORCE |
| postgresql, postgres | POSTGRESQL |
| powerbi, power_bi | POWER_BI |
Unknown values default to CUSTOM.
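The normalization above can be pictured as a dictionary lookup with a `CUSTOM` fallback. This sketch includes only the aliases listed in the table; the tool may recognize more. Case-insensitive matching and whitespace trimming are assumptions, and `normalize_system_type` is a hypothetical name.

```python
# Aliases taken from the table above; the real list may be longer.
ALIASES = {
    "sap": "SAP",
    "sap_ecc": "SAP",
    "sap_hana": "SAP",
    "salesforce": "SALESFORCE",
    "sfdc": "SALESFORCE",
    "postgresql": "POSTGRESQL",
    "postgres": "POSTGRESQL",
    "powerbi": "POWER_BI",
    "power_bi": "POWER_BI",
}


def normalize_system_type(value: str) -> str:
    """Map a configured system_type to its normalized form, or CUSTOM."""
    return ALIASES.get(value.strip().lower(), "CUSTOM")


for raw in ("SAP", "sfdc", "postgres", "my_legacy_system"):
    print(raw, "->", normalize_system_type(raw))
```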
Required Permissions
Your Databricks service principal needs:
| Permission | Scope | Purpose |
|---|---|---|
| CREATE EXTERNAL METADATA | Metastore | Create objects |
| MODIFY | External metadata | Update/delete |
Profile Configuration
The tool reads connection details from your dbt profiles.yml:
my_project:
  target: prod
  outputs:
    prod:
      type: databricks
      host: dbc-abc123.cloud.databricks.com
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      catalog: main
Related Projects
Contributing
Contributions welcome! Please read our contributing guidelines.
License
Lineage that stops at bronze isn't lineage — it's the middle of the story.
Built with the belief that lineage shouldn't stop at your warehouse boundary.