dbt-unity-lineage

Push dbt lineage to Databricks Unity Catalog

The Problem

Unity Catalog automatically captures lineage for transformations that run inside Databricks. But it can't see:

  • Where data comes from — SAP, Salesforce, PostgreSQL, APIs, etc.
  • Where data goes — Power BI dashboards, Tableau reports, applications, etc.

You're left with a gap in your lineage view:

[???] → Bronze → Silver → Gold → [???]

The Solution

dbt already knows this information:

  • Sources define upstream systems
  • Exposures define downstream consumers

dbt-unity-lineage reads your dbt metadata and pushes it to Unity Catalog:

[SAP] → Bronze → Silver → Gold → [Power BI]
   ↑                                  ↑
   └── dbt-unity-lineage pushes ──────┘

Installation

pip install dbt-unity-lineage

Quick Start

1. Create a config file

# dbt_unity_lineage.yml
version: 1

source_systems:
  sap_ecc:
    system_type: SAP
    description: SAP ECC Production

  salesforce_prod:
    system_type: Salesforce
    description: Salesforce Sales Cloud

source_paths:
  - bronze_erp
  - bronze_crm

2. Tag your sources

# models/bronze_erp/_sources.yml
sources:
  - name: erp
    meta:
      uc_source: sap_ecc      # ← Just this tag
    tables:
      - name: gl_accounts
      - name: cost_centers

3. Push to Unity Catalog

dbt build
dbt-unity-lineage push

That's it. Check your lineage in Databricks Catalog Explorer.

Exposures: Zero Config

Exposures are read directly from manifest.json, so no additional configuration is needed.

# models/marts/exposures.yml
exposures:
  - name: executive_dashboard
    type: dashboard
    url: https://app.powerbi.com/groups/abc/reports/xyz
    depends_on:
      - ref('fct_orders')

The tool automatically:

  • Infers system_type: POWER_BI from the URL
  • Creates external metadata in Unity Catalog
  • Links it to your gold tables
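As a rough illustration of where this information lives, the sketch below pulls exposures out of an already-parsed manifest.json dictionary. The helper name `list_exposures` is hypothetical and not part of dbt-unity-lineage. The field names (`exposures`, `url`, `depends_on.nodes`) follow dbt's manifest layout, which can differ between dbt versions.

```python
def list_exposures(manifest: dict) -> list:
    """Return name, type, url, and upstream node IDs for each exposure.

    `manifest` is the parsed content of dbt's manifest.json.
    Hypothetical helper for illustration only.
    """
    result = []
    for exposure in manifest.get("exposures", {}).values():
        result.append(
            {
                "name": exposure["name"],
                "type": exposure.get("type"),
                "url": exposure.get("url"),
                # dbt resolves ref('fct_orders') into node IDs here
                "depends_on": exposure.get("depends_on", {}).get("nodes", []),
            }
        )
    return result
```

With a local project, the dictionary would typically come from `json.load` on `target/manifest.json` after a dbt run.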

CLI Commands

# Push sources and exposures to Unity Catalog
dbt-unity-lineage push

# Preview changes without executing
dbt-unity-lineage push --dry-run

# Show current status (local vs remote)
dbt-unity-lineage status

# Show status in markdown (great for CI/CD)
dbt-unity-lineage status --format md

# Remove orphaned objects
dbt-unity-lineage clean

dbt Cloud Integration

Fetch the manifest directly from dbt Cloud instead of reading a local file:

# Using job ID (fetches latest successful run)
dbt-unity-lineage push \
  --dbt-cloud \
  --dbt-cloud-account-id 12345 \
  --dbt-cloud-job-id 67890

# Using run ID (fetches from specific run)
dbt-unity-lineage push \
  --dbt-cloud \
  --dbt-cloud-run-id 98765

# With environment variables
export DBT_CLOUD_TOKEN=dbtu_xxx
export DBT_CLOUD_ACCOUNT_ID=12345
dbt-unity-lineage push --dbt-cloud --dbt-cloud-job-id 67890
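For orientation, here is a minimal sketch of what fetching a run's manifest from dbt Cloud might involve. It only builds the HTTP request and does not send it. The function name `manifest_request` is hypothetical, the default host `cloud.getdbt.com` is an assumption (accounts in other regions or on single-tenant plans use a different host), and the path follows the dbt Cloud v2 "run artifacts" endpoint as I understand it. How dbt-unity-lineage actually performs this call is not documented here.

```python
import urllib.request


def manifest_request(account_id, run_id, token, host="cloud.getdbt.com"):
    """Build (but do not send) a request for a run's manifest.json.

    Hypothetical helper; host and endpoint layout are assumptions.
    """
    url = (
        f"https://{host}/api/v2/accounts/{account_id}"
        f"/runs/{run_id}/artifacts/manifest.json"
    )
    headers = {
        "Authorization": f"Token {token}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)
```

Passing the returned object to `urllib.request.urlopen` and parsing the body as JSON would yield the same dictionary a local manifest.json contains.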

Global Options

--config PATH          # Path to dbt_unity_lineage.yml
--manifest PATH        # Path to manifest.json
--project-dir PATH     # Path to dbt project directory
--profile NAME         # dbt profile name
--target NAME          # dbt target name
--verbose              # Enable verbose output
--quiet                # Suppress non-essential output
--claude               # Output Claude AI context (CLAUDE.md)

Claude AI Context

Output version-matched context for Claude AI to understand your dbt-unity-lineage setup:

# Append to your project's CLAUDE.md
dbt-unity-lineage --claude >> CLAUDE.md

# Or to a .claude directory
dbt-unity-lineage --claude >> .claude/CLAUDE.md

This fetches the CLAUDE.md file from GitHub matching your installed version, providing Claude with context about available commands, configuration options, and common patterns.

Configuration Reference

dbt_unity_lineage.yml

version: 1

# Define your source systems
source_systems:
  sap_ecc:
    system_type: SAP                    # Required: UC system type
    entity_type: table                  # Optional: defaults to "table"
    description: SAP ECC Production     # Optional
    url: https://sap.example.com        # Optional
    owner: erp-team@example.com         # Optional
    properties:                         # Optional: custom properties
      environment: production

# Folders to scan for sources (relative to models/)
source_paths:
  - bronze_erp
  - bronze_crm

# Optional settings
settings:
  batch_size: 50                        # API batch size
  strict: false                         # Set to true to raise an error on unmapped sources

Source Tagging

In your sources.yml or schema.yml:

sources:
  - name: erp
    meta:
      uc_source: sap_ecc    # References source_systems key
    tables:
      - name: gl_accounts

Exposure Overrides

Exposures work automatically, but you can override the system type:

exposures:
  - name: my_dashboard
    type: dashboard
    url: https://custom-bi-tool.example.com/dashboard/123
    meta:
      uc_system_type: CUSTOM    # Override auto-detection

Supported System Types

The tool normalizes common variations and supports all Unity Catalog system types:

| Input | Normalized |
|-------|------------|
| sap, sap_ecc, sap_hana | SAP |
| salesforce, sfdc | SALESFORCE |
| postgresql, postgres | POSTGRESQL |
| sql_server, mssql | MICROSOFT_SQL_SERVER |
| bigquery, bq | GOOGLE_BIGQUERY |
| powerbi, power_bi | POWER_BI |

(and more...)

Unknown values default to CUSTOM.
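A minimal sketch of this kind of normalization, using only the aliases listed above. The function name `normalize_system_type` is hypothetical, and treating input as case-insensitive with spaces and hyphens folded to underscores is an assumption rather than documented behavior.

```python
# Aliases copied from the table above; the real tool supports more.
ALIASES = {
    "sap": "SAP",
    "sap_ecc": "SAP",
    "sap_hana": "SAP",
    "salesforce": "SALESFORCE",
    "sfdc": "SALESFORCE",
    "postgresql": "POSTGRESQL",
    "postgres": "POSTGRESQL",
    "sql_server": "MICROSOFT_SQL_SERVER",
    "mssql": "MICROSOFT_SQL_SERVER",
    "bigquery": "GOOGLE_BIGQUERY",
    "bq": "GOOGLE_BIGQUERY",
    "powerbi": "POWER_BI",
    "power_bi": "POWER_BI",
}


def normalize_system_type(value: str) -> str:
    """Map a user-supplied system type to its normalized name.

    Hypothetical helper; unknown values fall back to CUSTOM.
    """
    # Assumed folding rules: ignore case, treat "-" and " " like "_"
    key = value.strip().lower().replace("-", "_").replace(" ", "_")
    return ALIASES.get(key, "CUSTOM")
```

Under these assumptions, the `system_type: Salesforce` value from the Quick Start config would resolve to `SALESFORCE`.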

URL Auto-Detection

For exposures, system type is automatically detected from URLs:

| URL Contains | System Type |
|--------------|-------------|
| powerbi.com | POWER_BI |
| tableau.com | TABLEAU |
| looker.com | LOOKER |
| salesforce.com | SALESFORCE |
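One way such detection could work is sketched below. The name `detect_system_type` is hypothetical. Matching on the URL's hostname (rather than anywhere in the URL string) and falling back to `CUSTOM` are assumptions made for this example, and the `override` argument stands in for the `uc_system_type` meta key.

```python
from urllib.parse import urlparse

# Domains copied from the table above
URL_PATTERNS = {
    "powerbi.com": "POWER_BI",
    "tableau.com": "TABLEAU",
    "looker.com": "LOOKER",
    "salesforce.com": "SALESFORCE",
}


def detect_system_type(url, override=None):
    """Guess a system type from an exposure URL.

    Hypothetical helper; an explicit override always wins.
    """
    if override:
        return override
    host = (urlparse(url).hostname or "").lower()
    for domain, system_type in URL_PATTERNS.items():
        # Match the domain itself or any subdomain of it
        if host == domain or host.endswith("." + domain):
            return system_type
    return "CUSTOM"
```

Checking the hostname instead of the whole URL avoids false matches when a known domain merely appears in a path or query string.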

CI/CD Integration

GitHub Actions

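- name: Push lineage
  run: dbt-unity-lineage push --target prod
  env:
    DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}

# $GITHUB_STEP_SUMMARY writes to the workflow run's job summary page
- name: Write status to job summary
  run: dbt-unity-lineage status --format md >> "$GITHUB_STEP_SUMMARY"
  env:
    DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}  # assumed: status compares against Unity Catalog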

Status Output (Markdown)

## dbt-unity-lineage Status

| Source | System | Status |
|--------|--------|--------|
| sap_ecc.gl_accounts | SAP | ✅ In sync |
| workday.employees | Workday | 🆕 Create |

| Exposure | System | Status |
|----------|--------|--------|
| executive_dashboard | Power BI | ✅ In sync |

How It Works

Ownership Tracking

Every object created by this tool includes ownership metadata:

{
  "properties": {
    "managed_by": "dbt-unity-lineage",
    "dbt_project": "my_project"
  }
}

This ensures:

  • Safe updates — Only modifies objects it created
  • Multi-project support — Projects don't interfere with each other
  • Clean removal — Orphaned objects are tracked and removable
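The ownership check implied by these properties can be sketched in a few lines. `is_owned_by` is a hypothetical name, and the object shape is taken from the JSON example above rather than from the Unity Catalog API.

```python
MANAGED_BY = "dbt-unity-lineage"


def is_owned_by(obj: dict, dbt_project: str) -> bool:
    """Return True if obj was created by this tool for the given project.

    Hypothetical helper; obj is shaped like the JSON example above.
    """
    props = obj.get("properties") or {}
    return (
        props.get("managed_by") == MANAGED_BY
        and props.get("dbt_project") == dbt_project
    )
```

Requiring both properties to match is what would keep two dbt projects, or a different tool entirely, from touching each other's objects.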

Idempotent Pushes

Run push as many times as you want:

  • New objects are created
  • Changed objects are updated
  • Removed objects are deleted
  • Objects from other tools/projects are ignored
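These four rules amount to a diff between the desired state (from dbt) and the remote state (in Unity Catalog). The sketch below shows one way to compute such a plan. `plan_push` and the dictionary shapes are hypothetical and chosen for illustration; they are not the tool's internals.

```python
def plan_push(desired, remote, dbt_project):
    """Compute create/update/delete lists for an idempotent push.

    desired: name -> definition derived from local dbt metadata
    remote:  name -> object currently stored, including "properties"
    Hypothetical helper for illustration only.
    """

    def owned(obj):
        props = obj.get("properties") or {}
        return (
            props.get("managed_by") == "dbt-unity-lineage"
            and props.get("dbt_project") == dbt_project
        )

    plan = {"create": [], "update": [], "delete": [], "unchanged": []}
    for name, definition in desired.items():
        current = remote.get(name)
        if current is None:
            plan["create"].append(name)
        elif owned(current):
            changed = any(current.get(k) != v for k, v in definition.items())
            plan["update" if changed else "unchanged"].append(name)
        # A same-named object owned by someone else is left alone
    for name, current in remote.items():
        if owned(current) and name not in desired:
            plan["delete"].append(name)
    for names in plan.values():
        names.sort()
    return plan
```

Running the plan against a remote state that already matches the desired state yields only `unchanged` entries, which is what makes repeated pushes safe.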

Required Permissions

Your Databricks service principal needs:

| Permission | Scope | Purpose |
|------------|-------|---------|
| CREATE EXTERNAL METADATA | Metastore | Create objects |
| MODIFY | External metadata | Update/delete |

Important Notes

Unity Catalog External Lineage is in Public Preview

As of January 2026, this feature is in Public Preview. The API may change. We'll track updates and maintain compatibility.

Profile Configuration

The tool reads connection details from your dbt profiles.yml:

my_project:
  target: prod
  outputs:
    prod:
      type: databricks
      host: dbc-abc123.cloud.databricks.com
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      catalog: main

Related Projects

  • dbt-conceptual
  • dbt-source-simulator

Contributing

Contributions welcome! Please read our contributing guidelines.

License

MIT


Built with the belief that lineage shouldn't stop at your warehouse boundary.
