Databricks dbt Factory: a library for creating Databricks Job definitions in which individual dbt models run as separate tasks.
Databricks dbt Factory
Databricks dbt Factory is a lightweight library that generates Databricks Workflows tasks from your dbt manifest. It builds a DAG in which each dbt model, test, seed, and snapshot runs as a separate task in Databricks Workflows.
The tool can create or update tasks directly within an existing job specification, such as one defined in a Databricks Asset Bundle (DAB).
Motivation
By default, dbt's integration with Databricks Workflows treats an entire dbt project as a single execution unit — a black box.
Databricks dbt Factory changes that by updating Databricks Workflow specs to run dbt objects (models, tests, seeds, snapshots) as individual tasks.
Benefits
✅ Simplified troubleshooting — Quickly pinpoint and fix issues at the model level.
✅ Enhanced logging & notifications — Gain detailed logs and precise error alerts for faster debugging.
✅ Improved retriability — Retry only the failed model tasks without rerunning the full project.
✅ Seamless testing — Automatically run dbt data tests on tables right after each model finishes, enabling faster validation and feedback.
How it works
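The tool reads the dbt manifest file and the existing DAB workflow definition, then generates a new definition in which each dbt object becomes its own task, with task dependencies following the dependencies recorded in the manifest.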
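To make the idea concrete, here is a minimal Python sketch of turning a dbt manifest into a task DAG. It is not the library's implementation. It assumes the standard manifest layout (a top-level "nodes" map whose entries carry resource_type, name, and depends_on.nodes) and emits dicts shaped like Databricks Jobs tasks (task_key, depends_on, dbt_task.commands). The helpers load_manifest, task_key, and build_tasks, the underscore naming scheme, and the "shop" example project are all hypothetical.

```python
import json
from typing import Any

# Assumption: only these dbt resource types become tasks; sources, macros, etc. do not.
COMMAND_FOR_TYPE = {
    "model": "dbt run",
    "test": "dbt test",
    "seed": "dbt seed",
    "snapshot": "dbt snapshot",
}


def load_manifest(path: str) -> dict[str, Any]:
    # manifest.json is written by dbt into the project's target/ directory.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def task_key(unique_id: str) -> str:
    # dbt unique_ids look like "model.shop.orders"; swap dots for underscores
    # to get a flat identifier (a hypothetical naming scheme).
    return unique_id.replace(".", "_")


def build_tasks(manifest: dict[str, Any]) -> list[dict[str, Any]]:
    nodes = manifest["nodes"]
    runnable = {uid for uid, n in nodes.items() if n["resource_type"] in COMMAND_FOR_TYPE}
    tasks = []
    for uid, node in nodes.items():
        if uid not in runnable:
            continue
        # Keep only upstream nodes that are tasks themselves (drops sources).
        upstream = [d for d in node.get("depends_on", {}).get("nodes", []) if d in runnable]
        command = COMMAND_FOR_TYPE[node["resource_type"]]
        tasks.append({
            "task_key": task_key(uid),
            "depends_on": [{"task_key": task_key(d)} for d in upstream],
            "dbt_task": {"commands": [f"{command} --select {node['name']}"]},
        })
    return tasks


# Tiny hand-written manifest fragment for illustration only.
manifest = {
    "nodes": {
        "seed.shop.raw_orders": {"resource_type": "seed", "name": "raw_orders",
                                 "depends_on": {"nodes": []}},
        "model.shop.orders": {"resource_type": "model", "name": "orders",
                              "depends_on": {"nodes": ["seed.shop.raw_orders"]}},
        "test.shop.not_null_orders_id": {"resource_type": "test", "name": "not_null_orders_id",
                                         "depends_on": {"nodes": ["model.shop.orders"]}},
    }
}

for t in build_tasks(manifest):
    print(t["task_key"], "<-", [d["task_key"] for d in t["depends_on"]])
```

Because each task's dependencies come straight from depends_on, the seed, model, and test in this fragment form a chain rather than one monolithic dbt run.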
Installation
pip install databricks-dbt-factory
Usage
Update the tasks in an existing Databricks workflow (job) definition and write the new spec to job_definition_new.yaml:
databricks_dbt_factory \
--dbt-manifest-path tests/test_data/manifest.json \
--input-job-spec-path tests/test_data/job_definition_template.yaml \
--target-job-spec-path job_definition_new.yaml \
--source GIT \
--target dev
Note that --input-job-spec-path and --target-job-spec-path can be the same file, in which case the job spec is updated in place.
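Since the input and target specs can be the same file, one way to support in-place updates is to merge the generated tasks into the job's existing task list instead of overwriting the job wholesale. The sketch below assumes a DAB-style layout (resources.jobs.<job key>.tasks) already parsed into Python dicts, for example with a YAML parser (not shown). Tasks are matched on task_key: regenerated tasks replace their old versions, new ones are appended, and unrelated tasks are kept. The merge policy, the merge_tasks helper, and the "dbt_job" key are hypothetical, not the library's documented behavior.

```python
import copy
from typing import Any, Optional


def merge_tasks(
    job_spec: dict[str, Any],
    job_key: str,
    new_tasks: list[dict[str, Any]],
    new_job_name: Optional[str] = None,
) -> dict[str, Any]:
    """Return a copy of a DAB-style spec with new_tasks merged into one job (hypothetical)."""
    spec = copy.deepcopy(job_spec)  # never mutate the caller's spec
    job = spec["resources"]["jobs"][job_key]
    if new_job_name is not None:
        # Mirrors the idea behind --new-job-name: rename the existing job.
        job["name"] = new_job_name
    by_key = {t["task_key"]: t for t in new_tasks}
    merged = []
    for task in job.get("tasks", []):
        # Replace an existing task if a regenerated version exists, else keep it.
        merged.append(by_key.pop(task["task_key"], task))
    merged.extend(by_key.values())  # tasks that did not exist before
    job["tasks"] = merged
    return spec


spec = {"resources": {"jobs": {"dbt_job": {"name": "old", "tasks": [
    {"task_key": "ingest"},
    {"task_key": "model_shop_orders", "dbt_task": {"commands": ["dbt run"]}},
]}}}}
new = [
    {"task_key": "model_shop_orders", "dbt_task": {"commands": ["dbt run --select orders"]}},
    {"task_key": "seed_shop_raw_orders", "dbt_task": {"commands": ["dbt seed --select raw_orders"]}},
]
updated = merge_tasks(spec, "dbt_job", new, new_job_name="dbt_job_v2")
print([t["task_key"] for t in updated["resources"]["jobs"]["dbt_job"]["tasks"]])
```

Keeping non-dbt tasks (such as "ingest" here) is what lets the generated dbt tasks live alongside hand-written ones in the same job.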
Arguments:
- --new-job-name (type: str, optional, default: None): Optional job name. If provided, the existing job name in the job spec is updated.
- --dbt-manifest-path (type: str, required): Path to the dbt manifest file.
- --input-job-spec-path (type: str, required): Path to the input job spec file.
- --target-job-spec-path (type: str, required): Path to the target job spec file.
- --target (type: str, required): dbt target to use.
- --source (type: str, optional, default: None): Optional dbt project source (GIT or WORKSPACE). If not provided, WORKSPACE is used.
- --warehouse_id (type: str, optional, default: None): Optional ID of the SQL warehouse to run dbt models on.
- --schema (type: str, optional, default: None): Optional metastore schema (database) to use in the dbt task.
- --catalog (type: str, optional, default: None): Optional metastore catalog to use in the dbt task.
- --profiles-directory (type: str, optional, default: None): Optional (relative) path to the job profiles directory to use in the dbt task.
- --project-directory (type: str, optional, default: None): Optional (relative) workspace path to the dbt project directory to use in the dbt task.
- --environment-key (type: str, optional, default: Default): Optional key of the environment to use in the dbt task.
- --extra-dbt-command-options (type: str, optional, default: ""): Optional additional dbt command options to include.
- --run-tests (type: bool, optional, default: True): Whether to run data tests after the model. Enabled by default.
- --enable-dbt-deps (type: bool, optional, default: False): Whether to run dbt deps before each task. Disabled by default.
- --dbt-tasks-deps (type: str, optional, default: None): Optional comma-separated list of tasks for which dbt deps should be run (e.g. "diamonds_prices,second_dbt_model"). Only takes effect if --enable-dbt-deps is enabled.
- --dry-run (type: bool, optional, default: False): Print the generated tasks without updating the job spec file. Disabled by default.
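Several of these flags interact when building the dbt commands for a single model task. The sketch below shows one plausible mapping, assuming that dbt deps runs for every task when --enable-dbt-deps is set and --dbt-tasks-deps is empty (or only for the listed tasks otherwise), that --run-tests appends a dbt test step for the same model, and that --extra-dbt-command-options is appended to each run and test command. The dbt_commands helper and these semantics are hypothetical illustrations, not the library's actual output.

```python
from typing import Optional


def dbt_commands(
    task_name: str,
    run_tests: bool = True,
    enable_dbt_deps: bool = False,
    dbt_tasks_deps: Optional[str] = None,
    extra_options: str = "",
) -> list[str]:
    """Hypothetical mapping of CLI flags onto the commands of one model task."""
    # --dbt-tasks-deps is a comma-separated list such as "diamonds_prices,second_dbt_model".
    selected = (
        {n.strip() for n in dbt_tasks_deps.split(",") if n.strip()}
        if dbt_tasks_deps
        else set()
    )
    # The list is ignored unless --enable-dbt-deps is set; an empty list means "all tasks".
    needs_deps = enable_dbt_deps and (not selected or task_name in selected)

    suffix = f" {extra_options.strip()}" if extra_options.strip() else ""
    commands = ["dbt deps"] if needs_deps else []
    commands.append(f"dbt run --select {task_name}{suffix}")
    if run_tests:
        # Run the model's data tests right after the model itself.
        commands.append(f"dbt test --select {task_name}{suffix}")
    return commands


print(dbt_commands("diamonds_prices", enable_dbt_deps=True,
                   dbt_tasks_deps="diamonds_prices,second_dbt_model"))
```

Under these assumptions, restricting dbt deps to a few tasks avoids reinstalling packages in every task while still covering the models that need them.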
You can also check all input arguments by running databricks_dbt_factory --help.
A demo of the tool can be found here.
Contribution
See contribution guidance here.
License
databricks-dbt-factory is distributed under the terms of the MIT license.
Download files
Source Distribution: databricks_dbt_factory-0.1.0.tar.gz
Built Distribution: databricks_dbt_factory-0.1.0-py3-none-any.whl
File details
Details for the file databricks_dbt_factory-0.1.0.tar.gz.
File metadata
- Download URL: databricks_dbt_factory-0.1.0.tar.gz
- Upload date:
- Size: 18.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6a7386691d49301bd0a97a9795b575ec732e3ff5e02a67b27db5a7cde90fba42 |
| MD5 | 0ec954139ef5f6ef0000a2e998aad5f1 |
| BLAKE2b-256 | d9271283c35b5713777f4ae8109696b2c480b4bc62290759108ed08d62babc3e |
Provenance
The following attestation bundles were made for databricks_dbt_factory-0.1.0.tar.gz:
Publisher: release.yml on mwojtyczka/databricks-dbt-factory
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: databricks_dbt_factory-0.1.0.tar.gz
- Subject digest: 6a7386691d49301bd0a97a9795b575ec732e3ff5e02a67b27db5a7cde90fba42
- Sigstore transparency entry: 267932628
- Sigstore integration time:
- Permalink: mwojtyczka/databricks-dbt-factory@6aab04fc4b74605f4961ea502cdd2c34495b1fad
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/mwojtyczka
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@6aab04fc4b74605f4961ea502cdd2c34495b1fad
- Trigger Event: push
File details
Details for the file databricks_dbt_factory-0.1.0-py3-none-any.whl.
File metadata
- Download URL: databricks_dbt_factory-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f49f71029187b6260367d4c3a25ce0f6c259ed25175b104e3e4a64e6607d271d |
| MD5 | 9838891b4535700955aecfb44b68c765 |
| BLAKE2b-256 | a074dba0301a1e3e42f4cc9960b44ece9bdd8599cf85cab01c785f03ab9b3284 |
Provenance
The following attestation bundles were made for databricks_dbt_factory-0.1.0-py3-none-any.whl:
Publisher: release.yml on mwojtyczka/databricks-dbt-factory
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: databricks_dbt_factory-0.1.0-py3-none-any.whl
- Subject digest: f49f71029187b6260367d4c3a25ce0f6c259ed25175b104e3e4a64e6607d271d
- Sigstore transparency entry: 267932635
- Sigstore integration time:
- Permalink: mwojtyczka/databricks-dbt-factory@6aab04fc4b74605f4961ea502cdd2c34495b1fad
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/mwojtyczka
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@6aab04fc4b74605f4961ea502cdd2c34495b1fad
- Trigger Event: push