Databricks dbt Factory is a library for creating a Databricks job definition in which individual dbt models run as separate tasks.

Databricks dbt factory

Databricks dbt Factory is a lightweight library that generates a Databricks Workflow task for each dbt model, based on your dbt manifest. It creates a DAG of tasks that run each dbt model, test, seed, and snapshot as a separate task in Databricks Workflows.

The tool can create or update tasks directly within an existing job specification, such as one defined in a Databricks Asset Bundle (DAB).

Motivation

By default, dbt's integration with Databricks Workflows treats an entire dbt project as a single execution unit — a black box.

Databricks dbt Factory changes that by updating Databricks Workflow specs to run dbt objects (models, tests, seeds, snapshots) as individual tasks.

Benefits

✅ Simplified troubleshooting — Quickly pinpoint and fix issues at the model level.

✅ Enhanced logging & notifications — Gain detailed logs and precise error alerts for faster debugging.

✅ Improved retriability — Retry only the failed model tasks without rerunning the full project.

✅ Seamless testing — Automatically run dbt data tests on tables right after each model finishes, enabling faster validation and feedback.

How it works

The tool reads the dbt manifest file and the existing DAB workflow definition, and generates a new definition.
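The idea can be illustrated with a minimal Python sketch. This is not the library's implementation: the task key naming scheme, the mapping from resource type to dbt sub-command, and the helper names `task_key` and `build_tasks` are all assumptions made for illustration. It relies only on the `nodes`, `resource_type`, `name`, and `depends_on.nodes` fields of the dbt manifest, and on the `task_key`, `depends_on`, and `dbt_task.commands` fields of a Databricks job task.

```python
import json

# Assumed mapping from dbt resource type to the dbt sub-command that runs it.
DBT_COMMANDS = {"model": "run", "seed": "seed", "snapshot": "snapshot", "test": "test"}


def task_key(unique_id: str) -> str:
    """Turn a dbt unique_id such as 'model.shop.orders' into a task key.

    The naming scheme is an assumption, not the library's actual scheme.
    """
    return unique_id.replace(".", "_")


def build_tasks(manifest: dict, target: str) -> list[dict]:
    """Build one dbt task per supported manifest node, wired by its dependencies."""
    nodes = {
        uid: node
        for uid, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") in DBT_COMMANDS
    }
    tasks = []
    for uid, node in sorted(nodes.items()):
        sub_command = DBT_COMMANDS[node["resource_type"]]
        command = f"dbt {sub_command} --select {node['name']} --target {target}"
        # Keep only upstream nodes that are themselves tasks; dbt sources,
        # for example, are listed as dependencies but are not runnable nodes.
        upstream = [
            dep for dep in node.get("depends_on", {}).get("nodes", []) if dep in nodes
        ]
        task = {"task_key": task_key(uid), "dbt_task": {"commands": [command]}}
        if upstream:
            task["depends_on"] = [{"task_key": task_key(dep)} for dep in upstream]
        tasks.append(task)
    return tasks


if __name__ == "__main__":
    # A tiny hand-written manifest fragment (hypothetical project "shop").
    sample_manifest = {
        "nodes": {
            "seed.shop.raw_orders": {
                "resource_type": "seed",
                "name": "raw_orders",
                "depends_on": {"nodes": []},
            },
            "model.shop.orders": {
                "resource_type": "model",
                "name": "orders",
                "depends_on": {"nodes": ["seed.shop.raw_orders"]},
            },
        }
    }
    print(json.dumps(build_tasks(sample_manifest, "dev"), indent=2))
```

In this sketch the `orders` model task depends on the `raw_orders` seed task, so Databricks Workflows would run the seed first and could retry either task on its own.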

Installation

pip install databricks-dbt-factory

Usage

Update the tasks in an existing Databricks workflow (job) definition and write the new spec to job_definition_new.yaml:

databricks_dbt_factory \
  --dbt-manifest-path tests/test_data/manifest.json \
  --input-job-spec-path tests/test_data/job_definition_template.yaml \
  --target-job-spec-path job_definition_new.yaml \
  --source GIT \
  --target dev

Note that --input-job-spec-path and --target-job-spec-path can point to the same file, in which case the job spec is updated in place.
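When the tool is called from a build script, the in-place variant can be assembled programmatically. The sketch below uses only the flags documented in this README; the helper name `build_in_place_command` and the file paths are hypothetical, and the actual invocation is left commented out so the snippet runs without the package installed.

```python
import shlex
from typing import Optional


def build_in_place_command(
    manifest_path: str,
    job_spec_path: str,
    target: str,
    source: Optional[str] = None,
) -> list[str]:
    """Build the argument list for an in-place update of a job spec."""
    args = [
        "databricks_dbt_factory",
        "--dbt-manifest-path", manifest_path,
        # The same file is passed as input and target, so it is updated in place.
        "--input-job-spec-path", job_spec_path,
        "--target-job-spec-path", job_spec_path,
        "--target", target,
    ]
    if source is not None:
        args += ["--source", source]
    return args


if __name__ == "__main__":
    # Hypothetical paths; adjust them to your project layout.
    command = build_in_place_command(
        "target/manifest.json", "resources/dbt_job.yml", "dev", source="GIT"
    )
    print(shlex.join(command))
    # To actually run it (requires databricks-dbt-factory to be installed):
    # import subprocess
    # subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems if a path contains spaces.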

Arguments:

  • --new-job-name (type: str, optional, default: None): Optional job name. If provided, the existing job name in the job spec is updated.
  • --dbt-manifest-path (type: str, required): Path to the dbt manifest file.
  • --input-job-spec-path (type: str, required): Path to the input job spec file.
  • --target-job-spec-path (type: str, required): Path to the target job spec file.
  • --target (type: str, required): dbt target to use.
  • --source (type: str, optional, default: None): Optional dbt project source (GIT or WORKSPACE). If not provided, WORKSPACE will be used.
  • --warehouse_id (type: str, optional, default: None): Optional SQL Warehouse to run dbt models on.
  • --schema (type: str, optional, default: None): Optional metastore schema (database) to use in the dbt task.
  • --catalog (type: str, optional, default: None): Optional metastore catalog to use in the dbt task.
  • --profiles-directory (type: str, optional, default: None): Optional (relative) path to the job profiles directory to use in the dbt task.
  • --project-directory (type: str, optional, default: None): Optional (relative) workspace path to the dbt project directory to use in the dbt task.
  • --environment-key (type: str, optional, default: Default): Optional key of an environment.
  • --extra-dbt-command-options (type: str, optional, default: ""): Optional additional dbt command options to include.
  • --run-tests (type: bool, optional, default: True): Whether to run data tests after the model. Enabled by default.
  • --enable-dbt-deps (type: bool, optional, default: False): Whether to run dbt deps before each task. Disabled by default.
  • --dbt-tasks-deps (type: str, optional, default: None): Optional comma-separated list of tasks for which dbt deps should be run (e.g. "diamonds_prices,second_dbt_model"). Only takes effect if --enable-dbt-deps is enabled.
  • --dry-run (type: bool, optional, default: False): Print generated tasks without updating the job spec file. Disabled by default.

You can also check all input arguments by running databricks_dbt_factory --help.

A demo of the tool can be found here.

Contribution

See contribution guidance here.

License

databricks-dbt-factory is distributed under the terms of the MIT license.
