dbt-dag-opt
Find the longest-running paths through your dbt DAG — the models that actually make your pipeline slow.
When you pay for compute by the second (Snowflake, Databricks, Redshift), your dbt job's wall-clock time, and hence its cost, is bounded below by the critical path through the DAG: the longest cumulative chain of model execution times. Optimizing a slow model on a short branch saves you nothing if a longer branch was already the bottleneck. dbt-dag-opt tells you which paths to cut first.
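To make the claim concrete, here is a minimal sketch on a toy diamond DAG (node names and times are made up, and paths are enumerated by hand rather than derived from a manifest): halving a model that sits off the critical path leaves the wall-clock lower bound untouched.

```python
# Toy diamond DAG: raw -> {stg_orders, stg_customers} -> fact.
# With enough threads, wall-clock is bounded below by the critical path.
times = {"raw": 5.0, "stg_orders": 20.0, "stg_customers": 3.0, "fact": 10.0}
paths = [
    ["raw", "stg_orders", "fact"],
    ["raw", "stg_customers", "fact"],
]

def critical_path(times, paths):
    """Return the path with the largest cumulative execution time."""
    return max(paths, key=lambda p: sum(times[n] for n in p))

best = critical_path(times, paths)
print(best, sum(times[n] for n in best))  # ['raw', 'stg_orders', 'fact'] 35.0

# Halving the off-path model changes nothing: the bound stays 35s.
times["stg_customers"] = 1.5
assert sum(times[n] for n in critical_path(times, paths)) == 35.0
```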
Install
pip install dbt-dag-opt
Quickstart
From local artifacts
dbt-dag-opt analyze \
--manifest target/manifest.json \
--run-results target/run_results.json \
--format table \
--top 10
From dbt Cloud
export DBT_CLOUD_TOKEN=dbtu_...
dbt-dag-opt analyze \
--account-id 12345 \
--job-id 67890 \
--base-url https://cloud.getdbt.com \
--format table
Add --run-id <id> to pull artifacts from a specific historical run instead of the job's latest.
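Cloud mode presumably resolves artifacts through dbt Cloud's Admin API (mentioned under "How it works"). As a sketch only, here is the v2 artifact URL shape and auth header; the endpoint path comes from the public Admin API docs, not from this tool's source, so verify it against your dbt Cloud region and plan.

```python
import os

def artifact_url(base_url: str, account_id: str, run_id: str, path: str) -> str:
    """Build a dbt Cloud Admin API v2 run-artifact URL (assumed endpoint shape)."""
    return f"{base_url}/api/v2/accounts/{account_id}/runs/{run_id}/artifacts/{path}"

# Token auth, matching the DBT_CLOUD_TOKEN env var the CLI reads.
headers = {"Authorization": f"Token {os.environ.get('DBT_CLOUD_TOKEN', '')}"}
url = artifact_url("https://cloud.getdbt.com", "12345", "987", "run_results.json")
print(url)
```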
Sample output
Longest paths by total execution time
┏━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ # ┃ Source                    ┃ End of path            ┃ Length ┃ Total time (s) ┃
┡━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ 1 │ source.demo.raw.orders    │ model.demo.fact_orders │      4 │          35.00 │
│ 2 │ source.demo.raw.customers │ model.demo.fact_orders │      4 │          32.00 │
└───┴───────────────────────────┴────────────────────────┴────────┴────────────────┘
CLI reference
analyze — critical path through the DAG
dbt-dag-opt analyze [OPTIONS]
  --manifest PATH                  Path to manifest.json (file mode)
  --run-results PATH               Path to run_results.json (file mode)
  --account-id TEXT                dbt Cloud account id (cloud mode)
  --job-id TEXT                    dbt Cloud job id (cloud mode)
  --run-id TEXT                    dbt Cloud run id; omit for the job's latest run
  --base-url TEXT                  dbt Cloud base URL [default: https://cloud.getdbt.com]
  --token TEXT                     dbt Cloud API token [env: DBT_CLOUD_TOKEN]
  -f, --format [json|jsonl|table]  Output format [default: table]
  -n, --top INTEGER                Show only top N paths (0 = all) [default: 10]
  --show-path                      Render the full chain of node ids (table format)
  -o, --output PATH                Write output to a file instead of stdout
The table includes a Bottleneck column that names the slowest model on each path. First-order optimization target: the bottleneck model on the #1 path. Watch for a bottleneck that repeats across multiple paths — that's shared-node leverage (optimizing one model helps several paths at once).
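The bottleneck and shared-node-leverage logic above is easy to sketch. Here the paths are hand-written `(node_id, seconds)` pairs, not real analyze output, and the timings are illustrative:

```python
from collections import Counter

# Hypothetical analyze paths: each is a chain of (node_id, seconds).
paths = [
    [("source.demo.raw.orders", 1.0), ("model.demo.stg_orders", 10.0),
     ("model.demo.int_orders", 8.0), ("model.demo.fact_orders", 12.0)],
    [("source.demo.raw.customers", 1.0), ("model.demo.stg_customers", 11.0),
     ("model.demo.int_orders", 8.0), ("model.demo.fact_orders", 12.0)],
]

# Bottleneck = slowest node on each path.
bottlenecks = [max(p, key=lambda nt: nt[1])[0] for p in paths]
print(bottlenecks)

# A node that bottlenecks more than one path is shared-node leverage:
# speeding it up shortens several paths at once.
shared = [n for n, c in Counter(bottlenecks).items() if c > 1]
print(shared)  # ['model.demo.fact_orders']
```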
replay — what actually happened
analyze is theoretical — it reports the DAG-structural lower bound on wall-clock time. replay reads the observed schedule. Every result in run_results.json carries a thread_id and per-phase timing with start/end timestamps, so we can reconstruct:
- Per-thread utilization — how much of the run each worker was busy vs. idle.
- Observed critical path — the chain of nodes that actually determined wall-clock, walked backwards from the last-completing node.
- Idle-gap attribution — for every stretch of idle time, which upstream node's completion unblocked the thread. Gaps with no blocker are scheduler overhead, not DAG blocking.
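A condensed sketch of that bookkeeping, with timestamps simplified to plain seconds (real run_results entries carry ISO-8601 timing phases) and the node ids invented for the example:

```python
# Each entry: which thread ran the node, and when.
results = [
    {"id": "model.a", "thread_id": "Thread-1", "start": 0.0, "end": 4.0},
    {"id": "model.b", "thread_id": "Thread-2", "start": 0.0, "end": 1.0},
    {"id": "model.c", "thread_id": "Thread-2", "start": 4.0, "end": 6.0},  # waited on model.a
]

run_start = min(r["start"] for r in results)
run_end = max(r["end"] for r in results)
wall = run_end - run_start

# Group results by worker thread, in start order.
threads: dict[str, list[dict]] = {}
for r in sorted(results, key=lambda r: r["start"]):
    threads.setdefault(r["thread_id"], []).append(r)

# Per-thread utilization: busy seconds over wall-clock.
for tid, rs in threads.items():
    busy = sum(r["end"] - r["start"] for r in rs)
    print(f"{tid}: {100 * busy / wall:.0f}% busy")

# Idle-gap attribution: Thread-2 sat idle from t=1 to t=4; the blocker is
# whichever upstream node finished exactly when the thread resumed.
gap_start, gap_end = 1.0, 4.0
blocker = next(r["id"] for r in results if r["end"] == gap_end)
print(f"idle {gap_end - gap_start:.0f}s on Thread-2, unblocked by {blocker}")
```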
dbt-dag-opt replay [OPTIONS]
  --manifest PATH           Path to manifest.json (file mode)
  --run-results PATH        Path to run_results.json (file mode)
  --account-id / --job-id   dbt Cloud mode (same as analyze)
  -f, --format [text|json]  Output format [default: text]
  --top-idle-gaps INTEGER   How many idle gaps to surface [default: 10]
  -o, --output PATH         Write output to a file instead of stdout
Output formats
- table — rich terminal table (default for analyze).
- text — rich-rendered summary (default for replay): run summary, per-thread utilization, observed critical path, top idle gaps.
- json — analyze emits {source_id: {path, distance, length}}; replay emits the full replay report. Both are jq-friendly.
- jsonl — one JSON object per line (analyze only).
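For downstream scripting, the json shape described above is easy to consume. The payload here is fabricated to match that description; field values are not from a real run:

```python
import json

# analyze's json format: {source_id: {path, distance, length}}.
payload = json.loads("""
{
  "source.demo.raw.orders": {
    "path": ["source.demo.raw.orders", "model.demo.stg_orders",
             "model.demo.int_orders", "model.demo.fact_orders"],
    "distance": 35.0,
    "length": 4
  }
}
""")

# Pick the heaviest path across all sources (what `--top 1` would show).
worst_source, worst = max(payload.items(), key=lambda kv: kv[1]["distance"])
print(worst_source, worst["distance"], worst["path"][-1])
```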
How it works
- Load manifest.json and run_results.json (from disk or dbt Cloud's Admin API).
- Build a weighted DAG: nodes are model.* / source.* / seed.* / snapshot.* ids; each node's weight is its execution_time in seconds.
- Compute the longest path from each source using an iterative DP over topological order (O(V + E)).
- Sort paths by total distance and surface the heaviest ones.
Distances sum the execution time of every node along the path — that's the warehouse-seconds you'd save by zeroing out that chain.
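The topological-order DP can be sketched as a single Kahn-style pass. This collapses the per-source bookkeeping the tool describes into one distance table (longest path ending at each node), and the toy DAG and weights are illustrative:

```python
from collections import defaultdict, deque

def longest_paths(edges, weight):
    """Heaviest path ending at each node, via topological order.

    edges: parent -> list of children; weight: node -> execution seconds.
    O(V + E): each edge is relaxed exactly once.
    """
    indeg = defaultdict(int)
    for u, vs in edges.items():
        for v in vs:
            indeg[v] += 1
    dist = {n: weight[n] for n in weight}   # path so far includes the node itself
    prev = {n: None for n in weight}        # back-pointers to rebuild the path
    queue = deque(n for n in weight if indeg[n] == 0)
    while queue:
        u = queue.popleft()
        for v in edges.get(u, ()):
            if dist[u] + weight[v] > dist[v]:
                dist[v] = dist[u] + weight[v]
                prev[v] = u
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return dist, prev

# Toy chain mirroring the sample output's shape (times are made up).
w = {"src": 1.0, "stg": 14.0, "int": 8.0, "fact": 12.0}
e = {"src": ["stg"], "stg": ["int"], "int": ["fact"]}
dist, prev = longest_paths(e, w)

# Walk the back-pointers to recover the heaviest chain.
path, n = [], "fact"
while n:
    path.append(n)
    n = prev[n]
print(list(reversed(path)), dist["fact"])  # ['src', 'stg', 'int', 'fact'] 35.0
```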
What this is / isn't
It is a CLI tool that points at the slowest chains in your DAG and — as of replay — the observed schedule that those chains actually produced.
It isn't (yet):
- A predictive scheduler simulator. replay reconstructs what already happened; it doesn't yet project what would happen under a different --threads N or if you sped up a specific model. That "what-if" loop is planned next.
- A cost model. Multiplying wall-clock × your warehouse rate is on you — a --warehouse-size flag is planned alongside the what-if loop.
Development
uv sync --all-extras
uv run ruff check .
uv run mypy src
uv run pytest
License
Apache 2.0 — see LICENSE.