dbt-mp: dbt manifest parser for agentic context
dbt Manifest Parser (dbt-mp)
A focused, lightweight CLI tool for parsing and filtering dbt manifest.json files to create context-optimized artifacts for AI agents and developer onboarding.
Why This Exists
Navigating large, complex dbt projects with hundreds of models can be a significant challenge. While dbt's lineage is powerful at the model level, understanding the intricate dependencies within deeply nested CTEs often requires manual, time-consuming code tracing. This complexity creates a major hurdle both for onboarding new developers and for leveraging AI agents to assist with code analysis, as the full manifest.json file is often too large and noisy to use effectively in LLM contexts.
dbt-mp was built to solve this problem. It bridges the gap between the high-level view of dbt ls and the overwhelming detail of the full manifest. By intelligently selecting a target model and its direct lineage, and then filtering the manifest to include only the most critical attributes, it generates a concise, token-optimized JSON artifact.
The goal: to make interacting with large dbt projects more efficient for both humans and AI, accelerating development and simplifying the process of understanding complex data transformations.
What It Does
dbt-mp is a command-line tool that performs a two-step process:
- Select & Compile: It first invokes dbt ls with your specified model selector (e.g., +my_model) to compile your project and generate a fresh manifest.json. This ensures the artifact is always up-to-date with your current code.
- Parse & Filter: It then parses the newly generated manifest, extracting only the selected models, their direct parents, and any associated macros. It intelligently slims down the JSON, keeping high-signal attributes while discarding less relevant data to optimize for token count.
This produces a hyper-focused JSON file, perfect for:
- Providing as context to an AI agent for code refactoring or analysis.
- Including in a Pull Request to give reviewers a clear picture of the changes.
- Speeding up the onboarding process for developers new to the project.
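The Parse & Filter step described above can be pictured with a minimal Python sketch. It assumes the selection step has already produced a set of selected model unique_ids, and it relies on the standard manifest layout in which each node's depends_on holds nodes and macros lists. The function name extract_slice is hypothetical; this is an illustration, not dbt-mp's actual implementation.

```python
from __future__ import annotations

from typing import Any


def extract_slice(manifest: dict[str, Any], selected: set[str]) -> dict[str, Any]:
    """Return selected nodes, their direct parents, and the macros they reference.

    Hypothetical helper: relies on the dbt manifest layout in which each node
    carries depends_on = {"nodes": [...], "macros": [...]}.
    """
    all_nodes = manifest.get("nodes", {})
    all_sources = manifest.get("sources", {})
    all_macros = manifest.get("macros", {})

    # Start from the selection, then add each selected node's direct parents
    # (models, seeds, snapshots, or sources listed in depends_on.nodes).
    keep: set[str] = set(selected)
    for uid in selected:
        keep.update(all_nodes.get(uid, {}).get("depends_on", {}).get("nodes", []))

    # Collect the macros referenced by every kept node.
    macro_ids: set[str] = set()
    for uid in keep:
        macro_ids.update(all_nodes.get(uid, {}).get("depends_on", {}).get("macros", []))

    # Split kept ids back into their manifest groups; unknown ids are skipped.
    return {
        "nodes": {uid: all_nodes[uid] for uid in sorted(keep) if uid in all_nodes},
        "sources": {uid: all_sources[uid] for uid in sorted(keep) if uid in all_sources},
        "macros": {uid: all_macros[uid] for uid in sorted(macro_ids) if uid in all_macros},
    }
```

In a dbt manifest, source ids (source.project.source_name.table) also appear in a node's depends_on.nodes list, which is why the sketch resolves both nodes and sources from the same kept set.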
Benchmark: Performance & Token Reduction
To demonstrate the effectiveness of dbt-mp, we ran it on dbt Labs' standard jaffle_shop project, which contains approximately 20 models. The results show a significant reduction in manifest size, making the output far more suitable for AI agent contexts.
| Metric | Raw manifest.json | dbt-mp Slim Manifest | Reduction |
|---|---|---|---|
| Tokens | ~343,000 | ~8,800 | ~97% |
| Lines | ~21,000 | ~450 | ~98% |
This dramatic decrease in size allows for a much more focused and efficient analysis by both developers and LLM-based tools.
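The Reduction column is one minus the ratio of slim size to raw size. The small Python sketch below reproduces the table's percentages from its rounded figures. The count_lines helper is an assumption about how lines might be measured (pretty-printed JSON with two-space indentation), since the benchmark does not state its method; no particular tokenizer is assumed.

```python
from __future__ import annotations

import json
from typing import Any


def reduction_pct(raw: float, slim: float) -> float:
    """Percentage reduction from raw size to slim size: (1 - slim / raw) * 100."""
    if raw <= 0:
        raise ValueError("raw size must be positive")
    return (1 - slim / raw) * 100


def count_lines(obj: Any, indent: int = 2) -> int:
    """Line count of obj serialized as pretty-printed JSON (assumed measurement)."""
    return len(json.dumps(obj, indent=indent).splitlines())


# Figures taken from the jaffle_shop benchmark table.
print(f"tokens: {reduction_pct(343_000, 8_800):.1f}%")  # 97.4%
print(f"lines:  {reduction_pct(21_000, 450):.1f}%")     # 97.9%
```

Rounded to whole numbers, these give the ~97% and ~98% shown in the table.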
Installation
The tool is packaged and distributed via PyPI.
```bash
# Via pip
pip install dbt-mp
# Or run as a one-off executable via uv
uvx dbt-mp --help
```
Usage
To use the tool, run the dbt-mp command from the root of your dbt project directory. The most common use case is to provide a dbt model selector and an output file path.
Example:
The following command selects the model stg_orders, its parents (the leading +), and its children (the trailing +), then generates a filtered manifest.
```bash
dbt-mp --select '+stg_orders+' --out-file filtered_manifest.json
```
The resulting filtered_manifest.json will contain a lean, context-rich representation of the selected slice of your dbt project.
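To feed the result to an agent or script, you might wrap the CLI call and then inspect the output. The minimal Python sketch below uses only the --select and --out-file options shown above. It assumes the output file is a JSON object whose top-level nodes, sources, and macros keys mirror the groups listed under Core Attributes, which this README does not spell out. The helper names build_command, summarize, and run_dbt_mp are hypothetical.

```python
from __future__ import annotations

import json
import subprocess
from pathlib import Path
from typing import Any


def build_command(selector: str, out_file: str) -> list[str]:
    """Assemble a dbt-mp call using only the --select and --out-file options."""
    return ["dbt-mp", "--select", selector, "--out-file", out_file]


def summarize(slim: dict[str, Any]) -> dict[str, int]:
    """Count entries per group; assumes top-level nodes/sources/macros keys."""
    return {group: len(slim.get(group, {})) for group in ("nodes", "sources", "macros")}


def run_dbt_mp(selector: str, out_file: str) -> dict[str, int]:
    """Run dbt-mp (from the dbt project root), then load and summarize its output."""
    subprocess.run(build_command(selector, out_file), check=True)
    slim = json.loads(Path(out_file).read_text(encoding="utf-8"))
    return summarize(slim)
```

For example, run_dbt_mp("+stg_orders+", "filtered_manifest.json") from the project root. Because the arguments are passed as a list, no shell is involved, so the selector needs none of the quoting used in the shell example.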
Core Attributes for Contextual Quality
dbt-mp optimizes the manifest by preserving a curated set of high-signal attributes that balance context quality with token economy. The following keys are retained:
nodes
| Attribute | Rationale |
|---|---|
| schema, name, resource_type | Basic identifiers for the node. |
| unique_id | The canonical, unique identifier within the dbt graph. |
| config (subset) | Key configuration, such as materialized and enabled, is crucial for understanding behavior. |
| tags, columns | Metadata and column-level descriptions provide essential semantic context. |
| raw_code, compiled_code | The original and compiled SQL are the most critical assets for code analysis. |
| refs, sources, depends_on | The explicit dependency graph is fundamental for lineage tracing. |
sources
| Attribute | Rationale |
|---|---|
| database, schema, name | Identifiers for the source table. |
| unique_id | The canonical identifier for the source. |
| description | Semantic context for what the source represents. |
macros
| Attribute | Rationale |
|---|---|
| unique_id | The canonical identifier for the macro. |
| macro_sql | The macro's code is essential, as it's injected into model SQL. |
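The retained keys above translate naturally into an allow-list. Below is a minimal Python sketch of such a slimming pass. The KEEP mapping is transcribed from the three tables, while CONFIG_KEEP limits config to the two keys the nodes table names (materialized and enabled); that is an assumption, since the exact config subset is not listed. All names here are hypothetical and not taken from dbt-mp's source.

```python
from __future__ import annotations

from typing import Any

# Allow-lists transcribed from the Core Attributes tables.
KEEP: dict[str, tuple[str, ...]] = {
    "nodes": (
        "schema", "name", "resource_type", "unique_id", "config",
        "tags", "columns", "raw_code", "compiled_code",
        "refs", "sources", "depends_on",
    ),
    "sources": ("database", "schema", "name", "unique_id", "description"),
    "macros": ("unique_id", "macro_sql"),
}

# Assumed config subset: only the two keys named in the nodes table.
CONFIG_KEEP = ("materialized", "enabled")


def slim_entry(entry: dict[str, Any], keys: tuple[str, ...]) -> dict[str, Any]:
    """Keep only allow-listed keys; trim config down to CONFIG_KEEP."""
    out = {k: entry[k] for k in keys if k in entry}
    if isinstance(out.get("config"), dict):
        out["config"] = {k: v for k, v in out["config"].items() if k in CONFIG_KEEP}
    return out


def slim_manifest(manifest: dict[str, Any]) -> dict[str, Any]:
    """Apply the allow-lists to every entry in each resource group."""
    return {
        group: {uid: slim_entry(e, keys) for uid, e in manifest.get(group, {}).items()}
        for group, keys in KEEP.items()
    }
```

An allow-list rather than a deny-list keeps the output stable as dbt adds new manifest fields: keys such as fqn and checksum, which real manifest entries also carry, are dropped without being named.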
Download files
- Source Distribution: dbt_mp-0.1.4.tar.gz
- Built Distribution: dbt_mp-0.1.4-py3-none-any.whl
File details
Details for the file dbt_mp-0.1.4.tar.gz.
File metadata
- Download URL: dbt_mp-0.1.4.tar.gz
- Size: 6.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aaf0d1647928cd2ab760173c9f806077cc6883560a9da4d2e5ea32f138d8e4e6 |
| MD5 | b1ae2d9926367d35a3a4e53c7e790b67 |
| BLAKE2b-256 | 599f312fa9cb23f92cebda2a5e825f596db2d8546de67a52e5d3f93e339c6774 |
File details
Details for the file dbt_mp-0.1.4-py3-none-any.whl.
File metadata
- Download URL: dbt_mp-0.1.4-py3-none-any.whl
- Size: 6.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b07ba4b1d9bdf7275c743f972877a9f0e41c6480b1fe39bab9d4740d1a6a6ecc |
| MD5 | 17316b9c9b3e03beed8033036cae69df |
| BLAKE2b-256 | 0530519c39f3535dc33a57c9e957c1fed9188e7bb54d175af324ea0dddd4af1d |