
dbt-mp: dbt manifest parser for agentic context


dbt Manifest Parser (dbt-mp)

A focused, lightweight CLI tool for parsing and filtering dbt manifest.json files to create context-optimized artifacts for AI agents and developer onboarding.


Why This Exists

Navigating large, complex dbt projects with hundreds of models can be a significant challenge. While dbt's lineage is powerful at the model level, understanding the intricate dependencies within deeply nested CTEs often requires manual, time-consuming code tracing. This complexity creates a major hurdle for both onboarding new developers and for leveraging AI agents to assist with code analysis, as the full manifest.json file is often too large and noisy for effective use in LLM contexts.

dbt-mp was built to solve this problem. It bridges the gap between the high-level view of dbt ls and the overwhelming detail of the full manifest. By intelligently selecting a target model and its direct lineage, and then filtering the manifest to include only the most critical attributes, it generates a concise, token-optimized JSON artifact.

The goal is to make interacting with large dbt projects more efficient for both humans and AI, accelerating development and simplifying the process of understanding complex data transformations.


What It Does

dbt-mp is a command-line tool that performs a two-step process:

  1. Select & Compile: It first invokes dbt ls with your specified model selector (e.g., +my_model) to compile your project and generate a fresh manifest.json. This ensures the artifact is always up to date with your current code.
  2. Parse & Filter: It then parses the newly generated manifest, extracting only the selected models, their direct parents, and any associated macros. It intelligently slims down the JSON, keeping high-signal attributes while discarding less relevant data to optimize for token count.

This produces a hyper-focused JSON file, perfect for:

  • Providing as context to an AI agent for code refactoring or analysis.
  • Including in a Pull Request to give reviewers a clear picture of the changes.
  • Speeding up the onboarding process for developers new to the project.
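
The Parse & Filter step described above can be sketched roughly as follows. This is a minimal illustration, not dbt-mp's actual implementation: the function name select_slice and its signature are hypothetical, and it assumes the standard manifest.json layout in which each node lists its parents and macros under depends_on.

```python
from typing import Any


def select_slice(manifest: dict[str, Any], selected_ids: list[str]) -> dict[str, Any]:
    """Keep the selected nodes, their direct parents, and the macros they use.

    Hypothetical helper for illustration only; dbt-mp's internals may differ.
    """
    nodes = manifest.get("nodes", {})
    sources = manifest.get("sources", {})
    macros = manifest.get("macros", {})

    # Start from the selected IDs, then add exactly one level of parents.
    wanted = list(selected_ids)
    for uid in selected_ids:
        wanted.extend(nodes.get(uid, {}).get("depends_on", {}).get("nodes", []))

    keep_nodes: dict[str, Any] = {}
    keep_sources: dict[str, Any] = {}
    keep_macros: dict[str, Any] = {}

    for uid in wanted:
        if uid in nodes:
            keep_nodes[uid] = nodes[uid]
            # Pull in every macro this node depends on.
            for macro_id in nodes[uid].get("depends_on", {}).get("macros", []):
                if macro_id in macros:
                    keep_macros[macro_id] = macros[macro_id]
        elif uid in sources:
            # Parents can be sources, which live in a separate section.
            keep_sources[uid] = sources[uid]

    return {"nodes": keep_nodes, "sources": keep_sources, "macros": keep_macros}
```

In this sketch, the list of selected IDs would come from the dbt ls step; anything not reachable within one hop of a selected node is dropped.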

Benchmark: Performance & Token Reduction

To demonstrate the effectiveness of dbt-mp, we ran it on dbt Labs' standard jaffle_shop project, which contains approximately 20 models. The results show a significant reduction in the size of the manifest, making it far more suitable for AI agent contexts.

| Metric | Raw manifest.json | dbt-mp Slim Manifest | Reduction |
|--------|-------------------|----------------------|-----------|
| Tokens | ~343,000          | ~8,800               | ~97%      |
| Lines  | ~21,000           | ~450                 | ~98%      |

This dramatic decrease in size allows for a much more focused and efficient analysis by both developers and LLM-based tools.
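
The reduction percentages follow directly from the approximate counts reported above. A quick check of the arithmetic (the helper name reduction_pct is made up for this example):

```python
def reduction_pct(raw: int, slim: int) -> float:
    """Percentage by which `slim` is smaller than `raw`."""
    return (1 - slim / raw) * 100


# Approximate figures from the benchmark table.
tokens = reduction_pct(343_000, 8_800)
lines = reduction_pct(21_000, 450)

print(f"tokens: {tokens:.1f}%")  # tokens: 97.4%
print(f"lines:  {lines:.1f}%")   # lines:  97.9%
```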


Installation

The tool is packaged and distributed via PyPI.

# Via pip
pip install dbt-mp

# Or run as a one-off executable via uv
uvx dbt-mp --help

Usage

To use the tool, run the dbt-mp command from the root of your dbt project directory. The most common use case is to provide a dbt model selector and an output file path.

Example:

The following command selects the model stg_orders, its parents (the leading +), and its children (the trailing +), then generates a filtered manifest.

dbt-mp --select '+stg_orders+' --out-file filtered_manifest.json

The resulting filtered_manifest.json will contain a lean, context-rich representation of the selected slice of your dbt project.
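
Once the file exists, it can be inspected from Python before being handed to an agent. The sketch below assumes the output has top-level nodes, sources, and macros sections, each a mapping keyed by unique_id; that layout is inferred from the attribute tables in this README and is not a documented guarantee. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def load_slim_manifest(path: str) -> dict[str, Any]:
    """Load the JSON artifact written via --out-file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize(slim: dict[str, Any]) -> dict[str, int]:
    """Count entries per section (assumed layout: nodes, sources, macros)."""
    return {key: len(slim.get(key, {})) for key in ("nodes", "sources", "macros")}


if __name__ == "__main__":
    manifest_path = Path("filtered_manifest.json")
    # Only summarize when the file from the command above is present.
    if manifest_path.exists():
        print(summarize(load_slim_manifest(str(manifest_path))))
```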


Core Attributes for Contextual Quality

dbt-mp optimizes the manifest by preserving a curated set of high-signal attributes that balance context quality with token economy. The following keys are retained:

nodes

| Attribute | Rationale |
|-----------|-----------|
| schema, name, resource_type | Basic identifiers for the node. |
| unique_id | The canonical, unique identifier within the dbt graph. |
| config (subset) | Key configuration values such as materialized and enabled are crucial for understanding behavior. |
| tags, columns | Metadata and column-level descriptions provide essential semantic context. |
| raw_code, compiled_code | The original and compiled SQL are the most critical assets for code analysis. |
| refs, sources, depends_on | The explicit dependency graph is fundamental for lineage tracing. |

sources

| Attribute | Rationale |
|-----------|-----------|
| database, schema, name | Identifiers for the source table. |
| unique_id | The canonical identifier for the source. |
| description | Semantic context for what the source represents. |

macros

| Attribute | Rationale |
|-----------|-----------|
| unique_id | The canonical identifier for the macro. |
| macro_sql | The macro's code is essential, as it's injected into model SQL. |
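
To make the retention rules concrete, here is a minimal sketch of attribute-level slimming using the keys listed in this section. It is an illustration rather than dbt-mp's real code: the constant and function names are invented, and the config subset is assumed to be just materialized and enabled, the two examples named above.

```python
from typing import Any

# Keys taken from the attribute lists in this section.
NODE_KEYS = (
    "schema", "name", "resource_type", "unique_id", "tags", "columns",
    "raw_code", "compiled_code", "refs", "sources", "depends_on",
)
CONFIG_KEYS = ("materialized", "enabled")  # assumed subset; the real one may be larger
SOURCE_KEYS = ("database", "schema", "name", "unique_id", "description")
MACRO_KEYS = ("unique_id", "macro_sql")


def pick(entry: dict[str, Any], keys: tuple[str, ...]) -> dict[str, Any]:
    """Return a copy of `entry` containing only the listed keys that exist."""
    return {key: entry[key] for key in keys if key in entry}


def slim_node(node: dict[str, Any]) -> dict[str, Any]:
    """Keep high-signal node attributes plus a trimmed config block."""
    slim = pick(node, NODE_KEYS)
    if "config" in node:
        slim["config"] = pick(node["config"], CONFIG_KEYS)
    return slim


def slim_source(source: dict[str, Any]) -> dict[str, Any]:
    """Keep only the retained source attributes."""
    return pick(source, SOURCE_KEYS)


def slim_macro(macro: dict[str, Any]) -> dict[str, Any]:
    """Keep only the retained macro attributes."""
    return pick(macro, MACRO_KEYS)
```

An allow-list like this keeps the output stable when new keys appear in later manifest versions, at the cost of having to opt in to any attribute that turns out to be useful.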
