
Project description

[dbt-doctor architecture diagram]

🩺 dbt-doctor

AI-driven quality & governance MCP Server for dbt projects.
Audit coverage, profile data, detect schema drift, and auto-generate documentation, all through natural language with your AI assistant.

CI PyPI version Python 3.10+ MIT License 40 tests passed


🤔 What is dbt-doctor?

dbt-doctor is an MCP (Model Context Protocol) server that gives your AI coding assistant deep awareness of your dbt project's health. Instead of manually running CLI commands and reading outputs, you can simply ask your AI:

"What's the health of my dbt project?"
"Profile the fct_orders model and suggest tests for it."
"Auto-document the models with lowest test coverage."

dbt-doctor handles all the heavy lifting: reading the manifest, profiling your warehouse, detecting schema drift, and writing back to schema.yml files, all without leaving your AI chat.

💡 Designed to complement the official dbt-labs/dbt-mcp, not replace it: dbt-labs/dbt-mcp runs dbt commands; dbt-doctor audits, profiles, and documents.


✨ Key Features

๐Ÿ” Project Auditing

Score your project 0–100% across three dimensions: documentation, testing, and naming conventions. Get a ranked list of the worst-covered models to prioritize.
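To make the scoring idea concrete, here is a minimal sketch of how a coverage-style score could be computed from per-model documentation and test counts. The function name, input shape, and averaging rule are illustrative assumptions; the real rules live in analyzers/auditor.py and may weight things differently.

```python
def coverage_score(models):
    """Hypothetical project score: mean of per-model doc/test coverage, 0-100.

    `models` maps model name -> {"columns": [...], "documented": [...], "tested": [...]}.
    """
    if not models:
        return 0.0
    per_model = []
    for info in models.values():
        n_cols = max(len(info["columns"]), 1)  # guard against zero-column models
        doc = len(info["documented"]) / n_cols
        tst = len(info["tested"]) / n_cols
        per_model.append((doc + tst) / 2)      # equal weight: docs vs tests
    return round(100 * sum(per_model) / len(per_model), 1)
```

A model with half its columns documented and none tested would score 25% under this toy rule.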

📊 Data Profiling

Run efficient single-pass column statistics (NULL rate, cardinality, min/max, uniqueness) using one batched SQL query per table. No slow row-by-row scanning.
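The single-pass approach boils down to packing every column's aggregates into one SELECT. A sketch of how such a query might be assembled (function name and alias scheme are illustrative; the actual SQL emitted by analyzers/profiler.py may differ, and real identifiers would be sanitized first):

```python
def build_profile_query(table, columns):
    """Build one batched SELECT computing per-column stats in a single scan."""
    exprs = ["COUNT(*) AS row_count"]
    for col in columns:
        exprs += [
            f"COUNT({col}) AS {col}__non_null",          # NULL rate = 1 - non_null/row_count
            f"COUNT(DISTINCT {col}) AS {col}__distinct",  # cardinality
            f"MIN({col}) AS {col}__min",
            f"MAX({col}) AS {col}__max",
        ]
    return "SELECT " + ", ".join(exprs) + " FROM " + table
```

One query, one table scan, regardless of how many columns are profiled.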

🔄 Schema Drift Detection

Compare what's in your warehouse right now against what your manifest.json says should be there. Spot added, removed and type-changed columns instantly.
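At its core this is a set comparison over column/type maps. A minimal illustration (not the real analyzers/drift_detector.py, which also has to normalize vendor-specific type spellings):

```python
def detect_drift(manifest_cols, db_cols):
    """Compare column -> type maps; both args are plain dicts of strings."""
    added = sorted(set(db_cols) - set(manifest_cols))    # in DB, missing from manifest
    removed = sorted(set(manifest_cols) - set(db_cols))  # in manifest, gone from DB
    changed = sorted(
        c for c in set(manifest_cols) & set(db_cols)
        if manifest_cols[c].upper() != db_cols[c].upper()  # crude type comparison
    )
    return {"added": added, "removed": removed, "changed": changed}
```
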

🤖 Intelligent Test Suggestions

Translate profile statistics into concrete dbt test recommendations. A column that's 100% unique and non-null? → suggest not_null + unique. Low cardinality? → suggest accepted_values with the actual values pre-filled.
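Those heuristics can be written as a small rule table. A sketch mirroring the rules described above; the stats keys and the cardinality threshold are illustrative assumptions, and the shipped generators/test_suggester.py may use different cutoffs:

```python
def suggest_tests(stats, low_cardinality=10):
    """Map profile stats to dbt test suggestions.

    `stats`: {"null_pct": float, "unique_pct": float, "distinct_values": list}.
    """
    tests = []
    if stats["null_pct"] == 0:
        tests.append("not_null")
    if stats["unique_pct"] == 100 and stats["null_pct"] == 0:
        tests.append("unique")                       # looks like a primary key
    elif 0 < len(stats["distinct_values"]) <= low_cardinality:
        tests.append({"accepted_values": {"values": sorted(stats["distinct_values"])}})
    return tests
```
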

โœ๏ธ Non-Destructive YAML Writing

Update schema.yml files using ruamel.yaml to preserve your hand-written comments, existing tests and formatting. Only adds what's missing, never removes what you wrote.
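The real writer relies on ruamel.yaml's round-trip mode so comments and formatting survive; the merge rule itself, sketched here on already-parsed plain dicts (function name and key layout are illustrative), is simply "existing wins, missing gets added":

```python
def merge_column(existing, suggested):
    """Merge one schema.yml column entry; never overwrite hand-written content."""
    merged = dict(existing)
    if not merged.get("description"):                  # only fill empty descriptions
        merged["description"] = suggested.get("description", "")
    tests = list(merged.get("tests", []))
    for t in suggested.get("tests", []):
        if t not in tests:                             # keep existing tests, append new
            tests.append(t)
    if tests:
        merged["tests"] = tests
    return merged
```

A hand-written description and an existing `unique` test pass through untouched while a new `not_null` is appended.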

🚀 End-to-End Doc Generation

One command to: profile a model → suggest tests → preview changes → write to schema.yml. The full loop from "undocumented" to "well-tested" in a single AI conversation turn.


๐Ÿ› ๏ธ 12 MCP Tools

| Category   | Tool                | Description                                             |
|------------|---------------------|---------------------------------------------------------|
| Context    | list_models         | All models with doc/test coverage status                |
| Context    | get_model_details   | Full model info: SQL, columns, lineage, tests           |
| Audit      | audit_project       | Health score (0–100%), naming violations, worst models  |
| Audit      | check_test_coverage | Models ranked by test coverage                          |
| Audit      | analyze_dag         | Detect orphans, max depth, high fan-out nodes           |
| Audit      | get_project_health  | Entry point: single-call dashboard                      |
| Profiling  | profile_model       | Batch column statistics: NULL%, unique%, min/max        |
| Profiling  | execute_query       | Read-only SQL against your warehouse                    |
| Profiling  | detect_schema_drift | Compare DB columns vs manifest definitions              |
| Generation | suggest_tests       | Profile stats → concrete dbt test list                  |
| Generation | update_model_yaml   | Write docs & tests to schema.yml (safe merge)           |
| Generation | generate_model_docs | Killer feature: E2E profile → suggest → write           |

⚡ Quick Start

Install

pip install dbt-doctor

Configure Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "dbt-doctor": {
      "command": "dbt-doctor",
      "args": ["--project-dir", "/absolute/path/to/your/dbt/project"]
    }
  }
}

Configure Cursor

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "dbt-doctor": {
      "command": "dbt-doctor",
      "args": ["--project-dir", "/absolute/path/to/your/dbt/project"]
    }
  }
}

Prerequisite: run dbt compile first to generate target/manifest.json; dbt-doctor reads this file to understand your project structure.


💬 Example AI Conversations

Auto-document your worst model

You:   "Audit my dbt project and document the worst covered model"

AI  → get_project_health()
    ← Score: 38%. fct_orders has 0% column coverage and no tests.

AI  → generate_model_docs("fct_orders")
    ← Profiled 1.2M rows.
       order_id: 100% unique, 0% null → suggest not_null + unique
       status: 4 distinct values → suggest accepted_values: [placed, shipped, completed, returned]

AI  → update_model_yaml("fct_orders", description="...", columns=[...])
    ← ✅ Written to models/marts/_fct_orders.yml (added 6 tests, preserved 2 existing)

Detect schema drift before a deploy

You:   "Check if fct_orders has any schema drift"

AI  → detect_schema_drift("fct_orders")
    ← ⚠️ DRIFT DETECTED:
       + discount_amount (DECIMAL): in DB but not in manifest
       ~ total_amount: manifest says FLOAT, DB says DECIMAL(12,2)

DAG health check

You:   "Are there any structural issues with my DAG?"

AI  → analyze_dag()
    ← ⚠️ 3 orphan models (no downstream dependencies)
       ⚠️ stg_base has 12 downstream models (high fan-out)
       ✅ Max chain depth: 5 (within recommended limits)
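The checks in that answer can be sketched over a plain adjacency map. A minimal illustration, not the real analyzers/dag_analyzer.py; it treats "orphan" as a model with no downstream dependents, matching the sample output, though the shipped definition may differ:

```python
def analyze_dag(children):
    """`children` maps each model to its list of direct downstream models."""
    def depth(node):
        # longest chain starting at `node`, counting the node itself
        return 1 + max((depth(k) for k in children.get(node, [])), default=0)

    orphans = sorted(n for n, kids in children.items() if not kids)
    busiest = max(children, key=lambda n: len(children[n]))
    return {
        "orphans": orphans,
        "highest_fan_out": (busiest, len(children[busiest])),
        "max_depth": max(depth(n) for n in children),
    }
```

For a real manifest the analyzer would also need cycle protection, which dbt itself already enforces.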

๐Ÿ—๏ธ Architecture

┌─────────────────────────────────────────────┐
│           Your AI Assistant                 │
│      (Claude / Cursor / Copilot Chat)       │
└────────────────────┬────────────────────────┘
                     │  MCP Protocol
┌────────────────────▼────────────────────────┐
│          dbt-doctor MCP Server              │
│                                             │
│  ┌─────────┐ ┌───────┐ ┌──────┐ ┌───────┐   │
│  │ Context │ │ Audit │ │ Prof │ │  Gen  │   │
│  └────┬────┘ └───┬───┘ └──┬───┘ └───┬───┘   │
│       └──────────┴────────┴─────────┘       │
│                   │                         │
│  ┌────────────────▼────────────────────┐    │
│  │           Core Layer                │    │
│  │  manifest.py  profiles.py  project  │    │
│  └────────────────┬────────────────────┘    │
└───────────────────┼─────────────────────────┘
          ┌─────────┴──────────┐
          ▼                    ▼
  ┌───────────────┐   ┌─────────────────┐
  │  dbt Project  │   │    Database     │
  │ manifest.json │   │  PostgreSQL /   │
  │ dbt_proj.yml  │   │  DuckDB         │
  │ schema/*.yml  │   │                 │
  └───────────────┘   └─────────────────┘

Project Structure

src/dbt_doctor/
├── server.py                 # FastMCP entrypoint: 12 tools registered
│
├── core/                     # Read-only project parsing
│   ├── manifest.py           # manifest.json with mtime-based caching
│   ├── profiles.py           # profiles.yml with env_var() resolution
│   ├── project.py            # dbt_project.yml parser
│   └── schema_reader.py      # Finds and reads existing schema.yml
│
├── connectors/               # Database abstraction
│   ├── base.py               # Abstract BaseConnector (ABC)
│   ├── postgres.py           # psycopg v3, read-only enforcement
│   └── duckdb.py             # DuckDB, zero-config, used in tests
│
├── analyzers/                # All read-only analysis logic
│   ├── auditor.py            # Coverage scoring, naming violations
│   ├── dag_analyzer.py       # Orphan detection, depth, fan-out
│   ├── profiler.py           # Batched column statistics per table
│   └── drift_detector.py     # DB schema vs manifest comparison
│
├── generators/               # Write operations (with preview)
│   ├── test_suggester.py     # Rule-based test recommendations
│   ├── yaml_writer.py        # ruamel.yaml non-destructive merge
│   └── doc_generator.py      # E2E: profile → suggest → write
│
└── utils/
    └── sql_sanitizer.py      # Whitelist-based identifier validation

🧪 Testing

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=dbt_doctor --cov-report=term-missing

Test results:

tests/test_manifest_parser.py  ......... (9 tests)
tests/test_auditor.py          ......   (6 tests)
tests/test_dag_analyzer.py     ......   (6 tests)
tests/test_profiler.py     .......  (7 tests, DuckDB in-memory)
tests/test_test_suggester.py   ........  (8 tests)
tests/test_yaml_writer.py      ....     (4 tests)

====== 40 passed in 0.42s ======

🔒 Security Design

  • Read-only database access: all execute_query calls are wrapped in a read-only transaction, and write operations are blocked at the connector level.
  • SQL injection prevention: all table and column identifiers are validated against a strict whitelist regex before being interpolated into queries.
  • No credentials in memory: profiles.yml credentials are fetched fresh for each connection and not cached.
  • Preview before write: generate_model_docs always shows a diff preview before applying changes to schema.yml, so you can choose not to apply.
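The whitelist idea from the second bullet fits in a few lines. A sketch in the spirit of utils/sql_sanitizer.py; the actual pattern there may be stricter or also allow schema-qualified names, and the length cap here is an assumption:

```python
import re

# Plain SQL identifier: letter or underscore, then letters/digits/underscores.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def safe_identifier(name, max_len=63):
    """Return `name` unchanged if it is a plain identifier; raise otherwise."""
    if len(name) <= max_len and IDENT.fullmatch(name):
        return name
    raise ValueError(f"unsafe SQL identifier: {name!r}")
```

Because anything not matching the whitelist is rejected outright, classic injection payloads containing quotes, semicolons, or comments never reach the query string.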

๐Ÿ—บ๏ธ Roadmap

  • BigQuery connector
  • Snowflake connector
  • Redshift connector
  • detect_schema_drift batch mode (all models at once)
  • Model-level accepted_values inference from warehouse data
  • Slack/PagerDuty alerts on drift detection
  • GitHub Actions integration for drift-as-CI

🔗 Related Projects

| Project               | Type        | Difference                                             |
|-----------------------|-------------|--------------------------------------------------------|
| dbt-labs/dbt-mcp      | MCP         | Official dbt MCP: runs dbt commands, executes queries  |
| dbt-coverage          | CLI         | Coverage only, no AI integration, no YAML writing      |
| dbt-project-evaluator | dbt package | Requires install in each project, no profiling         |

dbt-doctor is the only tool combining audit + profiling + drift detection + AI-driven YAML generation in a single MCP server.


📄 License

MIT. See LICENSE.


Made with โค๏ธ for the dbt community

Project details


Download files


Source Distribution

dbt_doctor-0.1.0.tar.gz (40.2 kB view details)

Uploaded Source

Built Distribution


dbt_doctor-0.1.0-py3-none-any.whl (39.5 kB view details)

Uploaded Python 3

File details

Details for the file dbt_doctor-0.1.0.tar.gz.

File metadata

  • Download URL: dbt_doctor-0.1.0.tar.gz
  • Upload date:
  • Size: 40.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for dbt_doctor-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ae662c33d6635e30c5fd00a444d63b840f457cd1cb67558554d2db09763e8dd1
MD5 684ff77c557fc3da78ed344986002c00
BLAKE2b-256 9fc6f5fadac6cf5a60dcb1613d1271afcd30e027eb0eac3727b5526b6cf5344d


File details

Details for the file dbt_doctor-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: dbt_doctor-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 39.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for dbt_doctor-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 86d4d7c010439bfb64f2e665d0cda3c6277e318888967bdf68067c8b48eeef37
MD5 da3393acad315e66045bcf2d61b74b42
BLAKE2b-256 69ba76ebe5d8145abbe300004bf9252f637fc3a2b83a97343cbc546f7f7e5e40

