AI-driven quality & governance MCP Server for dbt projects. Audit coverage, profile data, detect schema drift, and auto-generate documentation.
Project description
๐ฉบ dbt-doctor
AI-driven quality & governance MCP Server for dbt projects.
Audit coverage, profile data, detect schema drift, and auto-generate documentation โ all through natural language with your AI assistant.
๐ค What is dbt-doctor?
dbt-doctor is an MCP (Model Context Protocol) server that gives your AI coding assistant deep awareness of your dbt project's health. Instead of manually running CLI commands and reading outputs, you can simply ask your AI:
"What's the health of my dbt project?"
"Profile the fct_orders model and suggest tests for it."
"Auto-document the models with lowest test coverage."
dbt-doctor handles all the heavy lifting โ reading the manifest, profiling your warehouse, detecting schema drift, and writing back to schema.yml files โ all without leaving your AI chat.
๐ก Designed to complement the official dbt-labs/dbt-mcp, not replace it. dbt-labs/dbt-mcp runs dbt commands; dbt-doctor audits, profiles and documents.
โจ Key Features
๐ Project Auditing
Score your project 0โ100% across three dimensions โ documentation, testing and naming conventions. Get a ranked list of worst-covered models to prioritize.
๐ Data Profiling
Run efficient single-pass column statistics (NULL rate, cardinality, min/max, uniqueness) using one batched SQL query per table. No slow row-by-row scanning.
๐ Schema Drift Detection
Compare what's in your warehouse right now against what your manifest.json says should be there. Spot added, removed and type-changed columns instantly.
๐ค Intelligent Test Suggestions
Translate profile statistics into concrete dbt test recommendations. A column that's 100% unique and non-null? โ suggest not_null + unique. Low cardinality? โ suggest accepted_values with the actual values pre-filled.
โ๏ธ Non-Destructive YAML Writing
Update schema.yml files using ruamel.yaml to preserve your hand-written comments, existing tests and formatting. Only adds what's missing, never removes what you wrote.
๐ End-to-End Doc Generation
One command to: profile a model โ suggest tests โ preview changes โ write to schema.yml. The full loop from "undocumented" to "well-tested" in a single AI conversation turn.
๐ ๏ธ 12 MCP Tools
| Category | Tool | Description |
|---|---|---|
| Context | list_models |
All models with doc/test coverage status |
| Context | get_model_details |
Full model info: SQL, columns, lineage, tests |
| Audit | audit_project |
Health score (0โ100%), naming violations, worst models |
| Audit | check_test_coverage |
Models ranked by test coverage |
| Audit | analyze_dag |
Detect orphans, max depth, high fan-out nodes |
| Audit | get_project_health |
Entry point โ single-call dashboard |
| Profiling | profile_model |
Batch column statistics: NULL%, unique%, min/max |
| Profiling | execute_query |
Read-only SQL against your warehouse |
| Profiling | detect_schema_drift |
Compare DB columns vs manifest definitions |
| Generation | suggest_tests |
Profile stats โ concrete dbt test list |
| Generation | update_model_yaml |
Write docs & tests to schema.yml (safe merge) |
| Generation | generate_model_docs |
Killer feature: E2E profileโsuggestโwrite |
โก Quick Start
Install
pip install dbt-doctor
Configure Claude Desktop
Add to claude_desktop_config.json:
{
"mcpServers": {
"dbt-doctor": {
"command": "dbt-doctor",
"args": ["--project-dir", "/absolute/path/to/your/dbt/project"]
}
}
}
Configure Cursor
Add to .cursor/mcp.json:
{
"mcpServers": {
"dbt-doctor": {
"command": "dbt-doctor",
"args": ["--project-dir", "/absolute/path/to/your/dbt/project"]
}
}
}
prerequisite: Run
dbt compilefirst to generatetarget/manifest.json. dbt-doctor reads this file to understand your project structure.
๐ฌ Example AI Conversations
Auto-document your worst model
You: "Audit my dbt project and document the worst covered model"
AI โ get_project_health()
โ Score: 38%. fct_orders has 0% column coverage and no tests.
AI โ generate_model_docs("fct_orders")
โ Profiled 1.2M rows.
order_id: 100% unique, 0% null โ suggest not_null + unique
status: 4 distinct values โ suggest accepted_values: [placed, shipped, completed, returned]
AI โ update_model_yaml("fct_orders", description="...", columns=[...])
โ โ
Written to models/marts/_fct_orders.yml (added 6 tests, preserved 2 existing)
Detect schema drift before a deploy
You: "Check if fct_orders has any schema drift"
AI โ detect_schema_drift("fct_orders")
โ โ ๏ธ DRIFT DETECTED:
+ discount_amount (DECIMAL) โ in DB but not in manifest
~ total_amount: manifest says FLOAT, DB says DECIMAL(12,2)
DAG health check
You: "Are there any structural issues with my DAG?"
AI โ analyze_dag()
โ โ ๏ธ 3 orphan models (no downstream dependencies)
โ ๏ธ stg_base has 12 downstream models (high fan-out)
โ
Max chain depth: 5 (within recommended limits)
๐๏ธ Architecture
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Your AI Assistant โ
โ (Claude / Cursor / Copilot Chat) โ
โโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ MCP Protocol
โโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ dbt-doctor MCP Server โ
โ โ
โ โโโโโโโโโโโ โโโโโโโโโ โโโโโโโโ โโโโโโโโโ โ
โ โ Context โ โ Audit โ โ Prof โ โ Gen โ โ
โ โโโโโโฌโโโโโ โโโโโฌโโโโ โโโโฌโโโโ โโโโโฌโโโโ โ
โ โโโโโโโโโโโโดโโโโโโโโโดโโโโโโโโโโ โ
โ โ โ
โ โโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Core Layer โ โ
โ โ manifest.py profiles.py project โ โ
โ โโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโดโโโโโโโโโโโ
โผ โผ
โโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ
โ dbt Project โ โ Database โ
โ manifest.json โ โ PostgreSQL / โ
โ dbt_proj.yml โ โ DuckDB โ
โ schema/*.yml โ โ โ
โโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ
Project Structure
src/dbt_doctor/
โโโ server.py # FastMCP entrypoint โ 12 tools registered
โ
โโโ core/ # Read-only project parsing
โ โโโ manifest.py # manifest.json with mtime-based caching
โ โโโ profiles.py # profiles.yml with env_var() resolution
โ โโโ project.py # dbt_project.yml parser
โ โโโ schema_reader.py # Finds and reads existing schema.yml
โ
โโโ connectors/ # Database abstraction
โ โโโ base.py # Abstract BaseConnector (ABC)
โ โโโ postgres.py # psycopg v3, read-only enforcement
โ โโโ duckdb.py # DuckDB, zero-config, used in tests
โ
โโโ analyzers/ # All read-only analysis logic
โ โโโ auditor.py # Coverage scoring, naming violations
โ โโโ dag_analyzer.py # Orphan detection, depth, fan-out
โ โโโ profiler.py # Batched column statistics per table
โ โโโ drift_detector.py # DB schema vs manifest comparison
โ
โโโ generators/ # Write operations (with preview)
โ โโโ test_suggester.py # Rule-based test recommendations
โ โโโ yaml_writer.py # ruamel.yaml non-destructive merge
โ โโโ doc_generator.py # E2E: profile โ suggest โ write
โ
โโโ utils/
โโโ sql_sanitizer.py # Whitelist-based identifier validation
๐งช Testing
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=dbt_doctor --cov-report=term-missing
Test results:
tests/test_manifest_parser.py ......... (9 tests)
tests/test_auditor.py ...... (6 tests)
tests/test_dag_analyzer.py ...... (6 tests)
tests/test_profiler.py ....... (7 tests โ DuckDB in-memory)
tests/test_test_suggester.py ........ (8 tests)
tests/test_yaml_writer.py .... (4 tests)
====== 40 passed in 0.42s ======
๐ Security Design
- Read-only database access โ all
execute_querycalls are wrapped in a read-only transaction. Write operations are blocked at the connector level. - SQL injection prevention โ all table and column identifiers are validated against a strict whitelist regex before being interpolated into queries.
- No credentials in memory โ profiles.yml credentials are fetched fresh each connection and not cached.
- Preview before write โ
generate_model_docsalways shows a diff preview before applying changes to schema.yml. You can choose not to apply.
๐บ๏ธ Roadmap
- BigQuery connector
- Snowflake connector
- Redshift connector
-
detect_schema_driftbatch mode (all models at once) - Model-level
accepted_valuesinference from warehouse data - Slack/PagerDuty alerts on drift detection
- GitHub Actions integration for drift-as-CI
๐ Related Projects
| Project | Type | Difference |
|---|---|---|
| dbt-labs/dbt-mcp | MCP | Official dbt MCP โ runs dbt commands, executes queries |
| dbt-coverage | CLI | Coverage only, no AI integration, no YAML writing |
| dbt-project-evaluator | dbt package | Requires install in each project, no profiling |
dbt-doctor is the only tool combining audit + profiling + drift detection + AI-driven YAML generation in a single MCP server.
๐ License
MIT โ see LICENSE.
Made with โค๏ธ for the dbt community
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file dbt_doctor-0.1.0.tar.gz.
File metadata
- Download URL: dbt_doctor-0.1.0.tar.gz
- Upload date:
- Size: 40.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ae662c33d6635e30c5fd00a444d63b840f457cd1cb67558554d2db09763e8dd1
|
|
| MD5 |
684ff77c557fc3da78ed344986002c00
|
|
| BLAKE2b-256 |
9fc6f5fadac6cf5a60dcb1613d1271afcd30e027eb0eac3727b5526b6cf5344d
|
File details
Details for the file dbt_doctor-0.1.0-py3-none-any.whl.
File metadata
- Download URL: dbt_doctor-0.1.0-py3-none-any.whl
- Upload date:
- Size: 39.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
86d4d7c010439bfb64f2e665d0cda3c6277e318888967bdf68067c8b48eeef37
|
|
| MD5 |
da3393acad315e66045bcf2d61b74b42
|
|
| BLAKE2b-256 |
69ba76ebe5d8145abbe300004bf9252f637fc3a2b83a97343cbc546f7f7e5e40
|