
Automated documentation generator for dbt projects using Google Gemini AI


DBT Autodoc Documentation

dbt-autodoc automates documentation and logging for your dbt projects. It combines Google Gemini AI with a database logging system so that your documentation stays up to date, accurate, and auditable.

🌟 Why dbt-autodoc?

  • 🤖 Automatic AI Documentation: Generate comprehensive descriptions for your tables and columns automatically.
  • 💾 Database Logging & History: Every description is stored in a database (DuckDB or Postgres). The database acts as the "Source of Truth" and keeps a full history of changes.
  • 🔄 Full Synchronization: Seamlessly integrates with dbt-osmosis to keep your YAML files in sync with your SQL models.
  • 🔒 Protect Manual Work: Respects human-written documentation. If you write it, we lock it.
  • 👥 Team Ready: Use Postgres to share documentation cache across your entire team.

🛠️ Setup

  1. Install:

    pip install dbt-autodoc
    
  2. Configuration: Run dbt-autodoc to generate dbt-autodoc.yml. Important: Edit company_context in this file to give the AI knowledge about your business logic.

  3. Environment Variables:

    GEMINI_API_KEY=your_api_key_here
    # Optional: set only when sharing the cache through Postgres
    POSTGRES_URL=postgresql://user:pass@host:port/db
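
A small pre-flight check can catch a missing key before any AI call is made. The sketch below is not part of dbt-autodoc; the function name `check_autodoc_env` is hypothetical, and it assumes that the tool falls back to DuckDB when POSTGRES_URL is unset and that GEMINI_API_KEY is only strictly needed for the AI-backed commands.

```python
import os


def check_autodoc_env(env=None):
    """Return (missing_variables, expected_backend) for a dbt-autodoc run.

    Hypothetical helper: the variable names come from the README, the
    fallback rule is an assumption.
    """
    env = os.environ if env is None else env
    # GEMINI_API_KEY is the only variable the README lists without "optional".
    missing = [name for name in ("GEMINI_API_KEY",) if not env.get(name)]
    # Assumption: without POSTGRES_URL the cache lives in a local DuckDB file.
    backend = "postgres" if env.get("POSTGRES_URL") else "duckdb"
    return missing, backend


# Example with an explicit mapping instead of the real environment.
missing, backend = check_autodoc_env({"GEMINI_API_KEY": "your_api_key_here"})
print(f"missing={missing}, backend={backend}")
```

Calling `check_autodoc_env()` with no argument reads the real process environment.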
    

📋 Recommended Workflow

For the best results, follow this step-by-step workflow to ensure accuracy and control:

  1. Preparation: Update your dbt project and context.

    dbt run
    # Edit dbt-autodoc.yml with company_context
    
  2. Sync Structure (No AI): Regenerate YAML files to match the SQL models. This ensures all new columns are present.

    dbt-autodoc --regenerate-yml
    
  3. Generate Table Descriptions (SQL): Generate AI descriptions for your models (tables/views).

    dbt-autodoc --generate-docs-config-ai --model-path models/staging
    
  4. Manual Review (Important): Open your YAML files. Review the structure and any existing descriptions. If you manually update a description here, it will be protected from AI overwrites in the next step.

  5. Generate Column Descriptions (YAML): Use AI to fill in the missing column descriptions.

    dbt-autodoc --generate-docs-yml-ai --model-path models/staging
    
  6. Propagate & Save: Run inheritance rules on the entire dbt project, then run the tool again to save the final state (including inherited descriptions) to the database.

    dbt-autodoc --regenerate-yml-with-inheritance
    dbt-autodoc --generate-docs-yml-ai --model-path models/staging
    
  7. Next Layer: Repeat steps 2-6 for models/intermediate, models/marts, etc.
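
Steps 2 to 6 can be wrapped in a small driver script so that each layer runs the same sequence. The following is a minimal sketch, not something shipped with dbt-autodoc: the function names `layer_commands` and `run_layer` are hypothetical, and the commands are exactly the ones listed in the workflow above. It defaults to a dry run that only prints the commands; with `dry_run=False` it pauses for the manual review in step 4.

```python
import shlex
import subprocess


def layer_commands(model_path):
    """Return the commands for steps 2-6, split around the manual review."""
    before_review = [
        ["dbt-autodoc", "--regenerate-yml"],
        ["dbt-autodoc", "--generate-docs-config-ai", "--model-path", model_path],
    ]
    after_review = [
        ["dbt-autodoc", "--generate-docs-yml-ai", "--model-path", model_path],
        ["dbt-autodoc", "--regenerate-yml-with-inheritance"],
        ["dbt-autodoc", "--generate-docs-yml-ai", "--model-path", model_path],
    ]
    return before_review, after_review


def run_layer(model_path, dry_run=True):
    """Print (and optionally execute) the workflow for one model directory."""
    before, after = layer_commands(model_path)
    for phase, commands in (("before review", before), ("after review", after)):
        for command in commands:
            print(f"[{phase}] {shlex.join(command)}")
            if not dry_run:
                # check=True stops the sequence as soon as one command fails.
                subprocess.run(command, check=True)
        if phase == "before review" and not dry_run:
            input("Review the YAML files (step 4), then press Enter to continue...")


run_layer("models/staging")  # dry run: prints the five commands only
```

For the next layer, call `run_layer("models/intermediate")`, then `run_layer("models/marts")`, and so on.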

🚀 Quick Start (Automated)

If you trust the process and just want to run everything at once:

dbt-autodoc --generate-docs-ai

🧠 How the AI Works

When generating a description for a column or table, the AI considers multiple inputs to produce the most accurate result:

  1. Company Context: The high-level business logic defined in your config.
  2. Model SQL: The actual code of the model being documented.
  3. Existing Descriptions: Any existing documentation or comments in the file.
  4. Upstream Logic: Context from upstream models, picked up implicitly through dbt-osmosis inheritance.

It synthesizes all these inputs to write a concise, technical description.
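
To make the idea of combining these inputs concrete, here is a purely illustrative sketch of how such a prompt could be assembled. It is not dbt-autodoc's actual prompt or code: the function `build_prompt`, its parameters, and the section headings are all hypothetical.

```python
def build_prompt(company_context, model_sql, existing_description="", upstream_notes=()):
    """Combine the four inputs listed above into one prompt string.

    Hypothetical illustration only; the real tool may structure its
    prompt differently.
    """
    sections = [
        ("Company context", company_context.strip()),
        ("Model SQL", model_sql.strip()),
        ("Existing description", existing_description.strip()),
        ("Upstream context", "\n".join(note.strip() for note in upstream_notes)),
    ]
    # Skip empty inputs so the prompt only contains material that exists.
    body = "\n\n".join(f"## {title}\n{text}" for title, text in sections if text)
    return body + "\n\nWrite a concise, technical description."


print(build_prompt("We sell bikes online.", "select id, total from orders"))
```

The example shows why company_context matters: it is the only input that carries business knowledge the SQL itself cannot express.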

📖 Arguments Reference

| Argument | Description |
| --- | --- |
| --regenerate-yml | Structure Only. Regenerate YAML files from dbt models. Does not sync to DB or call AI. |
| --regenerate-yml-with-inheritance | Structure + Inheritance. Regenerate YAML files with inheritance enabled. Use this to propagate descriptions from upstream models. |
| --model-path | Restrict processing to a specific directory (e.g. models/staging). |
| --generate-docs-config-ai | Generate table descriptions in .sql files using AI. |
| --generate-docs-yml-ai | Generate column descriptions in .yml files using AI. |
| --generate-docs-config | Sync .sql files from cache (no AI). |
| --generate-docs-yml | Sync .yml files from cache (no AI). |
| --generate-docs-ai | 🔥 Full Auto. Runs the complete workflow: SQL generation, YAML sync, and YAML generation using AI. |
| --generate-docs | 🔄 Full Sync. Runs the complete workflow using only the database cache (no AI). |
| --cleanup-db | Reset Database. Wipes the description cache and history. |
| --concurrency | Max threads for AI/DB requests (default: 10). |
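
The six --generate-docs* flags follow a regular pattern: a scope (config, yml, or everything) crossed with whether AI is used. The sketch below encodes that pattern in a hypothetical helper, `build_command`, which is not part of dbt-autodoc. It assumes that --model-path and --concurrency take their value as the next argument and can be combined with any of these flags; the reference above only shows --model-path together with the two per-scope AI flags.

```python
# Flag names are taken from the arguments reference; the (scope, use_ai)
# keys are a hypothetical naming scheme for this example.
FLAGS = {
    ("config", True): "--generate-docs-config-ai",
    ("yml", True): "--generate-docs-yml-ai",
    ("all", True): "--generate-docs-ai",
    ("config", False): "--generate-docs-config",
    ("yml", False): "--generate-docs-yml",
    ("all", False): "--generate-docs",
}


def build_command(scope, use_ai, model_path=None, concurrency=None):
    """Build a dbt-autodoc argument list for the given scope and mode."""
    try:
        flag = FLAGS[(scope, use_ai)]
    except KeyError:
        raise ValueError(f"unknown scope {scope!r}; use 'config', 'yml' or 'all'") from None
    command = ["dbt-autodoc", flag]
    if model_path:
        command += ["--model-path", model_path]
    if concurrency is not None:
        # Assumption: the value follows the flag as a separate argument.
        command += ["--concurrency", str(concurrency)]
    return command


print(build_command("yml", True, model_path="models/staging"))
```

The returned list can be passed directly to `subprocess.run(command, check=True)`.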

📄 License

MIT License - see LICENSE for details.

🙏 Attribution

Brought to you by JustDataPlease.
