CLI to auto-generate dbt docs using LLMs (Ollama-first).

dbt-llm-docs

A CLI tool that uses LLMs to generate documentation for dbt models and columns and writes it directly into your schema.yml, so the results appear in dbt docs serve.

This CLI renders prompts from Jinja2 templates and sends them to a pluggable LLM backend (Ollama or OpenAI).
Optionally, it can connect to your warehouse (Postgres and Redshift are supported today) and profile real data to give the LLM more context for column descriptions.


🚀 Features

✅ 1. LLM-generated model + column documentation

Produces rich, clear Markdown text suitable for dbt docs.
Descriptions are written directly into schema.yml.
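In schema.yml terms, "written directly" means each model entry's description key and each column's description key get filled in. The sketch below shows that merge on an already-parsed schema.yml mapping using only the standard library. The helper name upsert_model_docs is hypothetical, not the tool's API; in practice you would load and dump the file with a YAML library such as PyYAML or ruamel.yaml (the latter preserves comments and key order).

```python
from typing import Any, Dict


def upsert_model_docs(
    schema: Dict[str, Any],
    model_name: str,
    model_description: str,
    column_descriptions: Dict[str, str],
) -> Dict[str, Any]:
    """Merge generated descriptions into a parsed schema.yml mapping (hypothetical helper).

    Expected shape, as dbt reads it:
    {"version": 2, "models": [{"name": ..., "description": ..., "columns": [...]}]}
    """
    schema.setdefault("version", 2)
    models = schema.setdefault("models", [])

    # Reuse the existing model entry if there is one, otherwise append a new one.
    model = next((m for m in models if m.get("name") == model_name), None)
    if model is None:
        model = {"name": model_name}
        models.append(model)
    model["description"] = model_description

    # Only the description key is touched, so existing tests/meta on a column survive.
    columns = model.setdefault("columns", [])
    by_name = {c.get("name"): c for c in columns}
    for col_name, text in column_descriptions.items():
        if col_name in by_name:
            by_name[col_name]["description"] = text
        else:
            columns.append({"name": col_name, "description": text})
    return schema


schema = {
    "version": 2,
    "models": [{"name": "dim_customers", "columns": [{"name": "customer_id", "tests": ["unique"]}]}],
}
upsert_model_docs(
    schema,
    "dim_customers",
    "One row per customer.",
    {"customer_id": "Surrogate key for the customer.", "email": "Customer email address."},
)
```

Updating entries in place, rather than regenerating the file, keeps hand-written tests and metadata intact.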

✅ 2. Customizable Jinja2 prompt templates

Located in <project>/prompts/.
You can fully customize the writing style, voice, or structure.

✅ 3. dbt-aware selection

Supports:

  • --select
  • --exclude
  • --tags
  • Glob-like patterns (stg_*, marts.*)
  • Parent/child expansion (+model_name)
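The selection rules above can be approximated against manifest.json. This is a sketch under stated assumptions, not the tool's implementation: name globs are matched with fnmatch, dotted patterns such as marts.* are matched against the node's fqn path without the project name, a leading + pulls in all upstream models through depends_on.nodes, --tags keeps only models carrying at least one listed tag, and --exclude is applied last. The manifest below is a hand-written minimal fragment, and the function names are hypothetical.

```python
from fnmatch import fnmatch
from typing import Dict, Iterable, List, Set


def _matches(node: dict, pattern: str) -> bool:
    """Match by model name (stg_*) or by dotted folder path (marts.*)."""
    if fnmatch(node["name"], pattern):
        return True
    # fqn looks like ["my_project", "marts", "dim_customers"]; drop the project name.
    return fnmatch(".".join(node["fqn"][1:]), pattern)


def _ancestors(start: Set[str], models: Dict[str, dict]) -> Set[str]:
    """Collect every upstream model reachable through depends_on.nodes."""
    seen: Set[str] = set()
    stack = list(start)
    while stack:
        uid = stack.pop()
        for parent in models[uid].get("depends_on", {}).get("nodes", []):
            if parent in models and parent not in seen:  # skip sources and seeds
                seen.add(parent)
                stack.append(parent)
    return seen


def select_models(
    manifest: dict,
    select: Iterable[str] = (),
    exclude: Iterable[str] = (),
    tags: Iterable[str] = (),
) -> List[str]:
    models = {uid: n for uid, n in manifest["nodes"].items() if n["resource_type"] == "model"}
    select = list(select)
    if not select:
        selected = set(models)  # assumption: no --select means "start from every model"
    else:
        selected = set()
        for raw in select:
            pattern = raw.lstrip("+")
            hits = {uid for uid, n in models.items() if _matches(n, pattern)}
            if raw.startswith("+"):
                hits |= _ancestors(hits, models)
            selected |= hits
    wanted_tags = set(tags)
    if wanted_tags:
        selected = {uid for uid in selected if wanted_tags & set(models[uid].get("tags", []))}
    for pattern in exclude:
        selected = {uid for uid in selected if not _matches(models[uid], pattern)}
    return sorted(models[uid]["name"] for uid in selected)


manifest = {
    "nodes": {
        "model.shop.stg_orders": {"resource_type": "model", "name": "stg_orders",
                                  "fqn": ["shop", "staging", "stg_orders"], "tags": ["staging"],
                                  "depends_on": {"nodes": ["source.shop.raw.orders"]}},
        "model.shop.stg_customers": {"resource_type": "model", "name": "stg_customers",
                                     "fqn": ["shop", "staging", "stg_customers"], "tags": ["staging"],
                                     "depends_on": {"nodes": []}},
        "model.shop.dim_customers": {"resource_type": "model", "name": "dim_customers",
                                     "fqn": ["shop", "marts", "dim_customers"], "tags": ["mart"],
                                     "depends_on": {"nodes": ["model.shop.stg_customers",
                                                              "model.shop.stg_orders"]}},
        "seed.shop.country_codes": {"resource_type": "seed", "name": "country_codes"},
    }
}
print(select_models(manifest, ["+dim_customers"], exclude=["stg_orders"]))
# ['dim_customers', 'stg_customers']
```

Walking depends_on with an explicit stack avoids recursion limits on deep DAGs.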

✅ 4. Data-aware documentation (--use-data Y)

When enabled, the tool:

  1. Reads database connection info from profiles.yml
  2. Connects to the warehouse (Postgres & Redshift supported today)
  3. Executes the model’s compiled SQL
  4. Samples rows and computes:
    • Missing %
    • Unique %
    • Min / Max
    • Mean / Std
    • Example values
  5. Passes these stats to the LLM for smarter, context-rich documentation
  6. Appends a Markdown statistics table under each column description in dbt Docs
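The profiling steps can be sketched in plain Python. This is an illustration under assumptions, not the tool's code: sqlite3 stands in for a Postgres/Redshift connection (only DB-API 2.0 calls are used), "sampling" here means the first N rows rather than a random sample, unique % is computed over non-null values, min/max/mean/std are reported only for purely int/float columns, and the function names are hypothetical.

```python
import sqlite3
import statistics
from typing import Any, Dict, List, Tuple


def sample_rows(conn, compiled_sql: str, limit: int = 1000) -> Tuple[List[str], List[tuple]]:
    """Run the model's compiled SQL as a subquery and fetch its first `limit` rows."""
    inner = compiled_sql.strip().rstrip(";")  # a trailing ';' is invalid inside a subquery
    cur = conn.cursor()  # DB-API 2.0 cursor; psycopg2 or redshift_connector expose the same calls
    cur.execute(f"select * from ({inner}) as model_sample limit {int(limit)}")
    return [d[0] for d in cur.description], cur.fetchall()


def profile_column(values: List[Any], n_examples: int = 3) -> Dict[str, Any]:
    total = len(values)
    present = [v for v in values if v is not None]
    stats: Dict[str, Any] = {
        "missing_pct": 100.0 * (total - len(present)) / total if total else 0.0,
        "unique_pct": 100.0 * len(set(present)) / len(present) if present else 0.0,
        "examples": list(dict.fromkeys(present))[:n_examples],  # first distinct values
    }
    numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric and len(numeric) == len(present):
        stats["min"], stats["max"] = min(numeric), max(numeric)
        stats["mean"] = statistics.fmean(numeric)
        stats["std"] = statistics.stdev(numeric) if len(numeric) > 1 else 0.0
    return stats


def stats_table(stats: Dict[str, Any]) -> str:
    """Render the stats as a Markdown table to append under a column description."""
    rows = [("Missing %", f"{stats['missing_pct']:.1f}"), ("Unique %", f"{stats['unique_pct']:.1f}")]
    if "mean" in stats:
        rows.append(("Min / Max", f"{stats['min']} / {stats['max']}"))
        rows.append(("Mean / Std", f"{stats['mean']:.2f} / {stats['std']:.2f}"))
    examples = ", ".join(str(v).replace("|", "\\|") for v in stats["examples"])  # escape pipes
    rows.append(("Example values", examples))
    lines = ["| Metric | Value |", "| --- | --- |"]
    lines += [f"| {name} | {value} |" for name, value in rows]
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer, amount real, status text)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [(1, 10.0, "paid"), (2, 30.0, "paid"), (3, None, "refunded"), (4, 20.0, None)],
)
columns, rows = sample_rows(conn, "select order_id, amount, status from orders;")
profiles = {name: profile_column([r[i] for r in rows]) for i, name in enumerate(columns)}
print(stats_table(profiles["amount"]))
```

Wrapping the compiled SQL in a subquery means the model itself is never modified; only aggregates and a few example values leave this step.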

🛠️ Support for more databases (Snowflake, BigQuery, Databricks) is coming soon.


🔒 Data Privacy Note

If --use-data Y is enabled, the computed profile summary (aggregate statistics plus a few sampled example values, not full tables) is sent to the selected LLM backend.

If your organization forbids sending data outside your network, use the local Ollama backend:

dbt-tools llm-docs-generate --backend ollama

Ollama runs entirely on hardware you control, so prompts and profile data never leave your environment.


🧱 Architecture Overview

flowchart LR

    subgraph DBT["dbt project"]
        DbtModels["dbt models (*.sql)"]
        SchemaYml["schema.yml (descriptions)"]
        DbtProjectYml["dbt_project.yml"]
    end

    subgraph Target["target/ directory"]
        Manifest["manifest.json"]
        Catalog["catalog.json (optional)"]
    end

    subgraph Profiles["profiles.yml"]
        ProfileDev["dev target (Postgres / Redshift)"]
    end

    subgraph CLI["dbt-llm-docs CLI"]
        Typer["Typer CLI (init, list, generate)"]
        Prompts["Jinja templates (model.md.j2, column.md.j2)"]
        Selector["Model selector (--select / --exclude / --tags)"]
        Profiler["Optional data profiler (--use-data Y)"]
        Writer["Writes descriptions to schema.yml"]
    end

    subgraph LLMBackends["LLM Backends"]
        Ollama["Ollama (local)"]
        OpenAI["OpenAI / compatible (cloud)"]
    end

    subgraph Warehouse["Data Warehouse"]
        DB["Postgres / Redshift"]
    end

    DbtModels --> Target
    DbtProjectYml --> Profiles

    Target --> CLI
    Catalog --> CLI
    Profiles --> Profiler
    DB --> Profiler

    Prompts --> Typer
    Typer --> Selector
    Selector --> LLMBackends

    Profiler --> LLMBackends
    LLMBackends --> Writer

    Writer --> SchemaYml
    SchemaYml --> DocsUI["dbt docs UI"]
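The diagram's key seam is between prompt rendering and the LLM backend. One way to model that seam in Python is a typing.Protocol; the sketch below is an assumption about how such a pluggable design could look, not the project's actual classes. LLMBackend, EchoBackend, render_prompt and document_models are hypothetical names, and the f-string prompts stand in for the real Jinja templates.

```python
from typing import Callable, Dict, List, Optional, Protocol


class LLMBackend(Protocol):
    """Any object with generate(prompt) -> str can serve as a backend (Ollama, OpenAI, ...)."""

    def generate(self, prompt: str) -> str:
        ...


class EchoBackend:
    """Offline stand-in that satisfies LLMBackend; useful for tests and dry runs."""

    def generate(self, prompt: str) -> str:
        return "Draft: " + prompt.splitlines()[0]


def render_prompt(model: str, column: Optional[str]) -> str:
    # Hypothetical stand-in for model.md.j2 / column.md.j2.
    if column is None:
        return f"Describe the dbt model {model}.\nWrite Markdown."
    return f"Describe column {column} of model {model}.\nWrite Markdown."


def document_models(
    models: Dict[str, List[str]],
    backend: LLMBackend,
    prompt_fn: Callable[[str, Optional[str]], str] = render_prompt,
) -> Dict[str, dict]:
    """Selector output in, descriptions out; a writer would then merge these into schema.yml."""
    docs: Dict[str, dict] = {}
    for model, columns in models.items():
        docs[model] = {
            "description": backend.generate(prompt_fn(model, None)),
            "columns": {c: backend.generate(prompt_fn(model, c)) for c in columns},
        }
    return docs


docs = document_models({"dim_customers": ["customer_id"]}, EchoBackend())
```

Because the Protocol is structural, a new backend only needs a matching generate method; no base class or registration is required.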

⚠️ Important: Requires a Compiled dbt Project

dbt-llm-docs depends on dbt’s generated artifacts.
Before running this tool, your dbt project must be compiled and the following files must exist in your target/ directory:

  • manifest.json — required
  • catalog.json — optional but recommended for accurate column types

Generate them using:

dbt docs generate

If these artifacts are missing, the tool cannot discover models, columns, SQL, or metadata needed for documentation.
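To see what can be read from target/, the sketch below loads both artifacts with the standard library and builds a model-to-column-type map, preferring catalog.json types (which come from the warehouse) over any data_type declared in YAML. The key names (nodes, resource_type, columns, data_type in manifest.json; nodes, columns, type in catalog.json) follow dbt's artifact schemas; the helper names are hypothetical, and catalog column names may be upper-cased on some warehouses, so real code may need case-insensitive matching.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional, Tuple


def load_artifacts(target_dir: str = "target") -> Tuple[Dict[str, Any], Optional[Dict[str, Any]]]:
    """Load manifest.json (required) and catalog.json (optional) from a dbt target directory."""
    target = Path(target_dir)
    manifest_path = target / "manifest.json"
    if not manifest_path.exists():
        raise FileNotFoundError(f"{manifest_path} is missing; run `dbt docs generate` first")
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    catalog_path = target / "catalog.json"
    catalog = json.loads(catalog_path.read_text(encoding="utf-8")) if catalog_path.exists() else None
    return manifest, catalog


def model_column_types(manifest: Dict[str, Any], catalog: Optional[Dict[str, Any]]) -> Dict[str, Dict[str, str]]:
    """Map model name -> {column name: type}, preferring warehouse types from catalog.json."""
    result: Dict[str, Dict[str, str]] = {}
    for uid, node in manifest["nodes"].items():
        if node.get("resource_type") != "model":
            continue
        # Columns declared in YAML; data_type is often null unless you set it yourself.
        cols = {name: col.get("data_type") or "" for name, col in node.get("columns", {}).items()}
        if catalog and uid in catalog.get("nodes", {}):
            for name, col in catalog["nodes"][uid]["columns"].items():
                cols[name] = col.get("type", "")
        result[node["name"]] = cols
    return result


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "manifest.json").write_text(json.dumps({"nodes": {
        "model.shop.dim_customers": {"resource_type": "model", "name": "dim_customers",
                                     "columns": {"customer_id": {"name": "customer_id", "data_type": None}}},
        "seed.shop.country_codes": {"resource_type": "seed", "name": "country_codes"},
    }}))
    Path(tmp, "catalog.json").write_text(json.dumps({"nodes": {
        "model.shop.dim_customers": {"columns": {
            "customer_id": {"type": "integer", "index": 1, "name": "customer_id"},
            "email": {"type": "text", "index": 2, "name": "email"},
        }},
    }}))
    manifest, catalog = load_artifacts(tmp)

print(model_column_types(manifest, catalog))
# {'dim_customers': {'customer_id': 'integer', 'email': 'text'}}
```

Without catalog.json, only columns already declared in YAML are known, which is why the catalog is recommended.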

🤖 Installing Ollama (Recommended for Privacy)

macOS / Linux

curl -fsSL https://ollama.com/install.sh | sh

Run Ollama:

ollama serve

Download a model:

ollama pull llama3.1
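Before pointing the CLI at Ollama, you can confirm the server is up and the model has been pulled. The sketch below queries Ollama's GET /api/tags endpoint, which lists locally available models. It is a convenience check, not part of this tool, and it assumes Ollama's default address http://localhost:11434.

```python
import json
import urllib.request
from typing import List


def installed_models(host: str = "http://localhost:11434", timeout: float = 5.0) -> List[str]:
    """Return the model tags reported by the Ollama server's GET /api/tags endpoint."""
    with urllib.request.urlopen(f"{host.rstrip('/')}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def has_model(tags: List[str], wanted: str) -> bool:
    """`ollama pull llama3.1` stores the model as llama3.1:latest, so add the tag if omitted."""
    if ":" not in wanted:
        wanted += ":latest"
    return wanted in tags


if __name__ == "__main__":
    try:
        tags = installed_models()
    except OSError as exc:  # URLError and socket timeouts are both OSError subclasses
        print(f"Ollama is not reachable ({exc}); is `ollama serve` running?")
    else:
        print("llama3.1 is ready" if has_model(tags, "llama3.1") else "Run: ollama pull llama3.1")
```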

Windows (WSL recommended)

Refer to: https://ollama.com/download


📦 Installation from PyPI

pip install dbt-power-tools

📦 Installation from source

git clone <your-repo>
cd dbt-tools

python -m venv .venv
source .venv/bin/activate
pip install -e .

Requires:

  • manifest.json (run dbt docs generate)
  • Optionally catalog.json for column types

⚙️ Environment Variables

To avoid passing arguments repeatedly, you can set environment variables:

# Ollama (local)
export OLLAMA_HOST="http://localhost:11434"   # Ollama's default; use another host if Ollama runs elsewhere
export OLLAMA_MODEL="llama3.1:8b-instruct-q8_0"
export TEMPERATURE=0.2

# (Future) OpenAI or compatible APIs
export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_MODEL="gpt-4o-mini"
export OPENAI_API_KEY="sk-..."
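The sketch below shows how the Ollama variables could be resolved and turned into a request against Ollama's POST /api/generate endpoint (non-streaming, with temperature passed under options). The fallback defaults, the scheme normalization, and the function names are assumptions for illustration; they are not necessarily what the CLI itself does.

```python
import json
import os
import urllib.request
from typing import Any, Dict, Mapping


def ollama_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Resolve Ollama settings from environment variables, with assumed fallbacks."""
    host = env.get("OLLAMA_HOST", "http://localhost:11434")
    if "://" not in host:  # tolerate values like "ubuntu-pc.local:11434"
        host = "http://" + host
    return {
        "host": host.rstrip("/"),
        "model": env.get("OLLAMA_MODEL", "llama3.1"),
        "temperature": float(env.get("TEMPERATURE", "0.2")),
    }


def build_generate_request(prompt: str, settings: Dict[str, Any]) -> urllib.request.Request:
    body = {
        "model": settings["model"],
        "prompt": prompt,
        "stream": False,  # ask for a single JSON object instead of streamed chunks
        "options": {"temperature": settings["temperature"]},
    }
    return urllib.request.Request(
        settings["host"] + "/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, settings: Dict[str, Any], timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the "response" field."""
    with urllib.request.urlopen(build_generate_request(prompt, settings), timeout=timeout) as resp:
        return json.load(resp)["response"]


settings = ollama_settings({"OLLAMA_HOST": "ubuntu-pc.local:11434", "TEMPERATURE": "0.5"})
request = build_generate_request("Describe the dbt model dim_customers.", settings)
```

Keeping request construction separate from sending makes the payload easy to inspect without a running server; call generate(prompt, ollama_settings()) once Ollama is up.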

🔧 Usage

Initialize prompt templates (creates customizable templates in <project>/prompts/)

dbt-tools init --project-dir .

List models

dbt-tools list --project-dir . --target-dir target

Generate documentation (local LLM)

The default backend is Ollama.

dbt-tools llm-docs-generate --project-dir . --target-dir target --select dim_customers

Generate documentation with real data profiling

dbt-tools llm-docs-generate --project-dir . --target-dir target --select dim_customers --use-data Y

Generate documentation (OpenAI)

dbt-tools llm-docs-generate --project-dir . --target-dir target --select dim_customers --backend openai

🛣️ Roadmap

  • More warehouse support (Snowflake, BigQuery, Databricks)
  • LLM caching
  • Partial regeneration
  • Inline docs (docs/*.md) generation
  • Lineage-aware descriptions

📄 License

MIT (or your preferred license)

Download files

Source Distribution

dbt_power_tools-0.1.1.tar.gz (18.6 kB)

Built Distribution

dbt_power_tools-0.1.1-py3-none-any.whl (21.9 kB, Python 3)

File details

Details for the file dbt_power_tools-0.1.1.tar.gz.

File metadata

  • Download URL: dbt_power_tools-0.1.1.tar.gz
  • Size: 18.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for dbt_power_tools-0.1.1.tar.gz
SHA256: 6e4a75c0f700ca59b29c9609f2404ff00ddc342121fe3496ae12d3dde2f1e4f4
MD5: e165a7e1b995eb69721444c794b07108
BLAKE2b-256: a1c76cc6c4bf0c52e956fd7bceb20edcf8998a69a07ec988bb88efcf3496ac4d

File details

Details for the file dbt_power_tools-0.1.1-py3-none-any.whl.

File hashes

Hashes for dbt_power_tools-0.1.1-py3-none-any.whl
SHA256: c2bdec2612f5ba7cd38afee94d5e2df81891e1cfa3351b0f0642d7b96c1790a7
MD5: cafe993e7f73bf1bcc4db91f1048e2c3
BLAKE2b-256: a826ce8144170d1840f8445f77fe5264c7347d07c5096320f2ecd8265c5e1d4d
