CLI to auto-generate dbt docs using LLMs (Ollama-first).

dbt-llm-docs

A CLI tool that generates LLM-written documentation for dbt models and columns and writes it directly into your schema.yml, so the results appear in dbt docs serve.

This CLI uses Jinja2 prompt templates plus a pluggable LLM backend (Ollama or OpenAI).
Optionally, it can connect to your actual warehouse (Postgres & Redshift today) and profile real data to give the LLM deeper context for column descriptions.


🚀 Features

✅ 1. LLM-generated model + column documentation

Produces rich, clear Markdown text suitable for dbt docs.
Descriptions are written directly into schema.yml.

✅ 2. Customizable Jinja2 prompt templates

Located in <project>/prompts/.
You can fully customize the writing style, voice, or structure.

✅ 3. dbt-aware selection

Supports:

  • --select
  • --exclude
  • --tags
  • Glob-like patterns (stg_*, marts.*)
  • Parent/child expansion (+model_name)
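As a rough illustration of the glob-like part of selection, the sketch below filters model names with shell-style patterns. The helper `select_models` and its pattern semantics are assumptions made for this example, not the tool's actual implementation. It does not cover --tags or +model_name expansion.

```python
from fnmatch import fnmatchcase


def select_models(names, select=None, exclude=None):
    """Return names matching any `select` pattern and no `exclude` pattern.

    Hypothetical helper: plain shell-style globs, which may differ from
    what the CLI actually implements.
    """
    select = select or ["*"]  # no select patterns means "everything"
    exclude = exclude or []
    chosen = []
    for name in names:
        included = any(fnmatchcase(name, pat) for pat in select)
        excluded = any(fnmatchcase(name, pat) for pat in exclude)
        if included and not excluded:
            chosen.append(name)
    return chosen


models = ["stg_orders", "stg_customers", "dim_customers", "fct_orders"]
print(select_models(models, select=["stg_*"], exclude=["*_orders"]))
```

Exclusion is applied after inclusion here, so a model that matches both a select and an exclude pattern is dropped.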

✅ 4. Data-aware documentation (--use-data Y)

When enabled, the tool:

  1. Reads database connection info from profiles.yml
  2. Connects to the warehouse (Postgres & Redshift supported today)
  3. Executes the model’s compiled SQL
  4. Samples rows and computes:
    • Missing %
    • Unique %
    • Min / Max
    • Mean / Std
    • Example values
  5. Passes these stats to the LLM for smarter, context-rich documentation
  6. Appends a Markdown statistics table under each column description in dbt Docs
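The profiling step could be sketched roughly as follows for a single column of sampled values. The function names, the choice to compute unique % over non-missing values, and the table layout are all assumptions made for illustration. The sketch also assumes the values in a column share one comparable type.

```python
import statistics


def profile_column(values, max_examples=3):
    """Summarize one column's sampled values (None means missing)."""
    total = len(values)
    present = [v for v in values if v is not None]
    stats = {
        "missing_pct": 100.0 * (total - len(present)) / total if total else 0.0,
        # Assumption: uniqueness is measured over non-missing values only.
        "unique_pct": 100.0 * len(set(present)) / len(present) if present else 0.0,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": None,
        "std": None,
        # dict.fromkeys keeps first-seen order while dropping duplicates.
        "examples": list(dict.fromkeys(present))[:max_examples],
    }
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric and len(numeric) == len(present):
        stats["mean"] = statistics.fmean(numeric)
        # Sample standard deviation needs at least two data points.
        stats["std"] = statistics.stdev(numeric) if len(numeric) > 1 else 0.0
    return stats


def to_markdown_table(stats):
    """Render the stats dict as a two-column Markdown table."""
    lines = ["| Statistic | Value |", "| --- | --- |"]
    for key, value in stats.items():
        lines.append(f"| {key} | {value} |")
    return "\n".join(lines)


sample = [10, 20, 20, None]
print(to_markdown_table(profile_column(sample)))
```

Only aggregate values and a few example values end up in the summary, which is what would then be passed to the LLM and appended to the column description.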

🛠️ Support for more databases (Snowflake, BigQuery, Databricks) is coming soon.


🔒 Data Privacy Note

If --use-data Y is enabled, the profile summary (NOT raw data) is sent to the selected LLM backend.

If your organization forbids sending data outside the network, you should use:

dbt-llm-docs llm-docs-generate --backend ollama

When Ollama runs on your own machine, no prompts or data leave it.
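Ollama can also be reached over the network through OLLAMA_HOST, in which case prompts travel to that host. A minimal sketch of a pre-flight check that the configured URL points at the local machine might look like this. The helper `is_loopback_url` is hypothetical and only recognizes the three most common loopback names.

```python
from urllib.parse import urlparse

# Assumption: these three spellings cover the usual loopback cases.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_loopback_url(url):
    """Return True if the URL's host is a loopback name or address."""
    host = urlparse(url).hostname  # lowercased; IPv6 brackets are stripped
    return host in LOOPBACK_HOSTS


print(is_loopback_url("http://localhost:11434"))
print(is_loopback_url("http://ubuntu-pc.local:11434"))
```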


🧱 Architecture Overview

flowchart LR

    subgraph DBT["dbt project"]
        DbtModels["dbt models (*.sql)"]
        SchemaYml["schema.yml (descriptions)"]
        DbtProjectYml["dbt_project.yml"]
    end

    subgraph Target["target/ directory"]
        Manifest["manifest.json"]
        Catalog["catalog.json (optional)"]
    end

    subgraph Profiles["profiles.yml"]
        ProfileDev["dev target (Postgres / Redshift)"]
    end

    subgraph CLI["dbt-llm-docs CLI"]
        Typer["Typer CLI (init, list, generate)"]
        Prompts["Jinja templates (model.md.j2, column.md.j2)"]
        Selector["Model selector (--select / --exclude / --tags)"]
        Profiler["Optional data profiler (--use-data Y)"]
        Writer["Writes descriptions to schema.yml"]
    end

    subgraph LLMBackends["LLM Backends"]
        Ollama["Ollama (local)"]
        OpenAI["OpenAI / compatible (cloud)"]
    end

    subgraph Warehouse["Data Warehouse"]
        DB["Postgres / Redshift"]
    end

    DbtModels --> Target
    DbtProjectYml --> Profiles

    Target --> CLI
    Catalog --> CLI
    Profiles --> Profiler
    DB --> Profiler

    Prompts --> Typer
    Typer --> Selector
    Selector --> LLMBackends

    Profiler --> LLMBackends
    LLMBackends --> Writer

    Writer --> SchemaYml
    SchemaYml --> DocsUI["dbt docs UI"]

⚠️ Important: Requires a Compiled dbt Project

dbt-llm-docs depends on dbt’s generated artifacts.
Before running this tool, your dbt project must be compiled and the following files must exist in your target/ directory:

  • manifest.json — required
  • catalog.json — optional but recommended for accurate column types

Generate them using:

dbt docs generate

If these artifacts are missing, the tool cannot discover models, columns, SQL, or metadata needed for documentation.
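To show what discovering models from the artifacts can look like, here is a minimal sketch that reads manifest.json and lists each model with the columns declared for it. The function name `list_models` is hypothetical. The `nodes`, `resource_type`, `name`, and `columns` keys are part of dbt's manifest format. The tool's own loader may work differently.

```python
import json
from pathlib import Path


def list_models(target_dir="target"):
    """Return {model_name: [column names]} read from manifest.json."""
    manifest_path = Path(target_dir) / "manifest.json"
    if not manifest_path.exists():
        raise FileNotFoundError(
            f"{manifest_path} not found; run `dbt docs generate` first"
        )
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    models = {}
    for node in manifest.get("nodes", {}).values():
        # Nodes also include tests, seeds, and snapshots; keep models only.
        if node.get("resource_type") == "model":
            models[node["name"]] = list(node.get("columns", {}))
    return models
```

Columns in manifest.json are the ones declared in your YAML files, which is one reason catalog.json is useful as a second source of column information.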

🤖 Installing Ollama (Recommended for Privacy)

macOS / Linux

curl -fsSL https://ollama.com/install.sh | sh

Run Ollama:

ollama serve

Download a model:

ollama pull llama3.1

Windows (WSL recommended)

Refer to: https://ollama.com/download


📦 Installation from PyPI

pip install dbt-tools

📦 Installation from source

git clone <your-repo>
cd dbt-tools

python -m venv .venv
source .venv/bin/activate
pip install -e .

Requires:

  • manifest.json (run dbt docs generate)
  • Optionally catalog.json for column types

⚙️ Environment Variables

To avoid passing arguments repeatedly, you can set environment variables:

# Ollama (local)
export OLLAMA_HOST="http://ubuntu-pc.local:11434"
export OLLAMA_MODEL="llama3.1:8b-instruct-q8_0"
export TEMPERATURE=0.2

# (Future) OpenAI or compatible APIs
export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_MODEL="gpt-4o-mini"
export OPENAI_API_KEY="sk-..."
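As a sketch of how these variables might be consumed, the snippet below reads the Ollama settings and builds a request for Ollama's /api/generate endpoint without sending it. The function `build_ollama_request` and the fallback defaults are assumptions made for this example. The tool's actual defaults and request code may differ.

```python
import json
import os
import urllib.request


def build_ollama_request(prompt, env=None):
    """Build (but do not send) a POST request to Ollama's /api/generate."""
    env = os.environ if env is None else env
    # Fallback values below are illustrative assumptions.
    host = env.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/")
    model = env.get("OLLAMA_MODEL", "llama3.1")
    temperature = float(env.get("TEMPERATURE", "0.2"))
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON response instead of a stream
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ollama_request("Describe the column customer_id.", env={})
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. With streaming disabled, the generated text is returned in the `response` field of the JSON body.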

🔧 Usage

Initialize templates (creates the prompt templates, which you can then customize)

dbt-tools init --project-dir .

List models

dbt-tools list --project-dir . --target-dir target

Generate documentation (local LLM)

The default backend is Ollama.

dbt-tools llm-docs-generate --project-dir . --target-dir target --select dim_customers

Generate documentation with real data profiling

dbt-tools llm-docs-generate --project-dir . --target-dir target --select dim_customers --use-data Y

Generate documentation (OpenAI)

dbt-tools llm-docs-generate --project-dir . --target-dir target --select dim_customers --backend openai

🛣️ Roadmap

  • More warehouse support (Snowflake, BigQuery, Databricks)
  • LLM caching
  • Partial regeneration
  • Inline docs (docs/*.md) generation
  • Lineage-aware descriptions

📄 License

MIT (or your preferred license)
