docbt (documentation build tool) is a Streamlit app for managing dbt project documentation.
docbt
Documentation Build Tool
Generate YAML documentation for dbt models with optional AI assistance. Built with Streamlit for an intuitive and familiar web interface.
Why docbt
docbt (Doc Build Tool) is a utility designed to streamline dbt (Data Build Tool) documentation workflows. Connect your data and generate YAML documentation ready for your dbt projects. The UI guides you through each step, and you can optionally chat with AI models to speed up the work.
Target Audience
- Analytics Engineers: streamline your dbt workflow and maintain consistent data modelling.
- Data Engineers: ensure data quality across your infrastructure through thorough testing.
- Data Managers: automate tedious tasks and help your team focus on delivering value.
- AI Enthusiasts: experiment with local LLMs or cloud providers for automation tasks.
Key Features
- Non-AI Support: Generate documentation without requiring AI models.
- Multiple LLM Providers: Choose from OpenAI's GPT models, local Ollama, or LM Studio.
- Interactive Chat: Ask questions about your data and get specific recommendations.
- Developer Mode: Token metrics, response times, parameters, prompts and debugging information.
- Advanced Configuration: Fine-tune generation parameters.
- Chain of Thought: View AI reasoning process (when available).
- Real-time Metrics: Monitor API usage, token consumption, and performance.
- Multiple Data Sources: Connect to Snowflake, BigQuery, and more for seamless data integration.
More to come
- More Test Coverage: automation of the dbt-utils, dbt-expectations and dbt-data-reliability packages.
- Sources: use docbt to automate source declaration and documentation.
- Extra LLM providers: use Gemini, Grok, Claude and others to streamline your work.
- Extra Data Sources: connect to Databricks, PostgreSQL, Redshift and others.
- One-click analytics: gain critical insights into your data to better assign tests.
Contents
- Why docbt
- Quick Start
- Usage
- Configuration Overview
- Troubleshooting
- License
- Acknowledgments
- Support
- Contributing
- Sponsoring
Quick Start
Prerequisites
- Python 3.10 or higher
- uv (recommended), Poetry, or pip for package management
- Optional: Ollama, LM Studio, or OpenAI API key for AI assistance
- Optional: Docker, Docker Compose for containerized deployment
Installation
We recommend always isolating your code within a virtual environment and installing the package in it to avoid dependency issues.
Using uv
# Create a virtual environment
uv venv
# Activate your virtual environment
source .venv/bin/activate
# Install package version of your choice (uv add needs a pyproject.toml; run uv init first if you don't have one)
uv add docbt # For base package with no data platform
uv add "docbt[snowflake]" # For adding Snowflake provider
uv add "docbt[bigquery]" # For adding BigQuery provider
uv add "docbt[all-providers]" # For adding all available data providers
uv add "docbt[dev]" # For development
# (alternatively) use uv pip
uv pip install docbt
# Verify installation
docbt --version
# Run the application
docbt run
Using Poetry
# Initialize or navigate to your project
# If you don't have a pyproject.toml yet
poetry init
# Add docbt to your project
poetry add docbt # For base package with no data platform
poetry add "docbt[snowflake]" # For adding Snowflake provider
poetry add "docbt[bigquery]" # For adding BigQuery provider
poetry add "docbt[all-providers]" # For adding all available data providers
# Development dependencies (optional)
poetry add --group dev "docbt[dev]"
# Activate the Poetry shell (Poetry 2.x needs the poetry-plugin-shell plugin for this; otherwise use: poetry run docbt run)
poetry shell
# Verify installation
docbt --version
# Run the application
docbt run
Using pip
# Create a virtual environment
python -m venv env
# Activate it
source env/bin/activate
# Install package version of your choice
pip install docbt # For base package with no data platform
pip install "docbt[snowflake]" # For adding Snowflake provider
pip install "docbt[bigquery]" # For adding BigQuery provider
pip install "docbt[all-providers]" # For adding all available data providers
pip install "docbt[dev]" # For development
# Verify installation
docbt --version
# Run the application
docbt run
Building from Source
Building from source gives you access to the latest development features and allows you to contribute to the project. We recommend using uv for faster dependency resolution and installation. This is also what we, the developers, use.
Using uv (Recommended)
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt
# Create and activate a virtual environment
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install in editable mode with all dependencies
uv pip install -e . # Base installation
uv pip install -e ".[snowflake]" # With Snowflake support
uv pip install -e ".[bigquery]" # With BigQuery support
uv pip install -e ".[all-providers]" # With all data providers
uv pip install -e ".[dev]" # With development tools
# Verify installation
docbt --version
# Run the application
docbt run
Using pip
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Upgrade pip
pip install --upgrade pip
# Install in editable mode
pip install -e . # Base installation
pip install -e ".[snowflake]" # With Snowflake support
pip install -e ".[bigquery]" # With BigQuery support
pip install -e ".[all-providers]" # With all data providers
pip install -e ".[dev]" # With development tools
# Verify installation
docbt --version
# Run the application
docbt run
Using Poetry
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt
# Install dependencies
poetry install
# Install with extras
poetry install --extras "snowflake bigquery"
# Activate the virtual environment (Poetry 2.x needs the poetry-plugin-shell plugin for this command)
poetry shell
# Run the application
docbt run
Using Pipenv
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt
# Install dependencies
pipenv install --dev
# Activate the virtual environment
pipenv shell
# Install in editable mode
pip install -e .
# Run the application
docbt run
Development Setup
For contributors and developers:
# Clone and navigate to the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt
# Install with development dependencies (using uv)
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
# Install pre-commit hooks (optional but recommended)
pre-commit install
# Run tests
make test
# Run linting and formatting
make lint
make format
# Check code quality
ruff check .
ruff format .
# Run specific test files
pytest tests/server/test_server.py -v
Verifying Your Installation
After building from source, verify everything works:
# Check version
docbt --version
# View help
docbt help
# Run the server
docbt run
# Run with custom settings
docbt run --port 8080 --log-level DEBUG
Using Make (Recommended for Contributors)
If you're contributing to the project, using Make provides the easiest setup experience with automated tasks.
Prerequisites:
- Make (usually pre-installed on Linux/macOS)
- Git
# Clone the repository
git clone https://github.com/aleenprd/docbt.git
cd docbt
# Create virtual environment (Make will use uv automatically)
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install all dependencies with one command
make install
# Create .env file from template (keeps section headers, removes comments)
make env
# Edit .env with your credentials
nano .env # or your preferred editor
# Install pre-commit hooks (optional but recommended)
make pre-commit
# Verify installation by running tests
make test
# Run the application
docbt run
Common Make commands for development:
make help # Show all available commands
make install # Install dependencies
make env # Create .env from .env.example
make test # Run tests
make test-cov # Run tests with coverage report
make lint # Check code quality
make format # Auto-format code
make check # Run format check + lint
make ci # Run all CI checks (format, lint, test)
make pre-commit # Install pre-commit hooks
For detailed information on all Make commands, see Make Commands Guide.
Troubleshooting Build Issues
Missing Build Tools:
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install python3-dev build-essential
# macOS (requires Homebrew)
brew install python@3.10
# Windows (requires Visual Studio Build Tools)
# Download from: https://visualstudio.microsoft.com/downloads/
Dependency Conflicts:
# Clear pip cache
pip cache purge
# Or with uv
uv cache clean
# Reinstall from scratch
rm -rf .venv
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
Permission Issues:
# Don't use sudo with pip/uv in virtual environments
# If you get permission errors, ensure you're in an activated venv
source .venv/bin/activate
Usage
docbt ships with a command-line tool that supports the following commands:
- --version: prints the installed version of the package.
- help: prints detailed information about the available commands and options.
- run: runs the Streamlit app, with options to set the host, port, and log level.
Data Tab
Provide the app with data to start working with it
- Upload: CSV, JSON from your local storage
- Data Warehouse: connect to your data platform like Snowflake or BigQuery
- Context Integration: Data automatically included in AI conversations
- Statistics and EDA: (coming soon)
Node Tab
Here you can set up the configuration for your node
- Provide specific config: customize your config with platform-specific properties
- Configure node properties: from materialization to meta-tags
- Apply node-level data tests: (coming soon)
Columns Tab
Here you can set up the configuration, documentation and tests for your columns
Sidebar and Config Tab
See the end result of your work in real time
- Preview Configuration: Interactive visual representation of generated YAML
- Real-time Updates: see changes live as you configure your documentation using the UI
- AI Suggestions: use LLMs to generate node and column level descriptions, suggest constraints and data tests
AI Tab
Configure your AI provider and settings
- Choose Provider: OpenAI, Ollama, or LM Studio
- Developer Mode: Enable advanced settings and metrics
- System Prompt: Customize AI context and behavior (developer mode)
- Generation Parameters: Control temperature, max tokens, top-p, stop sequences, etc.
Chat Tab
Interact with your AI assistant, with a sample of your data in context
- Ask questions about dbt best practices or your data in general
- Get recommendations for data modeling and data use cases
- Have any other kind of conversation with your model
- Enable "Chain of Thought" to see AI reasoning
Configuration Overview
The app is configured through environment variables. The repo includes an example environment file, .env.example. Developers can run make env to generate a .env file from it. Alternatively, copy the contents of .env.example into .env, which docbt loads through python-dotenv, or export the variables directly in your environment of choice.
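To make the .env format concrete, here is a minimal sketch of how KEY=VALUE lines can be read using only the standard library. It is not docbt's loader (docbt relies on python-dotenv); the function name parse_dotenv is hypothetical, and the sketch deliberately ignores inline comments and multi-line values.

```python
def parse_dotenv(text):
    """Parse simple KEY=VALUE lines from the contents of a .env file.

    Hypothetical helper for illustration only; python-dotenv handles
    many more cases (inline comments, escapes, multi-line values).
    """
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        # Tolerate shell-style "export KEY=VALUE" lines.
        if line.startswith("export "):
            line = line[len("export "):]
        key, separator, value = line.partition("=")
        if not separator:
            continue  # Not a KEY=VALUE line.
        # Drop surrounding whitespace and one layer of quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

Passing the file contents as a string keeps the helper easy to try out without touching your real .env file.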
Logging Configuration
Control the verbosity of docbt's logging output to help with debugging or reduce noise in production.
Setting Log Level:
You can configure the logging level in two ways:
- CLI Flag (highest priority):
docbt run --log-level DEBUG
- Environment Variable (used if no CLI flag provided):
# In .env file
DOCBT_LOG_LEVEL=DEBUG
# Or export directly
export DOCBT_LOG_LEVEL=DEBUG
Available Log Levels:
- TRACE: Most verbose, includes all internal details
- DEBUG: Detailed debugging information (useful for troubleshooting)
- INFO: General informational messages (default)
- SUCCESS: Success messages only
- WARNING: Warning messages and above
- ERROR: Error messages and above
- CRITICAL: Only critical errors
Examples:
# Use DEBUG level for troubleshooting
docbt run --log-level DEBUG
# Use environment variable for persistent configuration
echo "DOCBT_LOG_LEVEL=DEBUG" >> .env
docbt run
# Reduce logging noise in production
docbt run --log-level WARNING
Note: The CLI flag always takes precedence over the environment variable. If neither is specified, the default level is INFO.
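The precedence described in the note above can be sketched as a small function. This is an illustration rather than docbt's actual code: the name resolve_log_level is hypothetical, and treating level names as case-insensitive is an assumption.

```python
import os

VALID_LEVELS = ("TRACE", "DEBUG", "INFO", "SUCCESS", "WARNING", "ERROR", "CRITICAL")


def resolve_log_level(cli_level=None, environ=None):
    """Return the effective level: CLI flag, then DOCBT_LOG_LEVEL, then INFO.

    Hypothetical helper; case-insensitive matching is an assumption.
    """
    env = os.environ if environ is None else environ
    # The CLI flag wins; the environment variable is only used without it.
    candidate = cli_level or env.get("DOCBT_LOG_LEVEL") or "INFO"
    level = candidate.strip().upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"Unknown log level: {candidate!r}")
    return level
```

Accepting the environment as an optional mapping makes the function easy to exercise without changing real environment variables.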
LLM Providers
# Enable/disable AI usage
DOCBT_USE_AI_DEFAULT=false
# Enable/disable developer mode for advanced features
DOCBT_DEVELOPER_MODE_ENABLED=true
DOCBT_SHOW_CHAIN_OF_THOUGHT=true
# Choose which provider appears as your default: openai, ollama, or lmstudio
DOCBT_LLM_PROVIDER_DEFAULT=openai
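As a rough sketch of how settings like these can be interpreted, the snippet below reads boolean flags and the default provider from an environment mapping. The helper names env_flag and default_provider are hypothetical, and the accepted truthy spellings are an assumption, not docbt's documented behavior.

```python
import os

PROVIDERS = ("openai", "ollama", "lmstudio")
TRUTHY = ("1", "true", "yes", "on")  # Assumed spellings, not from docbt.


def env_flag(name, default=False, environ=None):
    """Read a boolean flag such as DOCBT_USE_AI_DEFAULT."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def default_provider(environ=None):
    """Return the provider named by DOCBT_LLM_PROVIDER_DEFAULT, or None if unset."""
    env = os.environ if environ is None else environ
    value = env.get("DOCBT_LLM_PROVIDER_DEFAULT", "").strip().lower()
    if not value:
        return None
    if value not in PROVIDERS:
        raise ValueError(f"Unknown provider: {value!r}; expected one of {PROVIDERS}")
    return value
```

Rejecting unknown provider names early gives a clearer error than letting a typo surface later in the UI.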
OpenAI
We recommend the gpt-5 series, but you can use the Fetch Models button to pick any model OpenAI offers.
- gpt-5-nano: good for most tasks and very cheap, but can fail to produce valid structured output with a large sample size or many columns.
- gpt-5-mini: more reliable than nano, weaker than gpt-5 on long context. A good middle ground.
- gpt-5: the best of the gpt-5 series but the most expensive. Use sparingly.
# Set your API key
export DOCBT_OPENAI_API_KEY="sk-..."
# Or add to .env file
DOCBT_OPENAI_API_KEY=sk-...
# Enable it in the UI
DOCBT_DISPLAY_LLM_PROVIDER_OPENAI=true
Ollama (OSS)
We recommend using models such as:
- Qwen3 series especially in the 4B to 14B range
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a model
ollama pull qwen3:4b
# Start server (default: http://localhost:11434)
ollama serve
# Set host and port environment variables
DOCBT_OLLAMA_HOST=localhost
DOCBT_OLLAMA_PORT=11434
# Enable it in the UI
DOCBT_DISPLAY_LLM_PROVIDER_OLLAMA=true
LM Studio (OSS)
Some models we would recommend are:
- Qwen3-4b-instruct-2507 or the 8B/14B variant
- Qwen3-4b-thinking-2507 or the 8B/14B variant
- Qwen3-30B-A3B if your GPU permits
Note: some models cannot produce valid structured outputs; gpt-oss, for example, cannot. Experiment to find what works for your use case and hardware. Increasing the context window in LM Studio can resolve errors, especially with data that has many columns.
- Download from lmstudio.ai
- Browse models and download the ones you want
- Enable "Local Server" (default: http://localhost:1234) from UI
# Set host and port environment variables
DOCBT_LMSTUDIO_HOST=localhost
DOCBT_LMSTUDIO_PORT=1234
# Enable it in the UI
DOCBT_DISPLAY_LLM_PROVIDER_LMSTUDIO=true
Advanced Parameters
In Developer Mode, fine-tune AI generation with inference parameters:
- API Timeout: number of seconds before an API call times out
- Max Tokens: Maximum response length (100-4000)
- Temperature: Creativity level (0.0-2.0)
  - 0.0: Deterministic, focused
  - 1.0: Balanced
  - 2.0: More creative, random
- Top P: Nucleus sampling (0.0-1.0)
- Stop Sequences: Custom stop words/phrases
Note: the gpt-5 series does not support a custom temperature (it is always 1), top-p, or stop sequences.
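The ranges and the gpt-5 restriction above can be expressed as a small validation step. The sketch below is illustrative only: build_generation_params and the dictionary keys are hypothetical names, the default values are placeholders, and detecting the gpt-5 series by model-name prefix is an assumption.

```python
def build_generation_params(model, max_tokens=1000, temperature=1.0, top_p=1.0, stop=None):
    """Validate inference parameters and drop those the model does not accept.

    Hypothetical helper; key names and defaults are illustrative.
    """
    if not 100 <= max_tokens <= 4000:
        raise ValueError("max_tokens must be between 100 and 4000")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")

    params = {"max_tokens": max_tokens}
    # Assumption: the gpt-5 series is recognized by its model-name prefix.
    if model.startswith("gpt-5"):
        # Temperature, top-p and stop sequences are not supported, so omit them.
        return params

    params["temperature"] = temperature
    params["top_p"] = top_p
    if stop:
        params["stop"] = list(stop)
    return params
```

For example, build_generation_params("gpt-5-mini", temperature=0.2) keeps only max_tokens, while the same call with a local model name keeps temperature and top_p as well.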
Data Providers
You can use different connection methods to connect to the following data platforms:
Snowflake
Connect to Snowflake with a password, SSO, MFA, or an RSA key.
# Example: connect with your user and password
DOCBT_SNOWFLAKE_ACCOUNT=your-account-id
DOCBT_SNOWFLAKE_USER=your-username
DOCBT_SNOWFLAKE_PASSWORD=your-password
DOCBT_SNOWFLAKE_WAREHOUSE=your-warehouse
DOCBT_SNOWFLAKE_DATABASE=your-database
DOCBT_SNOWFLAKE_SCHEMA=PUBLIC
DOCBT_SNOWFLAKE_AUTHENTICATOR=snowflake
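Before launching the app, it can help to check which Snowflake variables are actually set. The following sketch collects the DOCBT_SNOWFLAKE_* variables into a dictionary and masks the password for printing. The helper names are hypothetical, and treating account and user as the required minimum is an assumption, not a rule taken from docbt.

```python
import os

PREFIX = "DOCBT_SNOWFLAKE_"
REQUIRED = ("account", "user")  # Assumed minimum; docbt may require more.


def snowflake_settings(environ=None):
    """Collect non-empty DOCBT_SNOWFLAKE_* variables as lowercase keys."""
    env = os.environ if environ is None else environ
    settings = {
        key[len(PREFIX):].lower(): value
        for key, value in env.items()
        if key.startswith(PREFIX) and value
    }
    missing = [name for name in REQUIRED if name not in settings]
    if missing:
        raise ValueError(f"Missing Snowflake settings: {', '.join(missing)}")
    return settings


def redacted(settings):
    """Return a copy that is safe to print, with the password masked."""
    return {key: ("***" if key == "password" else value) for key, value in settings.items()}
```

Printing redacted(snowflake_settings()) shows what would be picked up from the current environment without exposing the password in logs.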
BigQuery
Currently, the BigQuery connection only works with the credentials JSON method:
- Install the Google Cloud SDK
- Authenticate with JSON credentials
# Point to your credentials JSON in the environment variables
DOCBT_GOOGLE_APPLICATION_CREDENTIALS=/home/<user>/.config/gcloud/application_default_credentials.json
Troubleshooting
Common Issues
Streamlit App/General Issues
Run docbt with the DEBUG log level and inspect the logs. If you find any bugs while doing so, please report them. :)
docbt run --log-level DEBUG
LLM Connection Errors
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Verify LM Studio server
curl http://localhost:1234/v1/models
# Test OpenAI API key (assumes DOCBT_OPENAI_API_KEY is exported in your shell)
curl -H "Authorization: Bearer $DOCBT_OPENAI_API_KEY" https://api.openai.com/v1/models
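The same checks for the local providers can be scripted. The sketch below builds the two URLs used above from the DOCBT_OLLAMA_* and DOCBT_LMSTUDIO_* variables and probes them with the standard library. The function names are hypothetical, and falling back to localhost and the default ports when a variable is unset is an assumption.

```python
import os
import urllib.error
import urllib.request


def provider_urls(environ=None):
    """Build the health-check URLs for Ollama and LM Studio."""
    env = os.environ if environ is None else environ
    ollama_host = env.get("DOCBT_OLLAMA_HOST", "localhost")
    ollama_port = env.get("DOCBT_OLLAMA_PORT", "11434")
    lmstudio_host = env.get("DOCBT_LMSTUDIO_HOST", "localhost")
    lmstudio_port = env.get("DOCBT_LMSTUDIO_PORT", "1234")
    return {
        "ollama": f"http://{ollama_host}:{ollama_port}/api/tags",
        "lmstudio": f"http://{lmstudio_host}:{lmstudio_port}/v1/models",
    }


def is_reachable(url, timeout=3.0):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    for name, url in provider_urls().items():
        print(f"{name}: {'up' if is_reachable(url) else 'unreachable'} ({url})")
```

A provider reported as unreachable usually means its server is not running or is listening on a different host or port than the one configured.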
Docker Issues
# View container logs
docker-compose logs docbt
# Check if container is running
docker ps
# Restart container
docker-compose restart docbt
See Docker Guide for more Docker-specific troubleshooting.
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Acknowledgments
- Inspired by the dbt community
- Built with Streamlit
- AI via OpenAI, Ollama, and LM Studio
- Data via Snowflake, BigQuery
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: predaalin2694@gmail.com
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Quick Start:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes and add tests
- Run ruff format . and pytest
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
CI/CD: All pull requests are automatically tested with our CI pipeline. See CI/CD Documentation for details.
Development Tools: We use Make for automation. See Make Commands Guide for all available commands.
Sponsoring
If you like what I'm working on and decide to sponsor, you can do so via:
Happy documenting! Generate better dbt documentation with AI assistance.