docbt (documentation build tool) is a Streamlit app for managing dbt project documentation.
docbt - AI-Powered DBT Documentation Assistant
Generate YAML documentation for DBT models with AI assistance. Built with Streamlit for an intuitive web interface.
Overview
docbt is an AI-powered assistant designed to streamline DBT (Data Build Tool) documentation workflows. Upload your data, chat with AI models, and generate professional YAML documentation ready for your DBT projects.
Key Features
- Multi-LLM Support: Choose from OpenAI's GPT models, local Ollama, or LM Studio.
- Interactive Chat: Ask questions about your data and get specific recommendations.
- Developer Mode: Token metrics, response times, parameters, prompts, and debugging information.
- Advanced Configuration: Fine-tune generation parameters (temperature, max tokens, top-p, stop sequences).
- Chain of Thought: View the AI reasoning process (when available).
- Real-time Metrics: Monitor API usage, token consumption, and performance.
More to come:
- Data Upload & Analysis: Upload files for intelligent data insights.
Quick Start
Prerequisites
- Python 3.10 or higher
- Optional: Ollama, LM Studio, or OpenAI API key
- Optional: Docker (for containerized deployment)
Installation
Option 1: Using Docker (Recommended)
# Clone the repository
git clone <your-repo-url>
cd docbt
# Run with Docker Compose
docker-compose up docbt
# Access at http://localhost:8501
For detailed Docker instructions, see Docker Guide.
Option 2: Using pip
1. Clone the repository
git clone <your-repo-url>
cd docbt
2. Install dependencies
pip install -e .
# With optional providers
pip install -e ".[snowflake]"      # For Snowflake support
pip install -e ".[bigquery]"       # For BigQuery support
pip install -e ".[all-providers]"  # For all providers
3. Set up environment variables (optional)
# Copy and edit the environment file
cp .env.example .env
# Add your API keys (optional)
OPENAI_API_KEY=your_openai_api_key_here
OLLAMA_HOST=localhost
OLLAMA_PORT=11434
LMSTUDIO_HOST=localhost
LMSTUDIO_PORT=1234
4. Run the application
# Using the CLI
docbt run
# Or directly with Python
python -m streamlit run src/docbt/server/server.py
Usage
1. Setup Tab
Configure your AI provider and settings:
- Choose Provider: OpenAI, Ollama, or LM Studio
- Developer Mode: Enable advanced settings and metrics
- System Prompt: Customize AI behavior (developer mode)
- Generation Parameters: Control temperature, max tokens, top-p, stop sequences
2. Chat Tab
Interact with your AI assistant:
- Ask questions about DBT best practices
- Get recommendations for data modeling
- Request specific YAML configurations
- Enable "Chain of Thought" to see AI reasoning
3. Data Tab
Upload and analyze your datasets:
- Supported Formats: CSV, JSON
- Auto-Analysis: Column types, sample data, statistics
- Context Integration: Data automatically included in AI conversations
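Behind the scenes, this kind of auto-analysis is ordinary pandas-style inspection. A rough sketch of what it amounts to is shown below; this is illustrative only, not docbt's internal code, and "orders.csv" / "orders.json" are hypothetical example files.

```python
# Rough sketch of the auto-analysis described above; not docbt's internal code.
import pandas as pd

df = pd.read_csv("orders.csv")        # or: pd.read_json("orders.json")
print(df.dtypes)                      # column types
print(df.head())                      # sample data
print(df.describe(include="all"))     # basic statistics
```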
4. Additional Tabs
- Columns: Column-specific analysis and recommendations
- File: File management and operations
- Docs: Documentation generation and export
Configuration
LLM Providers
OpenAI
# Set your API key
export OPENAI_API_KEY="sk-..."
# Or add to .env file
OPENAI_API_KEY=sk-...
Supported Models:
- gpt-4o (latest)
- gpt-4o-mini (cost-effective)
- gpt-4-turbo
- gpt-3.5-turbo
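If you prefer to configure the key in code rather than the shell, a minimal sketch using python-dotenv and the openai client (both listed under Dependencies) could look like the following. It assumes a .env file containing OPENAI_API_KEY and is not docbt-specific code.

```python
# Minimal sketch: load the API key from .env and list available models.
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()       # reads OPENAI_API_KEY from .env into the environment
client = OpenAI()   # picks up OPENAI_API_KEY automatically

for model in client.models.list():
    print(model.id)  # should include gpt-4o, gpt-4o-mini, ...
```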
Ollama (Local)
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a model
ollama pull llama2
ollama pull mistral
# Start server (default: http://localhost:11434)
ollama serve
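To confirm from Python that Ollama is reachable and see which models you have pulled, here is a small sketch against Ollama's REST API (the same /api/tags endpoint used in the Troubleshooting section). It assumes Ollama is serving on the default localhost:11434.

```python
# Sketch: list locally pulled Ollama models via its REST API.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])  # e.g. "llama2:latest", "mistral:latest"
```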
LM Studio (Local)
- Download from lmstudio.ai
- Load a model in the Chat tab
- Enable "Local Server" (default: http://localhost:1234)
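LM Studio's local server exposes an OpenAI-compatible API, so the same openai client can be pointed at it. A hedged sketch follows; the base_url and the placeholder model name are assumptions about your local setup, not docbt configuration.

```python
# Sketch: talk to LM Studio's OpenAI-compatible local server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="local-model",  # whatever model you loaded in LM Studio
    messages=[{"role": "user", "content": "Suggest dbt tests for an orders table."}],
)
print(resp.choices[0].message.content)
```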
Advanced Parameters
In Developer Mode, fine-tune AI generation:
- Max Tokens: Maximum response length (100-4000)
- Temperature: Creativity level (0.0-2.0)
  - 0.0: Deterministic, focused
  - 1.0: Balanced
  - 2.0: Creative, random
- Top P: Nucleus sampling (0.0-1.0)
- Stop Sequences: Custom stop words/phrases
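These settings map directly onto the generation request. The sketch below shows how they might be passed to an OpenAI-style chat completion; the values are illustrative, not docbt defaults.

```python
# Sketch: passing the Developer Mode generation parameters to a chat completion.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Draft a schema.yml for a customers model."}],
    max_tokens=1000,   # maximum response length (100-4000 in the UI)
    temperature=0.2,   # low = deterministic, focused
    top_p=0.9,         # nucleus sampling
    stop=["---"],      # custom stop sequence(s)
)
print(resp.choices[0].message.content)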
Example Workflows
DBT Schema Generation
- Upload your CSV data in the Data tab
- Go to Chat tab
- Ask: "Generate a DBT schema.yml for this dataset with appropriate tests"
- Get YAML output with column tests, descriptions, and constraints
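The YAML returned typically follows dbt's schema.yml layout. Below is a hand-written Python sketch of the kind of structure you can expect for a simple orders dataset; it uses PyYAML (not a docbt dependency) purely to render the structure, and the columns and tests are made-up examples rather than actual docbt output.

```python
# Illustrative sketch of a dbt schema.yml for a hypothetical "orders" dataset.
import yaml

schema = {
    "version": 2,
    "models": [{
        "name": "orders",
        "description": "Raw orders uploaded via the Data tab.",
        "columns": [
            {"name": "order_id", "description": "Primary key.",
             "tests": ["not_null", "unique"]},
            {"name": "status", "description": "Order status.",
             "tests": [{"accepted_values": {"values": ["placed", "shipped", "returned"]}}]},
        ],
    }],
}
print(yaml.safe_dump(schema, sort_keys=False))
```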
Data Quality Assessment
- Upload data and ask: "What data quality issues do you see?"
- Get recommendations for:
  - Column-level tests (not_null, unique, accepted_values)
  - Model-level tests (freshness, volume checks)
  - Relationship validations
Custom Documentation
- Ask: "Create documentation for these columns with business context"
- Get professional descriptions ready for your DBT models
Features Deep Dive
Data Context Enhancement
When you upload data, the AI automatically receives:
- File metadata (name, size, record count)
- Column information (names, data types in JSON format)
- Sample data (first 10 records as JSON)
- Statistical summaries
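A hedged sketch of how such a data context could be assembled for the system prompt is shown below. It mirrors the description above rather than the package's internal code; the helper name build_data_context is hypothetical.

```python
# Sketch of assembling the data context described above (hypothetical helper).
import json
import os
import pandas as pd

def build_data_context(path: str) -> str:
    df = pd.read_csv(path)
    context = {
        "file": {
            "name": os.path.basename(path),
            "size_bytes": os.path.getsize(path),
            "records": len(df),
        },
        "columns": {col: str(dtype) for col, dtype in df.dtypes.items()},  # names + types
        "sample": df.head(10).to_dict(orient="records"),                   # first 10 records
        "stats": json.loads(df.describe(include="all").to_json()),         # summary statistics
    }
    return json.dumps(context, default=str)
```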
Token Optimization
- Smart Context: Data context sent once in system prompt (not repeated per message)
- Token Counting: Real-time token usage monitoring
- Cost Control: Configurable limits and usage tracking
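Token counting along these lines can be done with tiktoken (listed under Dependencies). A minimal sketch, assuming an OpenAI-style encoding:

```python
# Minimal sketch: count tokens in a prompt with tiktoken.
import tiktoken

# cl100k_base is the encoding used by many OpenAI chat models; adjust per model.
encoding = tiktoken.get_encoding("cl100k_base")
prompt = "Generate a DBT schema.yml for this dataset with appropriate tests"
print(len(encoding.encode(prompt)), "tokens")
```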
Developer Tools
- Response Metrics: Time, tokens/second, model info
- Request Debugging: Full system prompts with data context
- Chain of Thought: AI reasoning visibility
- Error Handling: Graceful fallbacks and error reporting
Project Structure
docbt/
├── src/docbt/
│   ├── cli/                 # Command-line interface
│   │   ├── __init__.py
│   │   └── docbt_cli.py     # CLI entry point
│   └── server/              # Streamlit application
│       ├── __init__.py
│       ├── server.py        # Main application
│       └── logo.png         # Application logo
├── pyproject.toml           # Project configuration
├── README.md                # This file
├── .env                     # Environment variables
└── requirements.txt         # Dependencies
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Quick Start:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes and add tests
- Run ruff format . and pytest
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
CI/CD: All pull requests are automatically tested with our CI pipeline. See CI/CD Documentation for details.
Requirements
System Requirements
- Python 3.10+
- 2GB RAM minimum (4GB+ recommended for local models)
- Internet connection (for OpenAI API)
Dependencies
- streamlit - Web interface
- openai - OpenAI API client
- requests - HTTP client for local models
- tiktoken - Token counting
- pandas - Data manipulation
- python-dotenv - Environment management
- loguru - Logging
- click - CLI framework
Troubleshooting
Common Issues
LLM Connection Errors
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Verify LM Studio server
curl http://localhost:1234/v1/models
# Test OpenAI API key
curl -H "Authorization: Bearer $OPENAI_API_KEY" https://api.openai.com/v1/models
Import Errors
# Reinstall in development mode
pip install -e .
# Or install from requirements
pip install -r requirements.txt
Permission Issues
# Fix file permissions
chmod +x docbt
Docker Issues
# View container logs
docker-compose logs docbt
# Check if container is running
docker ps
# Restart container
docker-compose restart docbt
See Docker Guide for more Docker-specific troubleshooting.
Deployment
Docker Deployment (Recommended)
Local Development:
docker-compose up docbt
Production with Cloud Providers:
docker-compose --profile production up -d docbt-production
Build and Push to Registry:
# Build
docker build -t your-registry/docbt:latest .
# Push to Docker Hub
docker push your-registry/docbt:latest
# Or push to GitHub Container Registry
docker tag docbt:latest ghcr.io/your-username/docbt:latest
docker push ghcr.io/your-username/docbt:latest
Cloud Platforms
AWS ECS/Fargate:
- Use the production Docker image
- Mount secrets for API keys
- Configure ALB for port 8501
Google Cloud Run:
# Build and deploy
gcloud builds submit --tag gcr.io/PROJECT-ID/docbt
gcloud run deploy docbt --image gcr.io/PROJECT-ID/docbt --platform managed
Azure Container Instances:
az container create \
--resource-group myResourceGroup \
--name docbt \
--image your-registry/docbt:latest \
--ports 8501 \
--environment-variables DOCBT_OPENAI_API_KEY=sk-...
Kubernetes: See Kubernetes deployment examples in the Docker guide.
For detailed deployment instructions, see Docker Guide.
macOS Troubleshooting
Missing tiktoken on macOS:
If you encounter an error about tiktoken being missing on macOS:
# Clean install with no cache
pip uninstall -y docbt tiktoken
pip cache purge
pip install --no-cache-dir docbt
Apple Silicon (M1/M2/M3) Macs: Ensure you're using native ARM Python (not Rosetta):
python -c "import platform; print(platform.machine())"
# Should output: arm64
For more troubleshooting help, see the Troubleshooting Guide.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Troubleshooting: Troubleshooting Guide
- Email: your-email@example.com
Happy documenting! Generate better DBT documentation with AI assistance.