docbt (documentation build tool) is a Streamlit app for managing dbt project documentation.

🔧 docbt - AI-Powered DBT Documentation Assistant

CI Python 3.10+ License Code style: ruff

Generate YAML documentation for DBT models with AI assistance. Built with Streamlit for an intuitive web interface.

📖 Overview

docbt is an AI-powered assistant designed to streamline DBT (Data Build Tool) documentation workflows. Upload your data, chat with AI models, and generate professional YAML documentation ready for your DBT projects.

✨ Key Features

  • 🤖 Multi-LLM Support: Choose from OpenAI's GPT models, local Ollama, or LM Studio.
  • 💬 Interactive Chat: Ask questions about your data and get specific recommendations.
  • 🔧 Developer Mode: Token metrics, response times, parameters, prompts, and debugging information.
  • ⚙️ Advanced Configuration: Fine-tune generation parameters (temperature, max tokens, top-p, stop sequences).
  • 🧠 Chain of Thought: View the AI's reasoning process (when available).
  • 📈 Real-time Metrics: Monitor API usage, token consumption, and performance.

More to come

  • 📊 Data Upload & Analysis: Upload files for intelligent data insights.

🚀 Quick Start

Prerequisites

  • Python 3.10 or higher
  • Optional: Ollama, LM Studio, or OpenAI API key
  • Optional: Docker (for containerized deployment)

Installation

Option 1: Using Docker (Recommended)

# Clone the repository
git clone <your-repo-url>
cd docbt

# Run with Docker Compose
docker-compose up docbt

# Access at http://localhost:8501

For detailed Docker instructions, see Docker Guide.

Option 2: Using pip

  1. Clone the repository

    git clone <your-repo-url>
    cd docbt
    
  2. Install dependencies

    pip install -e .
    
    # With optional providers
    pip install -e ".[snowflake]"  # For Snowflake support
    pip install -e ".[bigquery]"   # For BigQuery support
    pip install -e ".[all-providers]"  # For all providers
    
  3. Set up environment variables (optional)

    # Copy and edit the environment file
    cp .env.example .env
    
    # Add your API keys (optional)
    OPENAI_API_KEY=your_openai_api_key_here
    OLLAMA_HOST=localhost
    OLLAMA_PORT=11434
    LMSTUDIO_HOST=localhost
    LMSTUDIO_PORT=1234
    
  4. Run the application

    # Using the CLI
    docbt run
    
    # Or directly with Python
    python -m streamlit run src/docbt/server/server.py
    

🎯 Usage

1. Setup Tab

Configure your AI provider and settings:

  • Choose Provider: OpenAI, Ollama, or LM Studio
  • Developer Mode: Enable advanced settings and metrics
  • System Prompt: Customize AI behavior (developer mode)
  • Generation Parameters: Control temperature, max tokens, top-p, stop sequences

2. Chat Tab

Interact with your AI assistant:

  • Ask questions about DBT best practices
  • Get recommendations for data modeling
  • Request specific YAML configurations
  • Enable "Chain of Thought" to see AI reasoning

3. Data Tab

Upload and analyze your datasets:

  • Supported Formats: CSV, JSON
  • Auto-Analysis: Column types, sample data, statistics
  • Context Integration: Data automatically included in AI conversations

4. Additional Tabs

  • Columns: Column-specific analysis and recommendations
  • File: File management and operations
  • Docs: Documentation generation and export

🔧 Configuration

LLM Providers

OpenAI

# Set your API key
export OPENAI_API_KEY="sk-..."

# Or add to .env file
OPENAI_API_KEY=sk-...
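As a minimal sketch of how an app like this might pick up these settings, the snippet below reads the provider variables from the environment with the documented defaults. The function name `load_provider_config` is illustrative, not docbt's actual internal API; the variable names follow the `.env` examples in this README.

```python
import os

def load_provider_config():
    """Read provider settings from the environment, falling back to
    the defaults documented in this README."""
    return {
        "openai_api_key": os.getenv("OPENAI_API_KEY"),  # None if unset
        "ollama_host": os.getenv("OLLAMA_HOST", "localhost"),
        "ollama_port": int(os.getenv("OLLAMA_PORT", "11434")),
        "lmstudio_host": os.getenv("LMSTUDIO_HOST", "localhost"),
        "lmstudio_port": int(os.getenv("LMSTUDIO_PORT", "1234")),
    }

config = load_provider_config()
```

Keeping keys out of code and in the environment (or a git-ignored `.env` file) avoids accidentally committing secrets.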

Supported Models:

  • gpt-4o (latest)
  • gpt-4o-mini (cost-effective)
  • gpt-4-turbo
  • gpt-3.5-turbo

Ollama (Local)

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama2
ollama pull mistral

# Start server (default: http://localhost:11434)
ollama serve

LM Studio (Local)

  1. Download from lmstudio.ai
  2. Load a model in the Chat tab
  3. Enable "Local Server" (default: http://localhost:1234)

Advanced Parameters

In Developer Mode, fine-tune AI generation:

  • Max Tokens: Maximum response length (100-4000)
  • Temperature: Creativity level (0.0-2.0)
    • 0.0: Deterministic, focused
    • 1.0: Balanced
    • 2.0: Creative, random
  • Top P: Nucleus sampling (0.0-1.0)
  • Stop Sequences: Custom stop words/phrases
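To make the parameter ranges concrete, here is a hypothetical sketch of how Developer Mode settings could map onto an OpenAI-style chat completion request. `build_request` is illustrative only (not docbt's internal API); it validates the documented ranges and builds the request payload.

```python
def build_request(prompt, *, model="gpt-4o-mini", max_tokens=1000,
                  temperature=0.7, top_p=1.0, stop=None):
    """Build an OpenAI-style chat completion payload from the
    Developer Mode generation parameters."""
    # Validate the ranges documented above.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    if stop:  # custom stop sequences, e.g. ["---"]
        payload["stop"] = stop
    return payload

# Deterministic, focused output for documentation generation:
req = build_request("Document this model", temperature=0.0, stop=["---"])
```

For YAML generation, a low temperature (near 0.0) generally gives more consistent, parseable output than the creative end of the range.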

💡 Example Workflows

DBT Schema Generation

  1. Upload your CSV data in the Data tab
  2. Go to Chat tab
  3. Ask: "Generate a DBT schema.yml for this dataset with appropriate tests"
  4. Get YAML output with column tests, descriptions, and constraints
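The generated output might resemble the following (illustrative model and column names; the actual content depends on your uploaded data):

```yaml
version: 2

models:
  - name: customers            # illustrative model name
    description: "Customer master data loaded from the uploaded CSV."
    columns:
      - name: customer_id
        description: "Primary key for each customer."
        tests:
          - unique
          - not_null
      - name: email
        description: "Customer contact email address."
        tests:
          - not_null
```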

Data Quality Assessment

  1. Upload data and ask: "What data quality issues do you see?"
  2. Get recommendations for:
    • Column-level tests (not_null, unique, accepted_values)
    • Model-level tests (freshness, volume checks)
    • Relationship validations

Custom Documentation

  1. Ask: "Create documentation for these columns with business context"
  2. Get professional descriptions ready for your DBT models

๐Ÿ” Features Deep Dive

Data Context Enhancement

When you upload data, the AI automatically receives:

  • File metadata (name, size, record count)
  • Column information (names, data types in JSON format)
  • Sample data (first 10 records as JSON)
  • Statistical summaries
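An illustrative sketch (not docbt's actual internals) of assembling that context from parsed records, using only the standard library; `build_data_context` and its output shape are assumptions:

```python
import statistics

def build_data_context(filename, records, sample_size=10):
    """Assemble file metadata, column types, a data sample, and basic
    statistics into a JSON-serializable context dict."""
    columns = {k: type(v).__name__ for k, v in records[0].items()}
    numeric = [k for k, t in columns.items() if t in ("int", "float")]
    stats = {
        k: {"mean": statistics.mean(r[k] for r in records),
            "min": min(r[k] for r in records),
            "max": max(r[k] for r in records)}
        for k in numeric
    }
    return {
        "file": {"name": filename, "records": len(records)},
        "columns": columns,
        "sample": records[:sample_size],  # first N records
        "statistics": stats,
    }

records = [{"id": 1, "amount": 9.5}, {"id": 2, "amount": 10.5}]
context = build_data_context("orders.csv", records)
```

Serialized once into the system prompt, a compact context like this keeps per-message token costs low while still grounding the AI's answers in your data.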

Token Optimization

  • Smart Context: Data context sent once in system prompt (not repeated per message)
  • Token Counting: Real-time token usage monitoring
  • Cost Control: Configurable limits and usage tracking

Developer Tools

  • Response Metrics: Time, tokens/second, model info
  • Request Debugging: Full system prompts with data context
  • Chain of Thought: AI reasoning visibility
  • Error Handling: Graceful fallbacks and error reporting

๐Ÿ—๏ธ Project Structure

docbt/
├── src/docbt/
│   ├── cli/               # Command-line interface
│   │   ├── __init__.py
│   │   └── docbt_cli.py   # CLI entry point
│   └── server/            # Streamlit application
│       ├── __init__.py
│       ├── server.py      # Main application
│       └── logo.png       # Application logo
├── pyproject.toml         # Project configuration
├── README.md              # This file
├── .env                   # Environment variables
└── requirements.txt       # Dependencies

๐Ÿค Contributing

We welcome contributions! Please see our Contributing Guide for details.

Quick Start:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes and add tests
  4. Run ruff format . and pytest
  5. Commit your changes (git commit -m 'feat: add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

CI/CD: All pull requests are automatically tested with our CI pipeline. See CI/CD Documentation for details.

📋 Requirements

System Requirements

  • Python 3.10+
  • 2GB RAM minimum (4GB+ recommended for local models)
  • Internet connection (for OpenAI API)

Dependencies

  • streamlit - Web interface
  • openai - OpenAI API client
  • requests - HTTP client for local models
  • tiktoken - Token counting
  • pandas - Data manipulation
  • python-dotenv - Environment management
  • loguru - Logging
  • click - CLI framework

๐Ÿ› Troubleshooting

Common Issues

LLM Connection Errors

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Verify LM Studio server
curl http://localhost:1234/v1/models

# Test OpenAI API key
curl -H "Authorization: Bearer $OPENAI_API_KEY" https://api.openai.com/v1/models

Import Errors

# Reinstall in development mode
pip install -e .

# Or install from requirements
pip install -r requirements.txt

Permission Issues

# Fix file permissions
chmod +x docbt

Docker Issues

# View container logs
docker-compose logs docbt

# Check if container is running
docker ps

# Restart container
docker-compose restart docbt

See Docker Guide for more Docker-specific troubleshooting.

🚢 Deployment

Docker Deployment (Recommended)

Local Development:

docker-compose up docbt

Production with Cloud Providers:

docker-compose --profile production up -d docbt-production

Build and Push to Registry:

# Build
docker build -t your-registry/docbt:latest .

# Push to Docker Hub
docker push your-registry/docbt:latest

# Or push to GitHub Container Registry
docker tag docbt:latest ghcr.io/your-username/docbt:latest
docker push ghcr.io/your-username/docbt:latest

Cloud Platforms

AWS ECS/Fargate:

  • Use the production Docker image
  • Mount secrets for API keys
  • Configure ALB for port 8501

Google Cloud Run:

# Build and deploy
gcloud builds submit --tag gcr.io/PROJECT-ID/docbt
gcloud run deploy docbt --image gcr.io/PROJECT-ID/docbt --platform managed

Azure Container Instances:

az container create \
  --resource-group myResourceGroup \
  --name docbt \
  --image your-registry/docbt:latest \
  --ports 8501 \
  --environment-variables DOCBT_OPENAI_API_KEY=sk-...

Kubernetes: See Kubernetes deployment examples in the Docker guide.

For detailed deployment instructions, see Docker Guide.

🐛 macOS Troubleshooting

Common Issues

Missing tiktoken on macOS: If you encounter an error about tiktoken being missing, reinstall without the cache:

# Clean install with no cache
pip uninstall -y docbt tiktoken
pip cache purge
pip install --no-cache-dir docbt

Apple Silicon (M1/M2/M3) Macs: Ensure you're using native ARM Python (not Rosetta):

python -c "import platform; print(platform.machine())"
# Should output: arm64

For more troubleshooting help, see the Troubleshooting Guide.

๏ฟฝ๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

๐Ÿ“ฌ Support


Happy documenting! ๐ŸŽ‰ Generate better DBT documentation with AI assistance.


Download files

Download the file for your platform.

Source Distribution

  • docbt-0.1.1.tar.gz (54.1 kB)

Built Distribution

  • docbt-0.1.1-py3-none-any.whl (54.1 kB)

File details

docbt-0.1.1.tar.gz

  • Size: 54.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

Hashes:

  • SHA256: 1531ac37a159b91536d214dc3d547828a268f6f5c549939455519380d4282f9f
  • MD5: 34d263592672f6e8866382a7c03a0c53
  • BLAKE2b-256: 47ca50499df52708d0cd19e979bda9f9a14aed6a6a031dd9cb200b745d514f07

Provenance: attestation bundles were published by release.yml on aleenprd/docbt. Values shown reflect the state when the release was signed and may no longer be current.

File details

docbt-0.1.1-py3-none-any.whl

  • Size: 54.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

Hashes:

  • SHA256: bbc439b36041e7e6796cdc65b652522273a31f7ebaa258aa06c6e479fb16ee23
  • MD5: 11010ea8b5c2a4ca1efa570915438a42
  • BLAKE2b-256: 5e7737846f7b37c2f0e475211ecca866b3f2c5aba2b37725f61c6e832ea24198

Provenance: attestation bundles were published by release.yml on aleenprd/docbt. Values shown reflect the state when the release was signed and may no longer be current.
