
docbt (documentation build tool) is a Streamlit app for managing dbt project documentation.

Reason this release was yanked:

not compatible with mac

Project description

🔧 docbt - AI-Powered DBT Documentation Assistant

CI Python 3.10+ License Code style: ruff

Generate YAML documentation for DBT models with AI assistance. Built with Streamlit for an intuitive web interface.

📖 Overview

docbt is an AI-powered assistant designed to streamline DBT (Data Build Tool) documentation workflows. Upload your data, chat with AI models, and generate professional YAML documentation ready for your DBT projects.

✨ Key Features

  • 🤖 Multi-LLM Support: Choose from OpenAI's GPT models, local Ollama, or LM Studio.
  • 💬 Interactive Chat: Ask questions about your data and get specific recommendations.
  • 🔧 Developer Mode: Token metrics, response times, parameters, prompts, and debugging information.
  • ⚙️ Advanced Configuration: Fine-tune generation parameters (temperature, max tokens, top-p, stop sequences).
  • 🧠 Chain of Thought: View the AI's reasoning process (when available).
  • 📈 Real-time Metrics: Monitor API usage, token consumption, and performance.

More to come:

  • 📊 Data Upload & Analysis: Upload files for intelligent data insights.

🚀 Quick Start

Prerequisites

  • Python 3.10 or higher
  • Optional: Ollama, LM Studio, or OpenAI API key
  • Optional: Docker (for containerized deployment)

Installation

Option 1: Using Docker (Recommended)

# Clone the repository
git clone <your-repo-url>
cd docbt

# Run with Docker Compose
docker-compose up docbt

# Access at http://localhost:8501

For detailed Docker instructions, see Docker Guide.

Option 2: Using pip

  1. Clone the repository

    git clone <your-repo-url>
    cd docbt
    
  2. Install dependencies

    pip install -e .
    
    # With optional providers
    pip install -e ".[snowflake]"  # For Snowflake support
    pip install -e ".[bigquery]"   # For BigQuery support
    pip install -e ".[all-providers]"  # For all providers
    
  3. Set up environment variables (optional)

    # Copy and edit the environment file
    cp .env.example .env
    
    # Add your API keys (optional)
    OPENAI_API_KEY=your_openai_api_key_here
    OLLAMA_HOST=localhost
    OLLAMA_PORT=11434
    LMSTUDIO_HOST=localhost
    LMSTUDIO_PORT=1234
    
  4. Run the application

    # Using the CLI
    docbt run
    
    # Or directly with Python
    python -m streamlit run src/docbt/server/server.py
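
The environment variables from step 3 can be read with standard `os.getenv` defaults. A minimal sketch (the helper name is illustrative, not docbt's actual configuration loader; the variable names and default ports come from the .env example above):

```python
import os

# Variable names match the .env example; defaults mirror the documented
# local-server ports. This is an illustrative sketch only.
def load_provider_config() -> dict:
    return {
        "openai_api_key": os.getenv("OPENAI_API_KEY"),  # None if unset
        "ollama_host": os.getenv("OLLAMA_HOST", "localhost"),
        "ollama_port": int(os.getenv("OLLAMA_PORT", "11434")),
        "lmstudio_host": os.getenv("LMSTUDIO_HOST", "localhost"),
        "lmstudio_port": int(os.getenv("LMSTUDIO_PORT", "1234")),
    }

config = load_provider_config()
print(config["ollama_port"])  # 11434 unless overridden in the environment
```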
    

🎯 Usage

1. Setup Tab

Configure your AI provider and settings:

  • Choose Provider: OpenAI, Ollama, or LM Studio
  • Developer Mode: Enable advanced settings and metrics
  • System Prompt: Customize AI behavior (developer mode)
  • Generation Parameters: Control temperature, max tokens, top-p, stop sequences

2. Chat Tab

Interact with your AI assistant:

  • Ask questions about DBT best practices
  • Get recommendations for data modeling
  • Request specific YAML configurations
  • Enable "Chain of Thought" to see AI reasoning

3. Data Tab

Upload and analyze your datasets:

  • Supported Formats: CSV, JSON
  • Auto-Analysis: Column types, sample data, statistics
  • Context Integration: Data automatically included in AI conversations
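
The auto-analysis above can be sketched with pandas (a docbt dependency). This is an illustration of the kind of report described, column types, sample rows, and statistics, not the app's actual internals:

```python
import io
import pandas as pd

# Illustrative sketch of CSV auto-analysis: column types, sample rows,
# summary statistics, and a record count.
def analyze(csv_text: str, sample_rows: int = 5) -> dict:
    df = pd.read_csv(io.StringIO(csv_text))
    return {
        "columns": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "sample": df.head(sample_rows).to_dict(orient="records"),
        "stats": df.describe(include="all").to_dict(),
        "row_count": len(df),
    }

report = analyze("id,amount\n1,9.99\n2,19.50\n3,4.25\n")
print(report["columns"])
```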

4. Additional Tabs

  • Columns: Column-specific analysis and recommendations
  • File: File management and operations
  • Docs: Documentation generation and export

🔧 Configuration

LLM Providers

OpenAI

# Set your API key
export OPENAI_API_KEY="sk-..."

# Or add to .env file
OPENAI_API_KEY=sk-...

Supported Models:

  • gpt-4o (latest)
  • gpt-4o-mini (cost-effective)
  • gpt-4-turbo
  • gpt-3.5-turbo

Ollama (Local)

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama2
ollama pull mistral

# Start server (default: http://localhost:11434)
ollama serve

LM Studio (Local)

  1. Download from lmstudio.ai
  2. Load a model in the Chat tab
  3. Enable "Local Server" (default: http://localhost:1234)

Advanced Parameters

In Developer Mode, fine-tune AI generation:

  • Max Tokens: Maximum response length (100-4000)
  • Temperature: Creativity level (0.0-2.0)
    • 0.0: Deterministic, focused
    • 1.0: Balanced
    • 2.0: Creative, random
  • Top P: Nucleus sampling (0.0-1.0)
  • Stop Sequences: Custom stop words/phrases
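
The documented ranges can be enforced before a request is sent. A hypothetical validation helper (docbt's own validation may differ):

```python
# Validate generation parameters against the documented ranges:
# max tokens 100-4000, temperature 0.0-2.0, top-p 0.0-1.0.
def validate_params(max_tokens=1000, temperature=0.7, top_p=1.0, stop=None):
    if not 100 <= max_tokens <= 4000:
        raise ValueError("max_tokens must be in 100-4000")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in 0.0-2.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in 0.0-1.0")
    return {"max_tokens": max_tokens, "temperature": temperature,
            "top_p": top_p, "stop": stop or []}

print(validate_params(temperature=0.0))  # deterministic, focused preset
```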

💡 Example Workflows

DBT Schema Generation

  1. Upload your CSV data in the Data tab
  2. Go to Chat tab
  3. Ask: "Generate a DBT schema.yml for this dataset with appropriate tests"
  4. Get YAML output with column tests, descriptions, and constraints
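
The shape of the output from step 4 can be sketched by hand. A minimal scaffold generator, assuming one not_null test per column (the AI-generated version adds descriptions and richer tests):

```python
import io
import pandas as pd

# Build a bare-bones dbt schema.yml scaffold from a DataFrame's columns.
def schema_yml(model_name: str, df: pd.DataFrame) -> str:
    lines = ["version: 2", "", "models:", f"  - name: {model_name}",
             "    description: ''", "    columns:"]
    for col in df.columns:
        lines += [f"      - name: {col}", "        description: ''",
                  "        tests:", "          - not_null"]
    return "\n".join(lines) + "\n"

df = pd.read_csv(io.StringIO("order_id,amount\n1,9.99\n"))
print(schema_yml("orders", df))
```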

Data Quality Assessment

  1. Upload data and ask: "What data quality issues do you see?"
  2. Get recommendations for:
    • Column-level tests (not_null, unique, accepted_values)
    • Model-level tests (freshness, volume checks)
    • Relationship validations
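
The column-level recommendations above boil down to simple checks. An illustrative sketch (not docbt's actual logic) that flags which dbt generic tests a column would currently pass:

```python
import pandas as pd

# Suggest not_null / unique tests only for columns that would pass them
# on the uploaded data.
def suggest_tests(df: pd.DataFrame) -> dict:
    suggestions = {}
    for col in df.columns:
        tests = []
        if df[col].notna().all():
            tests.append("not_null")
        if df[col].dropna().is_unique:
            tests.append("unique")
        suggestions[col] = tests
    return suggestions

df = pd.DataFrame({"id": [1, 2, 3], "status": ["a", "a", None]})
print(suggest_tests(df))  # {'id': ['not_null', 'unique'], 'status': []}
```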

Custom Documentation

  1. Ask: "Create documentation for these columns with business context"
  2. Get professional descriptions ready for your DBT models

๐Ÿ” Features Deep Dive

Data Context Enhancement

When you upload data, the AI automatically receives:

  • File metadata (name, size, record count)
  • Column information (names, data types in JSON format)
  • Sample data (first 10 records as JSON)
  • Statistical summaries
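
That context payload might look like the following sketch (field names here are assumptions; the app's actual format may differ):

```python
import json
import pandas as pd

# Serialize file metadata, column types, and the first 10 records as the
# JSON context described above.
def build_context(name: str, df: pd.DataFrame) -> str:
    context = {
        "file": {"name": name, "records": len(df)},
        "columns": {c: str(t) for c, t in df.dtypes.items()},
        "sample": df.head(10).to_dict(orient="records"),
    }
    return json.dumps(context, default=str)

df = pd.DataFrame({"sku": ["A1", "B2"], "qty": [3, 7]})
print(build_context("inventory.csv", df))
```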

Token Optimization

  • Smart Context: Data context sent once in system prompt (not repeated per message)
  • Token Counting: Real-time token usage monitoring
  • Cost Control: Configurable limits and usage tracking
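
Token counting can be sketched with tiktoken (listed in the dependencies below), with a rough whitespace fallback when the library or model encoding is unavailable; the function name is illustrative:

```python
# Count tokens for cost tracking; fall back to a crude whitespace
# estimate if tiktoken (or the model's encoding) is unavailable.
def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    try:
        import tiktoken
        enc = tiktoken.encoding_for_model(model)
        return len(enc.encode(text))
    except Exception:
        return max(1, len(text.split()))

print(count_tokens("Generate a dbt schema.yml for this dataset"))
```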

Developer Tools

  • Response Metrics: Time, tokens/second, model info
  • Request Debugging: Full system prompts with data context
  • Chain of Thought: AI reasoning visibility
  • Error Handling: Graceful fallbacks and error reporting
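
The response metrics reduce to elapsed time and a throughput ratio. A minimal sketch with illustrative names:

```python
# Compute the Developer Mode response metrics: elapsed seconds and
# completion tokens per second.
def response_metrics(started: float, finished: float, completion_tokens: int) -> dict:
    elapsed = finished - started
    return {
        "elapsed_s": round(elapsed, 3),
        "tokens_per_s": round(completion_tokens / elapsed, 1) if elapsed > 0 else None,
    }

print(response_metrics(started=0.0, finished=2.0, completion_tokens=150))
# {'elapsed_s': 2.0, 'tokens_per_s': 75.0}
```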

๐Ÿ—๏ธ Project Structure

docdt/
โ”œโ”€โ”€ src/docbt/
โ”‚   โ”œโ”€โ”€ cli/               # Command-line interface
โ”‚   โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”‚   โ””โ”€โ”€ docbt_cli.py   # CLI entry point
โ”‚   โ””โ”€โ”€ server/            # Streamlit application
โ”‚       โ”œโ”€โ”€ __init__.py
โ”‚       โ”œโ”€โ”€ server.py      # Main application
โ”‚       โ””โ”€โ”€ logo.png       # Application logo
โ”œโ”€โ”€ pyproject.toml         # Project configuration
โ”œโ”€โ”€ README.md             # This file
โ”œโ”€โ”€ .env                  # Environment variables
โ””โ”€โ”€ requirements.txt      # Dependencies

๐Ÿค Contributing

We welcome contributions! Please see our Contributing Guide for details.

Quick Start:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes and add tests
  4. Run ruff format . and pytest
  5. Commit your changes (git commit -m 'feat: add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

CI/CD: All pull requests are automatically tested with our CI pipeline. See CI/CD Documentation for details.

📋 Requirements

System Requirements

  • Python 3.10+
  • 2GB RAM minimum (4GB+ recommended for local models)
  • Internet connection (for OpenAI API)

Dependencies

  • streamlit - Web interface
  • openai - OpenAI API client
  • requests - HTTP client for local models
  • tiktoken - Token counting
  • pandas - Data manipulation
  • python-dotenv - Environment management
  • loguru - Logging
  • click - CLI framework

๐Ÿ› Troubleshooting

Common Issues

LLM Connection Errors

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Verify LM Studio server
curl http://localhost:1234/v1/models

# Test OpenAI API key
curl -H "Authorization: Bearer $OPENAI_API_KEY" https://api.openai.com/v1/models
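
The same probes can be scripted. A hypothetical helper that returns each provider's health-check URL (endpoints and default ports as documented above):

```python
# Map each provider to the health-check endpoint used in the curl
# commands above.
def health_url(provider: str, host: str = "localhost") -> str:
    endpoints = {
        "ollama": f"http://{host}:11434/api/tags",
        "lmstudio": f"http://{host}:1234/v1/models",
        "openai": "https://api.openai.com/v1/models",
    }
    return endpoints[provider]

print(health_url("ollama"))  # http://localhost:11434/api/tags
```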

Import Errors

# Reinstall in development mode
pip install -e .

# Or install from requirements
pip install -r requirements.txt

Permission Issues

# Fix file permissions
chmod +x docbt

Docker Issues

# View container logs
docker-compose logs docbt

# Check if container is running
docker ps

# Restart container
docker-compose restart docbt

See Docker Guide for more Docker-specific troubleshooting.

🚢 Deployment

Docker Deployment (Recommended)

Local Development:

docker-compose up docbt

Production with Cloud Providers:

docker-compose --profile production up -d docbt-production

Build and Push to Registry:

# Build
docker build -t your-registry/docbt:latest .

# Push to Docker Hub
docker push your-registry/docbt:latest

# Or push to GitHub Container Registry
docker tag docbt:latest ghcr.io/your-username/docbt:latest
docker push ghcr.io/your-username/docbt:latest

Cloud Platforms

AWS ECS/Fargate:

  • Use the production Docker image
  • Mount secrets for API keys
  • Configure ALB for port 8501

Google Cloud Run:

# Build and deploy
gcloud builds submit --tag gcr.io/PROJECT-ID/docbt
gcloud run deploy docbt --image gcr.io/PROJECT-ID/docbt --platform managed

Azure Container Instances:

az container create \
  --resource-group myResourceGroup \
  --name docbt \
  --image your-registry/docbt:latest \
  --ports 8501 \
  --environment-variables DOCBT_OPENAI_API_KEY=sk-...

Kubernetes: See Kubernetes deployment examples in the Docker guide.

For detailed deployment instructions, see Docker Guide.

๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

๐Ÿ“ฌ Support


Happy documenting! ๐ŸŽ‰ Generate better DBT documentation with AI assistance.

Project details


Download files

Download the file for your platform.

Source Distribution

docbt-0.1.0.tar.gz (50.8 kB)

Uploaded Source

Built Distribution


docbt-0.1.0-py3-none-any.whl (54.4 kB)

Uploaded Python 3

File details

Details for the file docbt-0.1.0.tar.gz.

File metadata

  • Download URL: docbt-0.1.0.tar.gz
  • Size: 50.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for docbt-0.1.0.tar.gz:

  • SHA256: 4b73643c93978cbf37645e4a4b290b1d83b07e6aae4ed38642972719af66fb6f
  • MD5: 0022df0e27211ec66d007ea154033fd7
  • BLAKE2b-256: 8364059021f564619b5e93fb5a51d8b19a015a5c3021e2a446b2c19a3d79ef7c


Provenance

The following attestation bundles were made for docbt-0.1.0.tar.gz:

Publisher: release.yml on aleenprd/docbt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file docbt-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: docbt-0.1.0-py3-none-any.whl
  • Size: 54.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for docbt-0.1.0-py3-none-any.whl:

  • SHA256: 8e6330d5fe4c52cc826d49d00a1d913b2452b75ea467f3dbfc5d0dd391e22fd0
  • MD5: a0ba49c4cf3350b7a06e805237a7d31f
  • BLAKE2b-256: a27bceb5f5e156ca1e499af0eeb3e24c127cfaffc7ad81c696a8e608997522e6


Provenance

The following attestation bundles were made for docbt-0.1.0-py3-none-any.whl:

Publisher: release.yml on aleenprd/docbt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
