Skip to main content

AI-powered dbt helper with modern GPT models and enhanced prompts

Project description

DBT AI

An application that allows AI powered DBT development and recommendations for your DBT models.

🚀 What's New in v0.4.0

  • Agent-First Design: JSON output is now the default format for seamless integration with AI coding agents
  • Structured Output: Machine-parseable JSON responses that work with any agent or automation tool
  • Fast Metadata Checks: New --metadata-only flag for quick metadata coverage analysis without AI processing
  • Modern OpenAI API: Upgraded to the latest OpenAI API (v1.x) with support for GPT-4o and GPT-4o-mini
  • Enhanced Prompts: Completely rewritten prompts with better structure and clearer guidelines
  • Smart Model Selection: Automatically uses GPT-4o for advanced suggestions and GPT-4o-mini for basic ones
  • Better Error Handling: Improved error handling and fallback mechanisms
  • Configuration Options: Environment variables for customizing AI models and settings
  • Maintained Compatibility: All existing CLI commands work exactly the same

Features

  • Agent-Ready Output: Structured JSON output by default for seamless automation and agent integration
  • AI-Powered Analysis: Scans all dbt models and generates recommendations for each model
    • Basic recommendations for quick improvements (default: GPT-4o-mini)
    • Advanced recommendations for complex optimizations (GPT-4o)
  • Fast Metadata Checking: Instantly verify which models are missing documentation
  • Model Creation: Generate new dbt models from natural language prompts
  • Lineage Analysis: Understand model dependencies and data flow
  • Multi-Database Support: Works with Snowflake, PostgreSQL, Redshift, and BigQuery
  • Human-Readable Reports: Optional HTML reports for visual analysis

Installation

First time installation:

pip install dbt-ai

To upgrade to the latest version:

pip install dbt-ai --upgrade

To install a specific version:

pip install dbt-ai==<version>

Replace <version> with your desired version e.g. 0.2.0. You can view available versions in the Releases section of this repo or on our Pypi page. WARNING: This is an early phase application that may still contain bugs

Prerequisites

  • In order to benefit from AI features, you need your own OpenAI API Key with the initial version of this application
    • Once you sign up to OpenAI you can create an API key.
    • Trial version gives you a certain amount of credits allowing you to make many API calls
    • Usage beyond the trial credits require billing details. API usage pricing provides more info
  • Ideally you already have dbt project to test this out on
  • Python 3.10 or greater is required

Usage

Setting up your API key is needed for all the AI features

  • Set up your OpenAI API key as an environment variable:
export OPENAI_API_KEY="your_openai_api_key"

Configuration

dbt-ai now supports environment variables for customizing AI model selection and behavior:

AI Model Configuration

# Basic suggestions model (default: gpt-4o-mini)
export DBT_AI_BASIC_MODEL="gpt-4o-mini"

# Advanced suggestions model (default: gpt-4o)  
export DBT_AI_ADVANCED_MODEL="gpt-4o"

# Fallback model for compatibility (default: gpt-3.5-turbo)
export DBT_AI_FALLBACK_MODEL="gpt-3.5-turbo"

API Settings

# Maximum tokens per API call (default: 4000)
export DBT_AI_MAX_TOKENS="4000"

# Temperature for AI responses (default: 0.1)
export DBT_AI_TEMPERATURE="0.1"

Recommended Model Selection

  • For cost-conscious users: Use gpt-3.5-turbo for both basic and advanced
  • For best quality: Use gpt-4o-mini for basic and gpt-4o for advanced (default)
  • For organizations: Consider gpt-4 models for production use

Quick Start

Basic Analysis (JSON Output)

Get structured analysis perfect for agents and automation:

dbt-ai -f path/to/dbt/project

Example with current directory:

dbt-ai -f .

This outputs machine-parseable JSON with model analysis, suggestions, and metadata coverage.

Fast Metadata Check

Quick metadata coverage check without AI processing:

dbt-ai -f . --metadata-only

Human-Readable Reports

Generate visual HTML reports for manual review:

dbt-ai -f . --output text

Advanced Usage

Output Formats

# JSON output (default) - perfect for agents
dbt-ai -f . --output json

# Text output - generates HTML report for humans
dbt-ai -f . --output text

Database Configuration

Specify your database type for optimized suggestions:

dbt-ai -f . -d snowflake    # Default
dbt-ai -f . -d postgres     # PostgreSQL
dbt-ai -f . -d redshift     # Amazon Redshift
dbt-ai -f . -d bigquery     # Google BigQuery

Advanced AI Recommendations

Request more sophisticated optimization suggestions:

dbt-ai -f . -a              # Short form
dbt-ai -f . --advanced-rec  # Long form

Combined Examples

# Advanced PostgreSQL analysis with JSON output
dbt-ai -f . -d postgres -a

# Quick metadata check only
dbt-ai -f . --metadata-only

# Full analysis with HTML report
dbt-ai -f . --output text -a

Create DBT Models from prompt (AI)

This feature lets you specify a prompt, which creates AI generated DBT model files in the models/ directory of the specified dbt project. The AI model has access to your sources.yml file, if you wish to refer to any sources in your prompt. Being specific will provide better results.

  1. Run the application with the --create-models flag to specify the prompt you wish to use to create your DBT models
dbt-ai -f path/to/dbt/project --create-models 'your prompt goes here'

Here is an example:

dbt-ai -f . --create-models 'Write me a model that uses all the sources available in sources.yml and joins them together using the id column'

Output Formats

JSON Output (Default)

Perfect for AI agents, automation tools, and programmatic analysis:

{
  "operation": "full_analysis",
  "project_path": "./sample-dbt-project",
  "total_models": 3,
  "models": [
    {
      "name": "customer_summary",
      "has_metadata": true,
      "suggestions": "Consider adding data freshness tests...",
      "dependencies": ["raw_customers", "raw_orders"]
    }
  ],
  "metadata_coverage_percent": 66.7,
  "lineage_description": "customer_summary depends on raw_customers, raw_orders..."
}

HTML Report

Visual reports for human analysis when using --output text:

🌐 View Live Demo Report

The HTML report includes:

  • 🤖 AI-powered improvement suggestions for each dbt model
  • 📋 Metadata coverage analysis showing which models need documentation
  • 🎨 Professional styling with responsive design
  • 🔗 Model lineage information and dependencies

The demo above shows the actual output generated from the sample dbt project included in this repository.

Agent Integration

dbt-ai is designed to work seamlessly with AI coding agents like Claude Code, Cursor, and GitHub Copilot. The JSON output format allows agents to:

  • Analyze dbt projects programmatically without human intervention
  • Identify optimization opportunities across multiple models
  • Check metadata coverage for documentation completeness
  • Understand model lineage for impact analysis
  • Generate improvement suggestions based on current best practices

Example Agent Workflow

# Agent runs analysis
dbt-ai -f ./dbt-project --output json > analysis.json

# Agent parses results to identify issues
cat analysis.json | jq '.models[] | select(.has_metadata == false) | .name'

# Agent can then take corrective actions based on structured data

Changelog

v0.4.0 (Latest)

  • Agent-first design: JSON output is now the default format
  • Fast metadata checks: New --metadata-only flag for quick coverage analysis
  • Structured output: Machine-parseable JSON for seamless automation
  • Agent integration: Designed for seamless use with AI coding agents
  • Bug fixes: Resolved directory handling issues in YAML file discovery
  • Enhanced error handling: Better safety checks for file processing

v0.3.0

  • Major upgrade: Modernized OpenAI API integration (v1.x)
  • Enhanced AI prompts: Completely rewritten prompts with better structure and context
  • Smart model selection: GPT-4o for advanced suggestions, GPT-4o-mini for basic ones
  • Structured responses: JSON-based output parsing with Pydantic validation
  • Configuration options: Environment variables for model and API customization
  • Improved error handling: Better fallbacks and error messages
  • Backward compatibility: All existing commands work unchanged
  • Fixed tests: All test suite now passes correctly

v0.2.x (Previous)

  • Basic OpenAI integration with GPT-3.5-turbo
  • Simple prompt-based suggestions
  • Basic and advanced recommendation modes
  • HTML report generation
  • Metadata checking functionality

Contributing

We welcome contributions to the project! Please feel free to open issues or submit pull requests with your improvements and suggestions.

See CONTRIBUTING.md to get started and develop in this repo.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbt-ai-0.4.0.tar.gz (20.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dbt_ai-0.4.0-py3-none-any.whl (18.5 kB view details)

Uploaded Python 3

File details

Details for the file dbt-ai-0.4.0.tar.gz.

File metadata

  • Download URL: dbt-ai-0.4.0.tar.gz
  • Upload date:
  • Size: 20.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.15

File hashes

Hashes for dbt-ai-0.4.0.tar.gz
Algorithm Hash digest
SHA256 37eca0fe594be6fe7058425a0d1247d3c4beab11c4c12ba8144d6cd61f3ff88d
MD5 c06df53a21ef257af7adcb5c2a651a09
BLAKE2b-256 319121ace0628a1afadb398096d3f36dcedc2cdde0856af43c5835142581d5ec

See more details on using hashes here.

File details

Details for the file dbt_ai-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: dbt_ai-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 18.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.15

File hashes

Hashes for dbt_ai-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2b2db08a129b625bf1d4d7d53d63283d432f1807e4fee88eec7f4da0153a4f70
MD5 d2a33313645dd33dad1988c38ca8fdbb
BLAKE2b-256 18cdfd595e70665e64ee8b0af631241541a05efbc41451aaa53a0be96836f714

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page