CI/CD tool for dbt projects with intelligent change detection and selective execution

dbt-ci

A CI tool for dbt (data build tool) projects that runs only modified models, based on a comparison against a reference state. It supports multiple execution environments: local, Docker, bash, and the dbt Python API.

How It Works

dbt-ci uses a cache-based workflow:

  1. init - Downloads the reference state from cloud storage (or reads it locally), compares it with the current code, and creates a cache of changes
  2. run/delete/ephemeral - Use the cached state automatically (no need to re-specify state paths)

This design ensures:

  • Consistent state across all commands in a CI run
  • Better performance (no redundant state downloads)
  • Simpler CLI (specify state once in init, reuse everywhere)
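In a CI job, the init-then-reuse flow can be made fail-fast so that `run` never executes when `init` has not produced a cache. A minimal sketch: the `ci_dbt` function and the `DBT_CI` override (handy for substituting `echo` as a dry run) are hypothetical conveniences, not dbt-ci features; the flags are the documented ones.

```shell
# Hypothetical CI helper: run `init` once, then reuse its cache for `run`.
# DBT_CI is an assumption of this sketch (not a dbt-ci setting); it lets you
# swap in another command such as `echo` to preview the calls.
ci_dbt() {
  local dbt_ci="${DBT_CI:-dbt-ci}"
  local project_dir="${1:-dbt}"

  # Step 1: resolve the reference state and build the change cache.
  "$dbt_ci" init \
    --dbt-project-dir "$project_dir" \
    --reference-target production \
    --state "$project_dir/.dbtstate" || return 1

  # Step 2: reuse the cached state; --state is not repeated here.
  "$dbt_ci" run --dbt-project-dir "$project_dir" --mode models || return 1
}
```

For example, `DBT_CI=echo ci_dbt dbt` prints the two commands without running them. Returning early when `init` fails keeps `run` from working against a missing or stale cache.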

Installation

From PyPI (Recommended)

pip install dbt-ci

From GitHub

# Install from main branch
pip install git+https://github.com/datablock-dev/dbt-ci.git@main

# Install a specific version
pip install git+https://github.com/datablock-dev/dbt-ci.git@v1.0.0

Local Development

git clone https://github.com/datablock-dev/dbt-ci.git
cd dbt-ci
pip install -e ".[dev]"

After installation, the tool is available as dbt-ci.

Quick Start

The Workflow: Initialize once with init, then run commands that use the cached state.

1. Initialize State

First, initialize the dbt-ci state. This downloads (or reads) the reference state and creates a cache:

dbt-ci init \
  --dbt-project-dir dbt \
  --profiles-dir dbt \
  --reference-target production \
  --state dbt/.dbtstate

With Cloud Storage (GCS/S3):

dbt-ci init \
  --dbt-project-dir dbt \
  --state-uri gs://my-bucket/dbt-state/manifest.json \
  --reference-target production \
  --state dbt/.dbtstate

2. Run Modified Models

After initialization, run commands use the cached state automatically:

# No need to specify --state again!
dbt-ci run \
  --dbt-project-dir dbt \
  --profiles-dir dbt

With Docker:

dbt-ci run \
  --runner docker \
  --docker-image ghcr.io/dbt-labs/dbt-bigquery:latest

Commands

init - Initialize State

Creates the initial state for your dbt project; always run this first. It downloads the reference manifest from cloud storage (if --state-uri is given) and creates a local cache for subsequent commands.

dbt-ci init \
  --dbt-project-dir dbt \
  --profiles-dir dbt \
  --state-uri gs://my-bucket/manifest.json \
  --reference-target production \
  --state dbt/.dbtstate

Options:

  • --state, --reference-state: Local path where state will be downloaded/stored
  • --state-uri: Remote URI for state manifest (e.g., gs://bucket/manifest.json, s3://bucket/manifest.json)
  • --reference-target: Target to use for production/reference manifest (optional)
  • --dbt-version: Specific dbt version to use (e.g., 1.10.13)
  • --adapter, -a: Adapter to install (e.g., dbt-duckdb=1.10.0)
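The `--adapter` value is written with a single `=` (e.g. `dbt-duckdb=1.10.0`), whereas pip requirement specifiers pin with `==`. If you want to pre-install the same versions yourself, for example to bake them into a CI image, a small converter keeps the two in sync. This sketch assumes the adapter ends up installed as a regular pip package; `adapter_to_pip_spec` is a hypothetical name.

```shell
# Hypothetical helper: convert a dbt-ci style "name=version" adapter value
# into a pip requirement specifier ("name==version").
adapter_to_pip_spec() {
  case "$1" in
    *==*) printf '%s\n' "$1" ;;                       # already pip syntax
    *=*)  printf '%s==%s\n' "${1%%=*}" "${1#*=}" ;;   # name=version -> name==version
    *)    printf '%s\n' "$1" ;;                       # bare name: leave unpinned
  esac
}

# Example: pre-install the versions used in the options above.
# pip install "dbt-core==1.10.13" "$(adapter_to_pip_spec dbt-duckdb=1.10.0)"
```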

run - Run Modified Models

Detects and runs models that have changed. Uses cached state from init command.

# Run after init - uses cached state
dbt-ci run \
  --dbt-project-dir dbt \
  --mode models

Options:

  • --mode, -m: What to run: all, models, seeds, snapshots, tests (default: all)
  • --defer: Use dbt's defer flag for production state

Examples:

# Run only modified models
dbt-ci run --mode models

# Run modified models with defer to production
dbt-ci run --mode models --defer

# Run all modified resources (models, tests, seeds, etc.)
dbt-ci run --mode all

# With Docker
dbt-ci run --runner docker --mode models
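For orientation, the sketch below prints a roughly comparable plain-dbt command for each `--mode`, using dbt's own `state:modified+` selector and its `--state` and `--defer` flags. This is an assumption about the kind of command involved, not a description of what dbt-ci actually executes; the mode-to-subcommand mapping and the `dbt_equivalent` name are hypothetical.

```shell
# Hypothetical: print a plain-dbt command roughly comparable to
# `dbt-ci run --mode <mode> [--defer]`. The mode-to-subcommand mapping is an
# assumption; dbt-ci may build its selection differently.
# Args: mode (all|models|seeds|snapshots|tests), state dir, optional "defer".
dbt_equivalent() {
  local mode="$1" state_dir="$2" defer="${3:-}" sub cmd
  case "$mode" in
    all)       sub=build ;;
    models)    sub=run ;;
    seeds)     sub=seed ;;
    snapshots) sub=snapshot ;;
    tests)     sub=test ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  # state:modified+ selects changed nodes plus everything downstream of them.
  cmd="dbt $sub --select state:modified+ --state $state_dir"
  # --defer lets refs to models not built here resolve via the --state manifest.
  if [ "$defer" = "defer" ]; then
    cmd="$cmd --defer"
  fi
  printf '%s\n' "$cmd"
}
```

For example, `dbt_equivalent models dbt/.dbtstate defer` prints `dbt run --select state:modified+ --state dbt/.dbtstate --defer`.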

ephemeral - Ephemeral Environment

Creates ephemeral environments for testing without affecting production. Uses cached state from init.

# Run after init
dbt-ci ephemeral --dbt-project-dir dbt

Options:

  • --keep-env: Don't destroy ephemeral environment after run

delete - Delete Removed Models

Detects and deletes models that have been removed from the project. Uses cached state from init.

# Run after init
dbt-ci delete --dbt-project-dir dbt

Runners

dbt-ci supports multiple execution environments:

Local Runner

Execute dbt commands directly on your machine:

# After init
dbt-ci run \
  --runner local \
  --dbt-project-dir dbt

dbt Runner (Python API)

Uses dbt's Python API (fastest, default):

# After init - uses dbt Python API
dbt-ci run \
  --runner dbt \
  --dbt-project-dir dbt

Docker Runner

Run dbt commands inside a Docker container:

dbt-ci run \
  --runner docker \
  --docker-image ghcr.io/dbt-labs/dbt-duckdb:latest \
  --docker-volumes "$(pwd):/workspace" \
  --dbt-project-dir /workspace/dbt \
  --state /workspace/dbt/.dbtstate

For Apple Silicon Macs:

dbt-ci run \
  --runner docker \
  --docker-platform linux/amd64 \
  --docker-image ghcr.io/dbt-labs/dbt-postgres:latest \
  --docker-volumes "$(pwd):/workspace" \
  --dbt-project-dir /workspace/dbt

Docker Advanced Options

Platform (for Apple Silicon compatibility):

--docker-platform linux/amd64  # or linux/arm64

Custom Volumes:

--docker-volumes "/host/path:/container/path" --docker-volumes "/another:/path:ro"
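Each `--docker-volumes` value uses Docker's `host:container[:mode]` bind-mount syntax, so validating values up front catches typos before a container starts. The hypothetical `check_volume_spec` below assumes POSIX paths (no colons inside paths, so no Windows drive letters) and accepts only the `ro`/`rw` modes, although Docker itself also accepts other options such as SELinux labels.

```shell
# Hypothetical pre-flight check for a "host:container[:mode]" volume value.
# Assumes POSIX paths without colons and only the ro/rw modes.
check_volume_spec() {
  local spec="$1" host rest container mode=""
  host="${spec%%:*}"
  rest="${spec#*:}"
  # Without a colon, ${spec#*:} leaves the value unchanged: no container path.
  if [ "$rest" = "$spec" ] || [ -z "$host" ]; then
    echo "expected host:container[:mode], got: $spec" >&2
    return 1
  fi
  container="${rest%%:*}"
  # Anything after a second colon is the mode.
  if [ "$container" != "$rest" ]; then
    mode="${rest#*:}"
  fi
  case "$container" in
    /*) ;;
    *) echo "container path must be absolute: $spec" >&2; return 1 ;;
  esac
  case "$mode" in
    ""|ro|rw) return 0 ;;
    *) echo "unsupported mode '$mode': $spec" >&2; return 1 ;;
  esac
}
```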

Environment Variables:

--docker-env "DBT_ENV=prod" --docker-env "MY_API_KEY=secret"

Network Mode:

--docker-network bridge  # or host, none, container:name

User:

--docker-user "1000:1000"  # or leave empty for auto-detect
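Running the container as your host user is the usual way to avoid root-owned files appearing in mounted directories. If you prefer to pass the value explicitly rather than rely on auto-detection (whose exact behavior is not documented here), the standard `id` utility provides it; `host_docker_user` is a hypothetical helper name.

```shell
# Hypothetical helper: print the host's numeric "UID:GID" for --docker-user.
host_docker_user() {
  printf '%s:%s\n' "$(id -u)" "$(id -g)"
}

# Usage: dbt-ci run --runner docker --docker-user "$(host_docker_user)"
```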

Additional Docker Args:

--docker-args "--memory=2g --cpus=2"

Complete Docker Example:

dbt-ci run \
  --runner docker \
  --docker-image ghcr.io/dbt-labs/dbt-postgres:1.7.0 \
  --docker-platform linux/amd64 \
  --docker-env "POSTGRES_HOST=host.docker.internal" \
  --docker-network host \
  --docker-volumes "$(pwd):/workspace" \
  --docker-volumes "$HOME/.aws:/root/.aws:ro" \
  --dbt-project-dir /workspace/dbt \
  --profiles-dir /workspace/dbt \
  --target prod
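To reason about how these options combine, it can help to picture them as ordinary `docker run` arguments. The sketch below prints such a command using Docker's real `--platform`, `--network`, `--user`, `-v`, and `-e` flags. How dbt-ci actually assembles its container invocation is not documented here, so the mapping is an assumption, and `docker_run_preview` with its `--image`, `--volume`, `--env`, and `--args` options is hypothetical (these are not dbt-ci flags).

```shell
# Hypothetical preview: print a `docker run` command assembled from
# dbt-ci-style Docker options. Words are space-joined for reading, not for eval.
docker_run_preview() {
  local image="" opts=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --) shift; break ;;
      --image|--platform|--network|--user|--volume|--env|--args)
        if [ "$#" -lt 2 ]; then
          echo "missing value for $1" >&2; return 2
        fi ;;
      *) echo "unknown option: $1" >&2; return 2 ;;
    esac
    case "$1" in
      --image)    image="$2" ;;
      --platform) opts="$opts --platform $2" ;;
      --network)  opts="$opts --network $2" ;;
      --user)     opts="$opts --user $2" ;;
      --volume)   opts="$opts -v $2" ;;   # one -v per --docker-volumes value
      --env)      opts="$opts -e $2" ;;   # one -e per --docker-env value
      --args)     opts="$opts $2" ;;      # --docker-args passed through as-is
    esac
    shift 2
  done
  if [ -z "$image" ]; then
    echo "missing --image" >&2; return 2
  fi
  # Remaining arguments are the command run inside the container.
  printf 'docker run%s %s' "$opts" "$image"
  if [ "$#" -gt 0 ]; then
    printf ' %s' "$*"
  fi
  printf '\n'
}
```

Feeding it the image, platform, network, a volume, and the `POSTGRES_HOST` variable from the example above, followed by `-- dbt run --project-dir /workspace/dbt`, prints one `docker run --platform linux/amd64 --network host -v ... -e ...` line that you can check against `docker run --help`.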

Global Options

These options apply to all commands:

  • --dbt-project-dir: Path to dbt project directory (default: .)
  • --profiles-dir: Path to profiles.yml directory (default: auto-detect)
  • --reference-target: dbt target for production/reference manifest; init only (default: none)
  • --target, -t: dbt target to use (default: from profiles.yml)
  • --vars, -v: YAML string or file path with dbt variables (default: empty string)
  • --defer: Use dbt's defer flag for production state (default: false)
  • --runner, -r: Runner type: local, docker, bash, dbt (default: dbt)
  • --entrypoint: Command entrypoint for dbt (default: dbt)
  • --dbt-version: Specific dbt version to use (default: current)
  • --adapter, -a: Adapter to install, in the format dbt-adapter=version (default: none)
  • --dry-run: Print commands without executing them (default: false)
  • --log-level: Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL (default: INFO)
  • --slack-webhook: Slack webhook URL for notifications (default: none)

Init-Specific Options

These options are only available for the init command:

  • --state, --reference-state: Local path where reference state will be stored (default: none)
  • --state-uri: Remote URI for state manifest, e.g. gs://bucket/manifest.json or s3://bucket/manifest.json (default: none)

Docker Options

  • --docker-image: Docker image for dbt (default: ghcr.io/dbt-labs/dbt-core:latest)
  • --docker-platform: Platform, linux/amd64 or linux/arm64 (default: auto-detect)
  • --docker-volumes: Volume mounts, format host:container[:mode] (default: none)
  • --docker-env: Environment variables, format KEY=VALUE (default: none)
  • --docker-network: Docker network mode (default: host)
  • --docker-user: User to run as, UID:GID (default: auto-detect)
  • --docker-args: Additional docker run arguments (default: empty string)

Bash Runner Options

  • --shell-path, --bash-path: Path to shell executable (default: /bin/bash)

Cloud Storage Support

dbt-ci supports storing and retrieving state files in cloud storage (GCS, S3), so that distributed CI/CD workflows can share a single reference state.

GCS/S3 State Storage

Store your dbt reference state in cloud storage for shared access across CI runs:

# Initialize and download state from GCS
dbt-ci init \
  --dbt-project-dir dbt \
  --state-uri gs://my-bucket/dbt-state/manifest.json \
  --reference-target production \
  --state dbt/.dbtstate

# Run using cached state (no need to specify URI again)
dbt-ci run --dbt-project-dir dbt --mode models

Benefits:

  • 🔄 Shared State: Download the same reference state across different CI jobs
  • 💾 Cache-Based: After init, commands use local cache (no repeated downloads)
  • 📦 No Git Commits: State files don't need to be committed to version control
  • 🚀 Scalable: Works seamlessly in containerized and distributed environments
  • 🔐 Secure: Leverage cloud IAM and bucket policies for access control

Configuration:

The tool uses cloud credentials from your environment. Ensure your bucket is accessible:

# For GCS
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# For AWS S3
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_DEFAULT_REGION=us-east-1

# Or use IAM roles (recommended in CI/CD)
dbt-ci init --state-uri gs://my-bucket/manifest.json
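Since `init` is the step that reads from cloud storage, a check before it can turn a vague download error into a clear message. This is a sketch assuming credentials come from the environment variables above; ambient credentials (IAM roles, instance metadata, workload identity) are invisible to it, so it signals that case with its own exit status instead of treating it as an error. `check_state_credentials` is a hypothetical name.

```shell
# Hypothetical pre-flight check to run before `dbt-ci init --state-uri URI`.
# Exit status: 0 = explicit credentials found for the URI's scheme,
#              1 = none found (ambient credentials such as IAM roles may still work),
#              2 = unsupported URI scheme.
check_state_credentials() {
  case "$1" in
    gs://*)
      if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
        echo "GOOGLE_APPLICATION_CREDENTIALS not set; relying on ambient GCP credentials" >&2
        return 1
      fi ;;
    s3://*)
      if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS access keys not set; relying on ambient AWS credentials" >&2
        return 1
      fi ;;
    *)
      echo "unsupported state URI: $1" >&2
      return 2 ;;
  esac
  return 0
}
```

A CI script can treat status 1 as a warning when it runs under an IAM role, and stop only on status 2.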

Supported URI Formats:

  • gs://bucket-name/path/to/manifest.json (Google Cloud Storage)
  • s3://bucket-name/path/to/manifest.json (AWS S3)
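Both formats share the shape `scheme://bucket/path`. When a CI script needs the pieces separately, for instance to point `gcloud storage cp` or `aws s3 cp` at the same object when publishing a new manifest, shell parameter expansion is enough. `split_state_uri` is a hypothetical helper, not part of dbt-ci.

```shell
# Hypothetical helper: print "scheme bucket key" for gs:// or s3:// URIs.
split_state_uri() {
  local uri="$1" scheme rest bucket key
  case "$uri" in
    gs://*|s3://*) ;;
    *) echo "unsupported URI: $uri" >&2; return 1 ;;
  esac
  scheme="${uri%%://*}"
  rest="${uri#*://}"
  bucket="${rest%%/*}"
  key="${rest#*/}"
  # Require both a bucket and a non-empty object key.
  if [ -z "$bucket" ] || [ "$key" = "$rest" ] || [ -z "$key" ]; then
    echo "expected scheme://bucket/path: $uri" >&2
    return 1
  fi
  printf '%s %s %s\n' "$scheme" "$bucket" "$key"
}
```

For example, `split_state_uri gs://my-bucket/dbt-state/manifest.json` prints `gs my-bucket dbt-state/manifest.json`.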

Environment Variables

All CLI options can also be set via environment variables:

export DBT_PROJECT_DIR=./dbt
export DBT_PROFILES_DIR=./dbt
export DBT_TARGET=production
export DBT_RUNNER=local

# After running init, just use:
dbt-ci run

Common Environment Variables:

  • DBT_PROJECT_DIR - Path to dbt project
  • DBT_PROFILES_DIR - Path to profiles.yml location
  • DBT_TARGET - Target environment to use
  • DBT_RUNNER - Runner type (local, docker, bash, dbt)

Note: State management is cache-based. Run init once, then subsequent commands automatically use the cached state.
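In CI scripts it is common to give these variables fallback values so a job still behaves predictably when one is unset. A minimal sketch using standard `${VAR:=default}` expansion; the fallbacks for the project and profiles directories are illustrative choices matching the examples in this README, the runner fallback mirrors the documented default (`dbt`), and `resolve_dbt_ci_env` is a hypothetical name.

```shell
# Hypothetical: apply fallbacks to the dbt-ci environment variables and print
# the effective values. The fallbacks are illustrative, except the runner,
# which mirrors the documented default (dbt).
resolve_dbt_ci_env() {
  : "${DBT_PROJECT_DIR:=./dbt}"
  : "${DBT_PROFILES_DIR:=$DBT_PROJECT_DIR}"
  : "${DBT_RUNNER:=dbt}"
  export DBT_PROJECT_DIR DBT_PROFILES_DIR DBT_RUNNER
  # DBT_TARGET is left unset when absent so profiles.yml picks the target.
  printf 'project=%s profiles=%s runner=%s target=%s\n' \
    "$DBT_PROJECT_DIR" "$DBT_PROFILES_DIR" "$DBT_RUNNER" \
    "${DBT_TARGET:-<profiles.yml default>}"
}
```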

CI/CD Integration

GitHub Actions Example

name: dbt CI

on: [pull_request]

jobs:
  dbt-ci:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # required to assume the AWS role via OIDC
      contents: read
    steps:
      - uses: actions/checkout@v3
      
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
          aws-region: us-east-1
      
      - name: Install dbt-ci
        run: pip install git+https://github.com/datablock-dev/dbt-ci.git@main
      
      - name: Initialize dbt-ci with cloud state
        run: |
          dbt-ci init \
            --dbt-project-dir dbt \
            --state-uri s3://my-dbt-state/prod/manifest.json \
            --reference-target production \
            --state dbt/.dbtstate
      
      - name: Run modified models
        run: |
          dbt-ci run --mode models

GitLab CI Example

dbt-ci:
  image: python:3.11
  script:
    - pip install git+https://github.com/datablock-dev/dbt-ci.git@main
    - dbt-ci init --dbt-project-dir dbt --state-uri gs://my-dbt-state/prod/manifest.json --reference-target production --state dbt/.dbtstate
    - dbt-ci run --mode models
  only:
    - merge_requests

Features

  • 🎯 Smart Detection: Automatically identifies modified, new, and deleted models
  • 📊 Dependency Tracking: Generates and traverses dependency graphs for lineage analysis
  • 🔄 State Comparison: Compares current state against production for precise CI
  • ☁️ Cloud Storage: GCS and S3 integration for shared state across distributed CI/CD workflows
  • 🚀 Multiple Runners: Supports local, Docker, bash, and dbt Python API execution
  • 🐳 Docker-First: Extensive Docker configuration for containerized workflows
  • ⚡ Selective Execution: Run only what changed, saving time and resources
  • 🔌 Adapter Support: Install specific dbt versions and adapters on-demand
  • 💬 Notifications: Slack webhook integration for CI/CD alerts
  • ♻️ Ephemeral Environments: Test changes in isolated environments
  • 🧹 Cleanup: Automatically remove deleted models from target warehouse

Use Cases

Pull Request CI

Only build and test models affected by PR changes:

# Initialize with reference state
dbt-ci init --state-uri gs://bucket/manifest.json --reference-target production --state dbt/.dbtstate

# Run modified models with defer
dbt-ci run --mode models --defer

Distributed CI with Cloud Storage

Share state across multiple CI jobs:

# Job 1: Initialize state (downloads from cloud)
dbt-ci init --state-uri gs://my-bucket/manifest.json --reference-target production --state dbt/.dbtstate

# Job 2: Run models (uses cached state)
dbt-ci run --mode models

# Job 3: Run tests (uses cached state)
dbt-ci run --mode tests

Selective Testing

Run tests only for modified models:

# After init
dbt-ci run --mode tests

Schema Migrations

Clean up deleted models from production:

# After init
dbt-ci delete --target production

Multi-Environment Testing

Create ephemeral test environments:

dbt-ci ephemeral --keep-env

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

  1. Clone the repository
  2. Install dependencies: pip install -e ".[dev]"
  3. Run tests: pytest tests/
  4. Format code: black src/ tests/

Commit Message Format

This project uses Conventional Commits for automated releases:

  • feat: New feature (minor version bump)
  • fix: Bug fix (patch version bump)
  • docs: Documentation changes
  • refactor: Code refactoring
  • test: Adding tests
  • chore: Maintenance tasks

Example:

git commit -m "feat: add Docker runner support"
git commit -m "fix: resolve path resolution on Windows"

See RELEASING.md for details on the automated release process.
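The mapping from commit types to version bumps can be expressed as a small function over commit subjects. This is a sketch of the Conventional Commits convention, not this project's release tooling: it also treats a `!` after the type (e.g. `feat!:`) as a major bump, per the Conventional Commits specification, ignores `BREAKING CHANGE:` footers because it reads subject lines only, and gives every other type no bump. `bump_from_commits` is a hypothetical name.

```shell
# Hypothetical: read Conventional Commit subjects (one per line) on stdin and
# print the implied version bump: major, minor, patch, or none.
bump_from_commits() {
  local bump=none line type
  while IFS= read -r line; do
    type="${line%%:*}"
    # Lines without a colon are not Conventional Commit subjects; skip them.
    [ "$type" != "$line" ] || continue
    case "$type" in
      *'!') echo major; return 0 ;;                       # "feat!:" or "feat(api)!:"
      feat|feat\(*\)) bump=minor ;;                       # feat -> minor
      fix|fix\(*\)) [ "$bump" = minor ] || bump=patch ;;  # fix -> patch
    esac
  done
  echo "$bump"
}
```

For example, `git log --format=%s v1.1.0..HEAD | bump_from_commits`, where `v1.1.0` stands in for a hypothetical previous release tag.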

License

See LICENSE file for details.

Download files

Source Distribution

dbt_ci-1.2.0.tar.gz (120.8 kB)

Uploaded Source

Built Distribution

dbt_ci-1.2.0-py3-none-any.whl (53.8 kB)

Uploaded Python 3

File details

Details for the file dbt_ci-1.2.0.tar.gz.

File metadata

  • Download URL: dbt_ci-1.2.0.tar.gz
  • Upload date:
  • Size: 120.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dbt_ci-1.2.0.tar.gz
  • SHA256: 30157a69e8d4f5b23b99a9c00e68734da898b24ec39d2744a8a82b8faff136f4
  • MD5: a9177213c48c9c062ea2d60e4c92f728
  • BLAKE2b-256: 36535917e34bf3f2af4060b359788211e648d1f32bc317930ee662e48e5d8a85

File details

Details for the file dbt_ci-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: dbt_ci-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 53.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dbt_ci-1.2.0-py3-none-any.whl
  • SHA256: 62da3f3caf75825d69f7af410d4f2bdd89cadc1712886ae61aea5ed0f72ad1c1
  • MD5: 2bb757a4ff9d4216141bf51d262f5761
  • BLAKE2b-256: 8cd4dbe920ceb317d9fa548707f534242465ec06eebb7f8dd02bb73fda296781
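The published SHA256 digests let you check a downloaded artifact before installing it. A minimal sketch: `verify_sha256` is a hypothetical helper that uses `sha256sum` (GNU coreutils) and falls back to `shasum -a 256` where that is unavailable, as on macOS.

```shell
# Hypothetical: succeed only if FILE's SHA256 digest equals EXPECTED.
verify_sha256() {
  local file="$1" expected="$2" actual
  if command -v sha256sum >/dev/null 2>&1; then
    actual="$(sha256sum "$file" | cut -d' ' -f1)"
  else
    actual="$(shasum -a 256 "$file" | cut -d' ' -f1)"
  fi
  [ "$actual" = "$expected" ]
}
```

For example, after `pip download dbt-ci==1.2.0 --no-deps -d dist/`, run `verify_sha256 dist/dbt_ci-1.2.0-py3-none-any.whl 62da3f3caf75825d69f7af410d4f2bdd89cadc1712886ae61aea5ed0f72ad1c1`. Alternatively, pip's hash-checking mode accepts a requirements line such as `dbt-ci==1.2.0 --hash=sha256:<digest>` with `pip install --require-hashes -r requirements.txt`; in that mode every dependency needs a pinned hash as well.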
