CI/CD tool for dbt projects with intelligent change detection and selective execution
dbt-ci
A CI tool for dbt (data build tool) projects that runs only the models that have changed relative to a reference state. It supports several execution environments: local, Docker, bash, and the dbt Python API.
How It Works
dbt-ci uses a cache-based workflow:
- init: Downloads reference state from cloud storage (or uses local), compares with current code, and creates a cache of changes
- run / delete / ephemeral: Use the cached state automatically (no need to re-specify state paths)
This design ensures:
- ✅ Consistent state across all commands in a CI run
- ✅ Better performance (no redundant state downloads)
- ✅ Simpler CLI (specify state once in init, reuse everywhere)
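To illustrate the state comparison that init performs, here is a minimal Python sketch. It is not dbt-ci's implementation: it assumes that both states are dbt manifest.json files already loaded as dictionaries, and it compares only the per-node file checksum, whereas dbt's own state comparison also considers configuration and macro changes. The function names and the sample project name (shop) are hypothetical.

```python
from typing import Dict, Set


def node_checksums(manifest: dict) -> Dict[str, str]:
    """Map each node's unique_id to its file checksum ('' if missing)."""
    return {
        unique_id: node.get("checksum", {}).get("checksum", "")
        for unique_id, node in manifest.get("nodes", {}).items()
    }


def diff_manifests(reference: dict, current: dict) -> Dict[str, Set[str]]:
    """Classify nodes as new, deleted, or modified relative to the reference."""
    ref = node_checksums(reference)
    cur = node_checksums(current)
    shared = set(ref) & set(cur)
    return {
        "new": set(cur) - set(ref),
        "deleted": set(ref) - set(cur),
        "modified": {uid for uid in shared if ref[uid] != cur[uid]},
    }


# Tiny hand-written manifests standing in for real manifest.json files.
reference = {"nodes": {
    "model.shop.orders": {"checksum": {"name": "sha256", "checksum": "aaa"}},
    "model.shop.customers": {"checksum": {"name": "sha256", "checksum": "bbb"}},
}}
current = {"nodes": {
    "model.shop.orders": {"checksum": {"name": "sha256", "checksum": "ccc"}},
    "model.shop.payments": {"checksum": {"name": "sha256", "checksum": "ddd"}},
}}

changes = diff_manifests(reference, current)
```

Computing this result once and storing it is what lets later commands skip the download and comparison steps.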
Installation
From PyPI (Recommended)
pip install dbt-ci
From GitHub
# Install from main branch
pip install git+https://github.com/datablock-dev/dbt-ci.git@main
# Install a specific version
pip install git+https://github.com/datablock-dev/dbt-ci.git@v1.0.0
Local Development
git clone https://github.com/datablock-dev/dbt-ci.git
cd dbt-ci
pip install -e ".[dev]"
After installation, the tool is available as dbt-ci.
Quick Start
The Workflow: Initialize once with init, then run commands that use the cached state.
1. Initialize State
First, initialize the dbt-ci state. This downloads/reads reference state and creates a cache:
dbt-ci init \
--dbt-project-dir dbt \
--profiles-dir dbt \
--reference-target production \
--state dbt/.dbtstate
With Cloud Storage (GCS/S3):
dbt-ci init \
--dbt-project-dir dbt \
--state-uri gs://my-bucket/dbt-state/manifest.json \
--reference-target production \
--state dbt/.dbtstate
2. Run Modified Models
After initialization, run commands use the cached state automatically:
# No need to specify --state again!
dbt-ci run \
--dbt-project-dir dbt \
--profiles-dir dbt
With Docker:
dbt-ci run \
--runner docker \
--docker-image ghcr.io/dbt-labs/dbt-bigquery:latest
Commands
init - Initialize State
Creates initial state from your dbt project. Always run this first. Downloads reference manifest from cloud storage (if specified) and creates a local cache for subsequent commands.
dbt-ci init \
--dbt-project-dir dbt \
--profiles-dir dbt \
--state-uri gs://my-bucket/manifest.json \
--reference-target production \
--state dbt/.dbtstate
Options:
- --state, --reference-state: Local path where state will be downloaded/stored
- --state-uri: Remote URI for state manifest (e.g., gs://bucket/manifest.json, s3://bucket/manifest.json)
- --reference-target: Target to use for production/reference manifest (optional)
- --dbt-version: Specific dbt version to use (e.g., 1.10.13)
- --adapter, -a: Adapter to install (e.g., dbt-duckdb=1.10.0)
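The --adapter value uses a single = between name and version, which differs from pip's == pin syntax. The sketch below shows one way such a value could be turned into a pip requirement string. Whether dbt-ci performs this exact conversion is an assumption, and adapter_to_requirement is a hypothetical helper.

```python
def adapter_to_requirement(spec: str) -> str:
    """Convert 'dbt-adapter=version' into a pip requirement string.

    Assumption: the adapter is ultimately installed with pip, which pins
    versions with '=='. A spec without a version is returned unpinned.
    """
    name, sep, version = spec.partition("=")
    name = name.strip()
    # Tolerate an accidental 'name==version' by dropping the extra '='.
    version = version.strip().lstrip("=")
    if not name:
        raise ValueError(f"adapter name missing in {spec!r}")
    if sep and not version:
        raise ValueError(f"version missing after '=' in {spec!r}")
    return f"{name}=={version}" if version else name


requirement = adapter_to_requirement("dbt-duckdb=1.10.0")
```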
run - Run Modified Models
Detects and runs models that have changed. Uses cached state from init command.
# Run after init - uses cached state
dbt-ci run \
--dbt-project-dir dbt \
--mode models
Options:
- --mode, -m: What to run: all, models, seeds, snapshots, tests (default: all)
- --defer: Use dbt's defer flag for production state
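The effect of --mode can be pictured as a filter over the changed nodes by resource type. The following Python sketch is illustrative only: the mapping from mode names to dbt resource_type values, and the assumption that all covers exactly these four types, are guesses rather than documented behavior.

```python
from typing import Dict, List

# Assumed mapping from the CLI's plural mode names to dbt resource types.
MODE_TO_RESOURCE_TYPES = {
    "all": {"model", "seed", "snapshot", "test"},
    "models": {"model"},
    "seeds": {"seed"},
    "snapshots": {"snapshot"},
    "tests": {"test"},
}


def select_for_mode(changed_nodes: Dict[str, dict], mode: str = "all") -> List[str]:
    """Return the sorted unique_ids of changed nodes that match the mode."""
    if mode not in MODE_TO_RESOURCE_TYPES:
        raise ValueError(f"unknown mode: {mode!r}")
    wanted = MODE_TO_RESOURCE_TYPES[mode]
    return sorted(
        uid for uid, node in changed_nodes.items()
        if node.get("resource_type") in wanted
    )


# Hypothetical changed nodes, shaped like entries in a dbt manifest.
changed = {
    "model.shop.orders": {"resource_type": "model"},
    "seed.shop.countries": {"resource_type": "seed"},
    "test.shop.not_null_orders_id": {"resource_type": "test"},
}

only_models = select_for_mode(changed, "models")
everything = select_for_mode(changed)
```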
Examples:
# Run only modified models
dbt-ci run --mode models
# Run modified models with defer to production
dbt-ci run --mode models --defer
# Run all modified resources (models, tests, seeds, etc.)
dbt-ci run --mode all
# With Docker
dbt-ci run --runner docker --mode models
ephemeral - Ephemeral Environment
Creates ephemeral environments for testing without affecting production. Uses cached state from init.
# Run after init
dbt-ci ephemeral --dbt-project-dir dbt
Options:
--keep-env: Don't destroy ephemeral environment after run
delete - Delete Removed Models
Detects and deletes models that have been removed from the project. Uses cached state from init.
# Run after init
dbt-ci delete --dbt-project-dir dbt
Runners
dbt-ci supports multiple execution environments:
Local Runner
Execute dbt commands directly on your machine:
# After init
dbt-ci run \
--runner local \
--dbt-project-dir dbt
dbt Runner (Python API)
Uses dbt's Python API (fastest, default):
# After init - uses dbt Python API
dbt-ci run \
--runner dbt \
--dbt-project-dir dbt
Docker Runner
Run dbt commands inside a Docker container:
# After init - state comes from the cache, so --state is not repeated
dbt-ci run \
--runner docker \
--docker-image ghcr.io/dbt-labs/dbt-duckdb:latest \
--docker-volumes "$(pwd):/workspace" \
--dbt-project-dir /workspace/dbt
For Apple Silicon Macs:
dbt-ci run \
--runner docker \
--docker-platform linux/amd64 \
--docker-image ghcr.io/dbt-labs/dbt-postgres:latest \
--docker-volumes $(pwd):/workspace \
--dbt-project-dir /workspace/dbt
Docker Advanced Options
Platform (for Apple Silicon compatibility):
--docker-platform linux/amd64 # or linux/arm64
Custom Volumes:
--docker-volumes "/host/path:/container/path" --docker-volumes "/another:/path:ro"
Environment Variables:
--docker-env "DBT_ENV=prod" --docker-env "MY_API_KEY=secret"
Network Mode:
--docker-network bridge # or host, none, container:name
User:
--docker-user "1000:1000" # or leave empty for auto-detect
Additional Docker Args:
--docker-args "--memory=2g --cpus=2"
Complete Docker Example:
dbt-ci run \
--runner docker \
--docker-image ghcr.io/dbt-labs/dbt-postgres:1.7.0 \
--docker-platform linux/amd64 \
--docker-env "POSTGRES_HOST=host.docker.internal" \
--docker-network host \
--docker-volumes "$(pwd):/workspace" \
--docker-volumes "$HOME/.aws:/root/.aws:ro" \
--dbt-project-dir /workspace/dbt \
--profiles-dir /workspace/dbt \
--target prod
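To make the relationship between these options and a container invocation concrete, the sketch below assembles a docker run argument list in Python. How dbt-ci actually builds its command is not documented here, so treat the function name, the --rm flag, and the argument order as assumptions. The docker flags themselves (--platform, --volume, --env, --network, --user) are standard. The sketch also assumes the image's entrypoint is dbt, so the trailing arguments are dbt subcommand arguments.

```python
import shlex
from typing import List, Optional, Sequence


def build_docker_command(
    image: str,
    dbt_args: Sequence[str],
    platform: Optional[str] = None,
    volumes: Sequence[str] = (),
    env: Sequence[str] = (),
    network: Optional[str] = None,
    user: Optional[str] = None,
    extra_args: str = "",
) -> List[str]:
    """Build an argv list suitable for subprocess.run (no shell needed)."""
    cmd = ["docker", "run", "--rm"]
    if platform:
        cmd += ["--platform", platform]
    for volume in volumes:          # host:container[:mode]
        cmd += ["--volume", volume]
    for variable in env:            # KEY=VALUE
        cmd += ["--env", variable]
    if network:
        cmd += ["--network", network]
    if user:                        # UID:GID
        cmd += ["--user", user]
    cmd += shlex.split(extra_args)  # split the free-form string safely
    cmd.append(image)
    cmd += list(dbt_args)
    return cmd


command = build_docker_command(
    image="ghcr.io/dbt-labs/dbt-postgres:1.7.0",
    dbt_args=["run", "--target", "prod"],
    platform="linux/amd64",
    volumes=["/repo:/workspace"],
    env=["POSTGRES_HOST=host.docker.internal"],
    network="host",
    extra_args="--memory=2g --cpus=2",
)
```

Building a list instead of a single string avoids shell quoting problems when paths or values contain spaces.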
Global Options
These options apply to all commands:
| Option | Description | Default |
|---|---|---|
| --dbt-project-dir | Path to dbt project directory | . |
| --profiles-dir | Path to profiles.yml directory | Auto-detect |
| --reference-target | dbt target for production/reference manifest (init only) | None |
| --target, -t | dbt target to use | From profiles.yml |
| --vars, -v | YAML string or file path with dbt variables | "" |
| --defer | Use dbt's defer flag for production state | false |
| --runner, -r | Runner type: local, docker, bash, dbt | dbt |
| --entrypoint | Command entrypoint for dbt | dbt |
| --dbt-version | Specific dbt version to use | Current |
| --adapter, -a | Adapter to install (format: dbt-adapter=version) | None |
| --dry-run | Print commands without executing | false |
| --log-level | Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL | INFO |
| --slack-webhook | Slack webhook URL for notifications | None |
Init-Specific Options
These options are only available for the init command:
| Option | Description | Default |
|---|---|---|
| --state, --reference-state | Local path where reference state will be stored | None |
| --state-uri | Remote URI for state manifest (e.g., gs://bucket/manifest.json, s3://bucket/manifest.json) | None |
Docker Options
| Option | Description | Default |
|---|---|---|
| --docker-image | Docker image for dbt | ghcr.io/dbt-labs/dbt-core:latest |
| --docker-platform | Platform (linux/amd64, linux/arm64) | Auto-detect |
| --docker-volumes | Volume mounts (format: host:container[:mode]) | [] |
| --docker-env | Environment variables (format: KEY=VALUE) | [] |
| --docker-network | Docker network mode | host |
| --docker-user | User to run as (UID:GID) | Auto-detect |
| --docker-args | Additional docker run arguments | "" |
Bash Runner Options
| Option | Description | Default |
|---|---|---|
| --shell-path, --bash-path | Path to shell executable | /bin/bash |
Cloud Storage Support
dbt-ci supports storing and retrieving state files from cloud storage (GCS, S3), making it ideal for distributed CI/CD workflows.
GCS/S3 State Storage
Store your dbt reference state in cloud storage for shared access across CI runs:
# Initialize and download state from GCS
dbt-ci init \
--dbt-project-dir dbt \
--state-uri gs://my-bucket/dbt-state/manifest.json \
--reference-target production \
--state dbt/.dbtstate
# Run using cached state (no need to specify URI again)
dbt-ci run --dbt-project-dir dbt --mode models
Benefits:
- 🔄 Shared State: Download the same reference state across different CI jobs
- 💾 Cache-Based: After init, commands use local cache (no repeated downloads)
- 📦 No Git Commits: State files don't need to be committed to version control
- 🚀 Scalable: Works seamlessly in containerized and distributed environments
- 🔐 Secure: Leverage cloud IAM and bucket policies for access control
Configuration:
The tool uses cloud credentials from your environment. Ensure your bucket is accessible:
# For GCS
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# For AWS S3
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_DEFAULT_REGION=us-east-1
# Or rely on role-based credentials already present in the environment (recommended in CI/CD), with no exported keys
dbt-ci init --state-uri gs://my-bucket/manifest.json
Supported URI Formats:
- gs://bucket-name/path/to/manifest.json (Google Cloud Storage)
- s3://bucket-name/path/to/manifest.json (AWS S3)
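Both URI formats share the same shape: a scheme that names the provider, a bucket, and an object path. A minimal Python sketch of splitting such a URI, using only the standard library, might look like this. The StateLocation type and parse_state_uri function are hypothetical names, not part of dbt-ci.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class StateLocation(NamedTuple):
    provider: str  # "gcs" or "s3"
    bucket: str
    key: str       # object path inside the bucket, without a leading slash


_SCHEME_TO_PROVIDER = {"gs": "gcs", "s3": "s3"}


def parse_state_uri(uri: str) -> StateLocation:
    """Split a gs:// or s3:// URI into provider, bucket, and object key."""
    parsed = urlparse(uri)
    provider = _SCHEME_TO_PROVIDER.get(parsed.scheme)
    if provider is None:
        raise ValueError(f"unsupported scheme in {uri!r}; expected gs:// or s3://")
    key = parsed.path.lstrip("/")
    if not parsed.netloc or not key:
        raise ValueError(f"expected scheme://bucket/path, got {uri!r}")
    return StateLocation(provider, parsed.netloc, key)


location = parse_state_uri("gs://my-bucket/dbt-state/manifest.json")
```

Validating the URI up front gives a clear error before any network call is attempted.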
Environment Variables
All CLI options can also be set via environment variables:
export DBT_PROJECT_DIR=./dbt
export DBT_PROFILES_DIR=./dbt
export DBT_TARGET=production
export DBT_RUNNER=local
# After running init, just use:
dbt-ci run
Common Environment Variables:
- DBT_PROJECT_DIR: Path to dbt project
- DBT_PROFILES_DIR: Path to profiles.yml location
- DBT_TARGET: Target environment to use
- DBT_RUNNER: Runner type (local, docker, bash, dbt)
Note: State management is cache-based. Run init once, then subsequent commands automatically use the cached state.
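A common way to combine command-line options with environment variables is to let an explicit flag win, then fall back to the environment, then to a default. The sketch below shows that pattern in Python. The precedence order is an assumption, since it is not stated here. The variable names and the two defaults are taken from the tables in this README, and resolve_option is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Environment variable names listed in this README.
ENV_NAMES = {
    "dbt_project_dir": "DBT_PROJECT_DIR",
    "profiles_dir": "DBT_PROFILES_DIR",
    "target": "DBT_TARGET",
    "runner": "DBT_RUNNER",
}

# Defaults taken from the Global Options table; other options have none here.
DEFAULTS = {"dbt_project_dir": ".", "runner": "dbt"}


def resolve_option(
    name: str,
    cli_value: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Assumed precedence: CLI flag, then environment variable, then default."""
    if cli_value is not None:
        return cli_value
    env_value = environ.get(ENV_NAMES[name])
    if env_value:
        return env_value
    return DEFAULTS.get(name)


# A fake environment keeps the example independent of the real shell.
fake_env = {"DBT_PROJECT_DIR": "./dbt", "DBT_RUNNER": "local"}

project_dir = resolve_option("dbt_project_dir", environ=fake_env)
runner = resolve_option("runner", cli_value="docker", environ=fake_env)
```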
CI/CD Integration
GitHub Actions Example
name: dbt CI
on: [pull_request]
# Assuming a role without stored keys uses OIDC, which needs an ID token
permissions:
  id-token: write
  contents: read
jobs:
  dbt-ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
          aws-region: us-east-1
      - name: Install dbt-ci
        run: pip install git+https://github.com/datablock-dev/dbt-ci.git@main
      - name: Initialize dbt-ci with cloud state
        # The state URI uses S3 to match the AWS credentials configured above
        run: |
          dbt-ci init \
            --dbt-project-dir dbt \
            --state-uri s3://my-dbt-state/prod/manifest.json \
            --reference-target production \
            --state dbt/.dbtstate
      - name: Run modified models
        run: |
          dbt-ci run --mode models
GitLab CI Example
dbt-ci:
  image: python:3.11
  script:
    - pip install git+https://github.com/datablock-dev/dbt-ci.git@main
    - dbt-ci init --dbt-project-dir dbt --state-uri gs://my-dbt-state/prod/manifest.json --reference-target production --state dbt/.dbtstate
    - dbt-ci run --mode models
  only:
    - merge_requests
Features
- 🎯 Smart Detection: Automatically identifies modified, new, and deleted models
- 📊 Dependency Tracking: Generates and traverses dependency graphs for lineage analysis
- 🔄 State Comparison: Compares current state against production for precise CI
- ☁️ Cloud Storage: GCS and S3 integration for shared state across distributed CI/CD workflows
- 🚀 Multiple Runners: Supports local, Docker, bash, and dbt Python API execution
- 🐳 Docker-First: Extensive Docker configuration for containerized workflows
- ⚡ Selective Execution: Run only what changed, saving time and resources
- 🔌 Adapter Support: Install specific dbt versions and adapters on-demand
- 💬 Notifications: Slack webhook integration for CI/CD alerts
- ♻️ Ephemeral Environments: Test changes in isolated environments
- 🧹 Cleanup: Automatically remove deleted models from target warehouse
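As an illustration of the dependency tracking mentioned above, the sketch below walks a dependency graph to find everything downstream of a set of modified models. It assumes a mapping shaped like the child_map section of a dbt manifest.json (node id to list of direct children). The downstream function and the sample node ids are hypothetical, and this is not dbt-ci's own traversal code.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def downstream(child_map: Dict[str, List[str]], start: Iterable[str]) -> Set[str]:
    """Breadth-first search for all nodes reachable from the start nodes."""
    start_set = set(start)
    seen = set(start_set)
    queue = deque(start_set)
    while queue:
        node = queue.popleft()
        for child in child_map.get(node, []):
            if child not in seen:   # also guards against revisiting shared children
                seen.add(child)
                queue.append(child)
    return seen - start_set         # exclude the modified nodes themselves


# Hypothetical lineage: orders feeds revenue, which feeds a dashboard model.
child_map = {
    "model.shop.orders": ["model.shop.revenue", "test.shop.not_null_orders_id"],
    "model.shop.revenue": ["model.shop.dashboard"],
    "model.shop.customers": ["model.shop.dashboard"],
    "model.shop.dashboard": [],
}

affected = downstream(child_map, ["model.shop.orders"])
```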
Use Cases
Pull Request CI
Only build and test models affected by PR changes:
# Initialize with reference state
dbt-ci init --state-uri gs://bucket/manifest.json --reference-target production --state dbt/.dbtstate
# Run modified models with defer
dbt-ci run --mode models --defer
Distributed CI with Cloud Storage
Share state across multiple CI jobs:
# Job 1: Initialize state (downloads from cloud)
dbt-ci init --state-uri gs://my-bucket/manifest.json --reference-target production --state dbt/.dbtstate
# Job 2: Run models (uses cached state)
dbt-ci run --mode models
# Job 3: Run tests (uses cached state)
dbt-ci run --mode tests
Selective Testing
Run tests only for modified models:
# After init
dbt-ci run --mode tests
Schema Migrations
Clean up deleted models from production:
# After init
dbt-ci delete --target production
Multi-Environment Testing
Create ephemeral test environments:
dbt-ci ephemeral --keep-env
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Development Setup
- Clone the repository
- Install dependencies: pip install -e ".[dev]"
- Run tests: pytest tests/
- Run linting: black src/ tests/
Commit Message Format
This project uses Conventional Commits for automated releases:
- feat: New feature (minor version bump)
- fix: Bug fix (patch version bump)
- docs: Documentation changes
- refactor: Code refactoring
- test: Adding tests
- chore: Maintenance tasks
Example:
git commit -m "feat: add Docker runner support"
git commit -m "fix: resolve path resolution on Windows"
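The mapping from commit type to version bump described above can be expressed as a small lookup. The Python sketch below is illustrative and independent of whatever release tooling the project uses. It covers only the two bumps listed here (feat and fix), and it deliberately leaves breaking-change markers out of scope because they are not described in this section.

```python
import re
from typing import Optional

# type, optional (scope), then ": " and a description
_COMMIT_PATTERN = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?: .+")

# Only the bumps stated in this README; other types trigger no release.
BUMPS = {"feat": "minor", "fix": "patch"}


def version_bump(message: str) -> Optional[str]:
    """Return 'minor', 'patch', or None based on the commit's first line."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    match = _COMMIT_PATTERN.match(first_line)
    if match is None:
        return None
    return BUMPS.get(match.group("type"))


feature_bump = version_bump("feat: add Docker runner support")
fix_bump = version_bump("fix: resolve path resolution on Windows")
```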
See RELEASING.md for details on the automated release process.
License
See LICENSE file for details.
Links
- PyPI: https://pypi.org/project/dbt-ci/
- Documentation: https://datablock.dev
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Changelog: CHANGELOG.md