
Fast.BI DBT Runner

PyPI version · Python 3.9+ · License: MIT · GitHub Actions

A comprehensive Python library for managing DBT (Data Build Tool) DAGs within the Fast.BI data development platform. This package provides multiple execution operators optimized for different cost-performance trade-offs, ranging from low-cost, slower execution to higher-cost, faster execution.

🚀 Overview

Fast.BI DBT Runner is part of the Fast.BI Data Development Platform, designed to provide flexible and scalable DBT workload execution across various infrastructure options. The package offers four distinct operator types, each optimized for specific use cases and requirements.

🎯 Key Features

  • Multiple Execution Operators: Choose from K8S, Bash, API, or GKE operators
  • Cost-Performance Optimization: Scale from low-cost to high-performance execution
  • Airflow Integration: Seamless integration with Apache Airflow workflows
  • Manifest Parsing: Intelligent DBT manifest parsing for dynamic DAG generation
  • Airbyte Integration: Built-in support for Airbyte task group building
  • Flexible Configuration: Extensive configuration options for various deployment scenarios

📦 Installation

Basic Installation (Core Package)

pip install fast-bi-dbt-runner

With Airflow Integration

pip install fast-bi-dbt-runner[airflow]

With Development Tools

pip install fast-bi-dbt-runner[dev]

With Documentation Tools

pip install fast-bi-dbt-runner[docs]

Complete Installation

pip install fast-bi-dbt-runner[airflow,dev,docs]

🏗️ Architecture

Operator Types

The package provides four different operators for running DBT transformation pipelines:

1. K8S (Kubernetes) Operator - Default Choice

  • Best for: Cost optimization, daily/nightly jobs, high concurrency
  • Characteristics: Creates dedicated Kubernetes pods per task
  • Trade-offs: Most cost-effective but slower execution speed
  • Use cases: Daily ETL pipelines, projects with less frequent runs

2. Bash Operator

  • Best for: Balanced cost-speed ratio, medium-sized projects
  • Characteristics: Runs within Airflow worker resources
  • Trade-offs: Faster than K8S but limited by worker capacity
  • Use cases: Medium-sized projects, workflows requiring faster execution

3. API Operator

  • Best for: High performance, time-sensitive workflows
  • Characteristics: Dedicated machine per project, always-on resources
  • Trade-offs: Fastest execution but highest cost
  • Use cases: Large-scale projects, real-time analytics, high-frequency execution

4. GKE Operator

  • Best for: Complete isolation, external client workloads
  • Characteristics: Creates dedicated GKE clusters
  • Trade-offs: Full isolation but higher operational complexity
  • Use cases: External client workloads, isolated environment requirements
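
The trade-offs above can be condensed into a simple rule of thumb. The helper below is hypothetical (it is not part of fast-bi-dbt-runner); it just encodes the selection logic described in the four operator profiles:

```python
def choose_operator(needs_isolation: bool = False,
                    time_sensitive: bool = False,
                    medium_workload: bool = False) -> str:
    """Return an OPERATOR value matching the four operator types."""
    if needs_isolation:
        return "gke"   # dedicated GKE cluster: full isolation, more ops overhead
    if time_sensitive:
        return "api"   # always-on machine: fastest execution, highest cost
    if medium_workload:
        return "bash"  # Airflow worker resources: balanced cost/speed
    return "k8s"       # default: pod per task, most cost-effective
```

For example, a nightly ETL project with no special requirements falls through to the `k8s` default, while an external client workload would select `gke`.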

🚀 Quick Start

Basic Usage

from fast_bi_dbt_runner import DbtManifestParserK8sOperator

# Create a K8S operator instance
operator = DbtManifestParserK8sOperator(
    task_id='run_dbt_models',
    project_id='my-gcp-project',
    dbt_project_name='my_analytics',
    operator='k8s'
)

# Execute DBT models (context is supplied by Airflow at runtime)
operator.execute(context)

Configuration Example

# K8S Operator Configuration
k8s_config = {
    'PLATFORM': 'Airflow',
    'OPERATOR': 'k8s',
    'PROJECT_ID': 'my-gcp-project',
    'DBT_PROJECT_NAME': 'my_analytics',
    'DAG_SCHEDULE_INTERVAL': '@daily',
    'DATA_QUALITY': 'True',
    'DBT_SOURCE': 'True'
}

# API Operator Configuration
api_config = {
    'PLATFORM': 'Airflow',
    'OPERATOR': 'api',
    'PROJECT_ID': 'my-gcp-project',
    'DBT_PROJECT_NAME': 'realtime_analytics',
    'DAG_SCHEDULE_INTERVAL': '*/15 * * * *',
    'MODEL_DEBUG_LOG': 'True'
}

📚 Documentation

For detailed documentation, visit our Fast.BI Platform Documentation.

🔧 Configuration

Core Variables

Variable                Description                        Default
PLATFORM                Data orchestration platform        Airflow
OPERATOR                Execution operator type            k8s
PROJECT_ID              Google Cloud project identifier    Required
DBT_PROJECT_NAME        DBT project identifier             Required
DAG_SCHEDULE_INTERVAL   Pipeline execution schedule        @once

Feature Flags

Variable       Description                      Default
DBT_SEED       Enable seed data loading         False
DBT_SOURCE     Enable source loading            False
DBT_SNAPSHOT   Enable snapshot creation         False
DATA_QUALITY   Enable quality service           False
DEBUG          Enable connection verification   False
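
Putting the two tables together, a configuration can be assembled by overlaying user values on the documented defaults. The `build_config` helper below is our own sketch (not a package API); the variable names and defaults come from the tables above:

```python
# Defaults taken from the Core Variables and Feature Flags tables above.
DEFAULTS = {
    "PLATFORM": "Airflow",
    "OPERATOR": "k8s",
    "DAG_SCHEDULE_INTERVAL": "@once",
    "DBT_SEED": "False",
    "DBT_SOURCE": "False",
    "DBT_SNAPSHOT": "False",
    "DATA_QUALITY": "False",
    "DEBUG": "False",
}

def build_config(**overrides: str) -> dict:
    """Overlay user overrides on the defaults; enforce required variables."""
    missing = {"PROJECT_ID", "DBT_PROJECT_NAME"} - overrides.keys()
    if missing:
        raise ValueError(f"Required variables missing: {sorted(missing)}")
    return {**DEFAULTS, **overrides}

config = build_config(
    PROJECT_ID="my-gcp-project",
    DBT_PROJECT_NAME="my_analytics",
    DATA_QUALITY="True",
)
```

Note that `PROJECT_ID` and `DBT_PROJECT_NAME` have no defaults and must always be supplied.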

🎯 Use Cases

Daily ETL Pipeline

# Low-cost, reliable daily processing
config = {
    'OPERATOR': 'k8s',
    'DAG_SCHEDULE_INTERVAL': '@daily',
    'DBT_SOURCE': 'True',
    'DATA_QUALITY': 'True'
}

Real-time Analytics

# High-performance, frequent execution
config = {
    'OPERATOR': 'api',
    'DAG_SCHEDULE_INTERVAL': '*/15 * * * *',
    'MODEL_DEBUG_LOG': 'True'
}

External Client Workload

# Isolated, dedicated resources
config = {
    'OPERATOR': 'gke',
    'CLUSTER_NAME': 'client-isolated-cluster',
    'DATA_QUALITY': 'True'
}

🔍 Monitoring and Debugging

Enable Debug Logging

config = {
    'DEBUG': 'True',
    'MODEL_DEBUG_LOG': 'True'
}

Data Quality Integration

config = {
    'DATA_QUALITY': 'True',
    'DATAHUB_ENABLED': 'True'
}

🚀 CI/CD and Automation

This package uses GitHub Actions for continuous integration and deployment:

  • Automated Testing: Tests across Python 3.9-3.12
  • Code Quality: Linting, formatting, and type checking
  • Automated Publishing: Automatic PyPI releases on version tags
  • Documentation: Automated documentation building and deployment

Release Process

  1. Create a version tag: git tag v1.0.0
  2. Push the tag: git push origin v1.0.0
  3. GitHub Actions automatically:
    • Tests the package
    • Builds and validates
    • Publishes to PyPI
    • Creates a GitHub release

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

# Clone the repository
git clone https://github.com/fast-bi/dbt-workflow-core-runner.git
cd dbt-workflow-core-runner

# Install in development mode with all tools
pip install -e .[dev,airflow]

# Run tests
pytest

# Check code quality
flake8 fast_bi_dbt_runner/
black --check fast_bi_dbt_runner/
mypy fast_bi_dbt_runner/

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

🔗 Related Projects


Fast.BI DBT Runner - Empowering data teams with flexible, scalable DBT execution across the Fast.BI platform.
