dbt-cloud-run-runner

A Python client library for running dbt projects on Google Cloud Run.

Installation

pip install dbt-cloud-run-runner

Usage

from dbt_cloud_run_runner import Client

# Service account for GCS and Cloud Run operations
# This service account needs:
# - Storage Admin on the GCS bucket
# - Cloud Run Admin in the project
gcp_service_account_key = {"type": "service_account", ...}

# Initialize the client
client = Client(
    gcp_project="your-gcp-project",
    gcs_bucket="your-gcs-bucket",
    service_account_key=gcp_service_account_key,  # Required: for GCS and Cloud Run
    region="us-central1",  # optional, defaults to us-central1
)

# Prepare a dbt project for BigQuery
# This method performs several side effects:
# 1. Generates a profiles.yml file for BigQuery
# 2. Zips your local dbt project (excluding target/ directory)
# 3. Uploads profiles.yml, the dbt project zip, and service account credentials to GCS
# 4. Generates pre-signed URLs (valid for 2 hours by default) for the Cloud Run job
#
# Returns: DbtCloudRunSetup object containing:
#   - GCS blob paths (gs://bucket/path format) for all uploaded files
#   - Pre-signed URLs for downloading inputs and uploading outputs
#   - The Docker image to use
#
# Note: The service_account_key here is different from the one passed to Client.
# This one is used for BigQuery access inside the dbt container.
setup = client.prepare_bigquery(
    service_account_key={"type": "service_account", ...},  # For BigQuery access
    target_project="your-bigquery-project",
    target_dataset="your_dataset",
    path_to_local_dbt_project="./path/to/dbt/project",
    image="us-docker.pkg.dev/delphiio-prod/public-images/dbt-runner:v0.1.1",
    url_expiration_hours=2,  # Optional: override default 2-hour URL expiration
    dbt_compile_flags=["--select", "staging"],
    dbt_docs_flags=["--empty-catalog"],
)

# Run the dbt project on Cloud Run
execution_id = client.run(setup)
print(f"Execution started: {execution_id}")

# Wait for completion
status = client.wait_for_completion(execution_id)
print(f"Execution finished with state: {status.state.value}")

# Or poll status manually
status = client.get_status(execution_id)
print(f"Current state: {status.state.value}")

prepare_bigquery() Method

The prepare_bigquery() method stages your dbt project for execution on Cloud Run, performing the following side effects:

Side Effects

  1. Generates profiles.yml: Creates a BigQuery profile configuration file that will be used by dbt inside the container.

  2. Zips the dbt project: Packages your local dbt project directory into a zip file, automatically excluding the target/ directory (which contains compiled artifacts).

  3. Uploads to GCS: Uploads three files to Google Cloud Storage:

    • profiles.yml - The dbt profile configuration
    • dbt_project.zip - Your zipped dbt project
    • credentials.json - Your service account key (for BigQuery authentication)

  4. Generates pre-signed URLs: Creates time-limited signed URLs (default: 2 hours) that allow the Cloud Run container to:

    • Download the dbt project and profiles.yml
    • Upload the compiled output (target/ directory) and logs
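
The packaging step (2) can be modeled with the standard library. This is an illustrative sketch only — zip_dbt_project is a hypothetical name, not part of this package — but it shows the target/ exclusion described above:

```python
import zipfile
from pathlib import Path

def zip_dbt_project(project_dir: str, zip_path: str) -> list[str]:
    """Zip a dbt project directory, skipping the target/ directory.

    Hypothetical helper mirroring step 2 above; the library's actual
    packaging logic may differ in details.
    """
    project = Path(project_dir)
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(project.rglob("*")):
            rel = path.relative_to(project)
            # Skip compiled artifacts in target/
            if rel.parts and rel.parts[0] == "target":
                continue
            if path.is_file():
                zf.write(path, rel.as_posix())
                archived.append(rel.as_posix())
    return archived
```

Excluding target/ keeps the upload small and avoids shipping stale compiled artifacts to the container, which regenerates them on each run.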

Return Value

Returns a DbtCloudRunSetup object containing:

  • Blob paths (in gs://bucket/path format):

    • profiles_yml_blob - Location of the uploaded profiles.yml
    • dbt_project_blob - Location of the uploaded dbt project zip
    • credentials_blob - Location of the uploaded service account key
    • output_blob - Where the compiled dbt output will be stored
    • logs_blob - Where the execution logs will be stored

  • Pre-signed URLs:

    • profiles_yml_url - URL to download profiles.yml
    • dbt_project_url - URL to download the dbt project zip
    • credentials_url - URL to download the service account key
    • output_url - URL to upload the compiled output (PUT request)
    • logs_url - URL to upload execution logs (PUT request)

  • Image: The Docker image identifier to use for the Cloud Run job
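
The attributes above suggest a shape like the following dataclass. This is an illustrative mirror of the documented fields, not the library's actual class definition, which may carry additional state:

```python
from dataclasses import dataclass

@dataclass
class DbtCloudRunSetup:
    # Illustrative mirror of the documented return object; field names
    # follow the bullet list above.
    profiles_yml_blob: str   # gs://bucket/path locations
    dbt_project_blob: str
    credentials_blob: str
    output_blob: str
    logs_blob: str
    profiles_yml_url: str    # pre-signed HTTPS URLs
    dbt_project_url: str
    credentials_url: str
    output_url: str
    logs_url: str
    image: str               # Docker image for the Cloud Run job
```

The blob paths are durable GCS locations you can read back later; the URLs are what the container itself uses, and they expire.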

Important Notes

  • URL Expiration: Pre-signed URLs expire after 2 hours by default (configurable via url_expiration_hours parameter). Make sure to call client.run(setup) before the URLs expire.

  • GCS Storage: Files are uploaded to gs://{bucket}/dbt-runs/{run_id}/ where run_id is a unique identifier generated for each call to prepare_bigquery().

  • Idempotency: Each call to prepare_bigquery() creates a new run with a unique ID, so you can safely call it multiple times without conflicts.

  • dbt command flags:

    • Use dbt_compile_flags to pass additional arguments to dbt compile.
    • Use dbt_docs_flags to pass additional arguments to dbt docs generate.
    • Both parameters accept list[str], where each item is passed as one argument.
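
The per-run GCS layout and the idempotency guarantee can be sketched as follows. run_prefix and the use of uuid4 are illustrative assumptions — the library generates its own run_id internally — but the path shape matches the note above:

```python
import uuid

def run_prefix(bucket: str) -> str:
    """Build a per-run GCS prefix of the form described above.

    Hypothetical helper: uuid4 is one plausible way to get a unique
    identifier per call; the library's actual run_id scheme may differ.
    """
    run_id = uuid.uuid4().hex
    return f"gs://{bucket}/dbt-runs/{run_id}/"
```

Because every call mints a fresh run_id, two prepare_bigquery() calls never write to the same prefix, which is what makes repeated calls conflict-free.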

Features

  • Automatic GCS setup: Uploads your dbt project and credentials to GCS with signed URLs
  • Cloud Run job management: Creates and manages Cloud Run jobs automatically
  • BigQuery integration: Generates profiles.yml for BigQuery targets
  • Status monitoring: Track execution status with polling or wait for completion
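
A manual polling loop — the alternative to wait_for_completion() — might look like the sketch below. FakeClient and the State enum are stand-ins so the example is self-contained and runnable; check status.state on the real client for the actual state values:

```python
import time
from enum import Enum

class State(Enum):
    # Assumed state names for illustration only.
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"

class FakeStatus:
    def __init__(self, state):
        self.state = state

class FakeClient:
    """Stand-in for Client: reports RUNNING twice, then SUCCEEDED."""
    def __init__(self):
        self._polls = 0

    def get_status(self, execution_id):
        self._polls += 1
        state = State.RUNNING if self._polls < 3 else State.SUCCEEDED
        return FakeStatus(state)

def poll_until_done(client, execution_id, interval_s=0.01):
    # Poll get_status() until a terminal state is reached; in practice
    # client.wait_for_completion(execution_id) does this for you.
    while True:
        status = client.get_status(execution_id)
        if status.state in (State.SUCCEEDED, State.FAILED):
            return status
        time.sleep(interval_s)
```

In real code you would also cap the number of iterations or total wait time, since a stuck job would otherwise loop forever.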

Requirements

  • Python 3.9+
  • Google Cloud project with Cloud Run and GCS enabled
  • Two service accounts (can be the same, but often different):
    1. GCS/Cloud Run service account (passed to Client()):
      • Cloud Run Admin (roles/run.admin) in the project
      • Storage Admin (roles/storage.admin) on the GCS bucket
      • Must have a private key (for signing URLs)
    2. BigQuery service account (passed to prepare_bigquery()):
      • BigQuery access for the target project/dataset
      • This is the account that dbt will use to query BigQuery
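
Loading the two keys from files might look like this; load_service_account_key and the file names are hypothetical, since the library simply accepts the parsed dicts:

```python
import json
from pathlib import Path

def load_service_account_key(path: str) -> dict:
    """Load a service account key file and sanity-check its type.

    Illustrative helper (not part of this package) for preparing the
    two keys described above.
    """
    key = json.loads(Path(path).read_text())
    if key.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key")
    return key

# Hypothetical file names: the first key goes to Client(), the second
# to prepare_bigquery().
# infra_key = load_service_account_key("infra-sa.json")
# bigquery_key = load_service_account_key("bigquery-sa.json")
```

Keeping the infrastructure key and the BigQuery key separate limits the blast radius: the container only ever sees the BigQuery credentials.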
