
Project description

dbt-cloud-run-runner

A Python client library for running dbt projects on Google Cloud Run.

Installation

pip install dbt-cloud-run-runner

Usage

from dbt_cloud_run_runner import Client

# Service account for GCS and Cloud Run operations
# This service account needs:
# - Storage Admin on the GCS bucket
# - Cloud Run Admin in the project
gcp_service_account_key = {"type": "service_account", ...}

# Initialize the client
client = Client(
    gcp_project="your-gcp-project",
    gcs_bucket="your-gcs-bucket",
    service_account_key=gcp_service_account_key,  # Required: for GCS and Cloud Run
    region="us-central1",  # Optional: defaults to us-central1
)

# Prepare a dbt project for BigQuery
# This method has several side effects:
# 1. Generates a profiles.yml file for BigQuery
# 2. Zips your local dbt project (excluding target/ directory)
# 3. Uploads profiles.yml, the dbt project zip, and service account credentials to GCS
# 4. Generates pre-signed URLs (valid for 2 hours by default) for the Cloud Run job
#
# Returns: DbtCloudRunSetup object containing:
#   - GCS blob paths (gs://bucket/path format) for all uploaded files
#   - Pre-signed URLs for downloading inputs and uploading outputs
#   - The Docker image to use
#
# Note: The service_account_key here is different from the one passed to Client.
# This one is used for BigQuery access inside the dbt container.
setup = client.prepare_bigquery(
    service_account_key={"type": "service_account", ...},  # For BigQuery access
    target_project="your-bigquery-project",
    target_dataset="your_dataset",
    path_to_local_dbt_project="./path/to/dbt/project",
    image="us-docker.pkg.dev/delphiio-prod/public-images/dbt-runner:v0.1.1",
    url_expiration_hours=2,  # Optional: override default 2-hour URL expiration
)

# Run the dbt project on Cloud Run
execution_id = client.run(setup)
print(f"Execution started: {execution_id}")

# Wait for completion
status = client.wait_for_completion(execution_id)
print(f"Execution finished with state: {status.state.value}")

# Or poll status manually
status = client.get_status(execution_id)
print(f"Current state: {status.state.value}")

prepare_bigquery() Method

The prepare_bigquery() method prepares your dbt project for execution on Cloud Run. It has the following side effects:

Side Effects

  1. Generates profiles.yml: Creates the BigQuery profile configuration that dbt reads inside the container (a sketch of the profile's shape follows this list).

  2. Zips the dbt project: Packages your local dbt project directory into a zip file, automatically excluding the target/ directory of compiled artifacts (see the zip sketch after this list).

  3. Uploads to GCS: Uploads three files to Google Cloud Storage:

    • profiles.yml - The dbt profile configuration
    • dbt_project.zip - Your zipped dbt project
    • credentials.json - Your service account key (for BigQuery authentication)

  4. Generates pre-signed URLs: Creates time-limited signed URLs (default: 2 hours) that allow the Cloud Run container to:

    • Download the dbt project and profiles.yml
    • Upload the compiled output (target/ directory) and logs
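
For reference, dbt's standard BigQuery service-account profile has roughly the shape sketched below (rendered here with PyYAML). The profile name, target name, keyfile path, and thread count are assumptions; the library's generated profiles.yml may differ in those details.

import yaml  # PyYAML, used here only to render the sketch

# A guess at the generated profile's shape, based on dbt's documented
# BigQuery service-account profile format. Profile name, target name,
# keyfile path, and thread count are assumptions, not the library's output.
profile = {
    "default": {
        "target": "prod",
        "outputs": {
            "prod": {
                "type": "bigquery",
                "method": "service-account",
                "project": "your-bigquery-project",
                "dataset": "your_dataset",
                "keyfile": "/workspace/credentials.json",  # assumed path
                "threads": 4,
            }
        },
    }
}
print(yaml.safe_dump(profile, sort_keys=False))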
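
The exclusion behavior in step 2 can be pictured with a few lines of standard-library Python. This is illustrative only, not the library's actual packaging code:

import zipfile
from pathlib import Path

# Illustrative only - not the library's packaging code. Zips a dbt project
# while skipping anything under a target/ directory.
def zip_dbt_project(project_dir: str, zip_path: str) -> None:
    root = Path(project_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in root.rglob("*"):
            relative = path.relative_to(root)
            if path.is_file() and "target" not in relative.parts:
                zf.write(path, relative)

zip_dbt_project("./path/to/dbt/project", "dbt_project.zip")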

Return Value

Returns a DbtCloudRunSetup object containing:

  • Blob paths (in gs://bucket/path format; see the download sketch after this list):

    • profiles_yml_blob - Location of the uploaded profiles.yml
    • dbt_project_blob - Location of the uploaded dbt project zip
    • credentials_blob - Location of the uploaded service account key
    • output_blob - Where the compiled dbt output will be stored
    • logs_blob - Where the execution logs will be stored

  • Pre-signed URLs:

    • profiles_yml_url - URL to download profiles.yml
    • dbt_project_url - URL to download the dbt project zip
    • credentials_url - URL to download the service account key
    • output_url - URL to upload the compiled output (PUT request)
    • logs_url - URL to upload execution logs (PUT request)

  • Image: The Docker image identifier to use for the Cloud Run job
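
Once a run completes, the blob paths above tell you where the results live. A minimal sketch of downloading the logs with google-cloud-storage, assuming the documented fields are plain string attributes and your environment has default GCP credentials:

from google.cloud import storage

# Fetch the run's logs from the documented logs_blob path. The gs://
# parsing here is ours; logs_blob is assumed to be a plain string.
def download_gs_blob(gs_path: str, destination: str) -> None:
    bucket_name, _, blob_name = gs_path.removeprefix("gs://").partition("/")
    storage_client = storage.Client()
    storage_client.bucket(bucket_name).blob(blob_name).download_to_filename(destination)

download_gs_blob(setup.logs_blob, "dbt_logs.txt")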

Important Notes

  • URL Expiration: Pre-signed URLs expire after 2 hours by default (configurable via url_expiration_hours parameter). Make sure to call client.run(setup) before the URLs expire.

  • GCS Storage: Files are uploaded to gs://{bucket}/dbt-runs/{run_id}/, where run_id is a unique identifier generated for each call to prepare_bigquery() (see the listing sketch after these notes).

  • Unique runs: Each call to prepare_bigquery() creates a new run with a unique ID, so you can safely call it multiple times without one run conflicting with or overwriting another.
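
Because every run lands under the dbt-runs/ prefix, past runs are easy to inspect or clean up with the google-cloud-storage client. A minimal sketch, assuming default credentials:

from google.cloud import storage

# List every object uploaded by past prepare_bigquery() calls. The
# "dbt-runs/" prefix matches the layout documented above.
storage_client = storage.Client(project="your-gcp-project")
for blob in storage_client.list_blobs("your-gcs-bucket", prefix="dbt-runs/"):
    print(f"{blob.name}  ({blob.size} bytes)")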

Features

  • Automatic GCS setup: Uploads your dbt project and credentials to GCS with signed URLs
  • Cloud Run job management: Creates and manages Cloud Run jobs automatically
  • BigQuery integration: Generates profiles.yml for BigQuery targets
  • Status monitoring: Track execution status with polling or wait for completion

Requirements

  • Python 3.9+
  • Google Cloud project with Cloud Run and GCS enabled
  • Two service accounts (they can be the same account, but are typically kept separate):
    1. GCS/Cloud Run service account (passed to Client()):
      • Cloud Run Admin (roles/run.admin) in the project
      • Storage Admin (roles/storage.admin) on the GCS bucket
      • Must have a private key, which is used locally to sign URLs (see the sanity check after this list)
    2. BigQuery service account (passed to prepare_bigquery()):
      • BigQuery access for the target project/dataset
      • This is the account that dbt will use to query BigQuery
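
One common pitfall: the key passed to Client() must be a JSON service account key that includes a private key, because the signed URLs are generated locally by signing with it. A quick sanity check (ours, not the library's):

import json

# Verify the key passed to Client() can sign URLs. The filename is
# hypothetical; adapt it to wherever your key lives.
with open("runner-key.json") as f:
    key = json.load(f)

assert key.get("type") == "service_account", "expected a service account key"
assert "private_key" in key, "signed URLs require a key with a private key"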

Download files

Download the file for your platform.

Source Distribution

dbt_cloud_run_runner-0.6.0.tar.gz (14.8 kB, Source)

Built Distribution

dbt_cloud_run_runner-0.6.0-py3-none-any.whl (15.1 kB, Python 3)

File details

Details for the file dbt_cloud_run_runner-0.6.0.tar.gz.

File metadata

  • Download URL: dbt_cloud_run_runner-0.6.0.tar.gz
  • Upload date:
  • Size: 14.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for dbt_cloud_run_runner-0.6.0.tar.gz
Algorithm Hash digest
SHA256 84ed7b3ad191377aac62e016e77f6062eb4ff8e1d3dc0f6185690cfcf74bc09c
MD5 760fe0a17bdee3b0311c4cb07bf7e982
BLAKE2b-256 b79fb9180be3ac8b85bae4fe557bffa6ea1941cc046118c7886a35349997eca4

File details

Details for the file dbt_cloud_run_runner-0.6.0-py3-none-any.whl.

File hashes

Hashes for dbt_cloud_run_runner-0.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 283b8199dbcd5e11d1d0e7fe7cf2030471132775d6aace1ae851fc840ed7bfe7
MD5 d58e3656a386527399cc42f3c6928af4
BLAKE2b-256 5521c3289f0d8d2121cbbb0403818e1a0f23f6520fe75161fb58aaf2d03a7e4d
