A client library for running dbt projects on Google Cloud Run

dbt-cloud-run-runner

A Python client library for running dbt projects on Google Cloud Run.

Installation

pip install dbt-cloud-run-runner

Usage

from dbt_cloud_run_runner import Client

# Service account for GCS and Cloud Run operations
# This service account needs:
# - Storage Admin on the GCS bucket
# - Cloud Run Admin in the project
gcp_service_account_key = {"type": "service_account", ...}

# Initialize the client
client = Client(
    gcp_project="your-gcp-project",
    gcs_bucket="your-gcs-bucket",
    service_account_key=gcp_service_account_key,  # Required: for GCS and Cloud Run
    region="us-central1",  # optional, defaults to us-central1
)

# Prepare a dbt project for BigQuery
# This method performs several side effects:
# 1. Generates a profiles.yml file for BigQuery
# 2. Zips your local dbt project (excluding target/ directory)
# 3. Uploads profiles.yml, the dbt project zip, and service account credentials to GCS
# 4. Generates pre-signed URLs (valid for 2 hours by default) for the Cloud Run job
#
# Returns: DbtCloudRunSetup object containing:
#   - GCS blob paths (gs://bucket/path format) for all uploaded files
#   - Pre-signed URLs for downloading inputs and uploading outputs
#   - The Docker image to use
#
# Note: The service_account_key here is different from the one passed to Client.
# This one is used for BigQuery access inside the dbt container.
setup = client.prepare_bigquery(
    service_account_key={"type": "service_account", ...},  # For BigQuery access
    target_project="your-bigquery-project",
    target_dataset="your_dataset",
    path_to_local_dbt_project="./path/to/dbt/project",
    image="us-docker.pkg.dev/delphiio-prod/public-images/dbt-runner:v0.1.1",
    url_expiration_hours=2,  # Optional: override default 2-hour URL expiration
)

# Run the dbt project on Cloud Run
execution_id = client.run(setup)
print(f"Execution started: {execution_id}")

# Wait for completion
status = client.wait_for_completion(execution_id)
print(f"Execution finished with state: {status.state.value}")

# Or poll status manually
status = client.get_status(execution_id)
print(f"Current state: {status.state.value}")
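The manual polling option above can be wrapped in a small loop. The following is a minimal sketch, not part of the library: poll_until_terminal and its parameters are hypothetical names, and the set of terminal state strings is an assumption you would need to take from the library itself, since this README does not list them.

```python
import time
from typing import Callable, Iterable


def poll_until_terminal(
    get_state: Callable[[], str],
    terminal_states: Iterable[str],
    interval_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_state() repeatedly until it returns one of terminal_states.

    Raises TimeoutError if no terminal state is seen within timeout_seconds.
    """
    terminal = set(terminal_states)
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in terminal:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"still in state {state!r} after {timeout_seconds} seconds"
            )
        # Wait before asking again so the status API is not hammered.
        sleep(interval_seconds)
```

With the client from the example above, get_state could be something like lambda: client.get_status(execution_id).state.value, with terminal_states filled in from the state values the library actually defines.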

Storage backends (GCS and S3)

Cloud Run job execution always runs on GCP, but the object storage used for run inputs, output, and logs is selectable via storage_backend:

from dbt_cloud_run_runner import Client, StorageBackend

# Google Cloud Storage (default) — unchanged behavior
client = Client(
    gcp_project="your-gcp-project",
    bucket="your-gcs-bucket",
    service_account_key=gcp_service_account_key,
)

# Amazon S3 — inputs are uploaded to S3 and the container's download/upload URLs
# are S3 presigned URLs. Cloud Run still runs on GCP via service_account_key.
client = Client(
    gcp_project="your-gcp-project",
    bucket="your-s3-bucket",
    service_account_key=gcp_service_account_key,  # for Cloud Run job management
    storage_backend=StorageBackend.s3,
    s3_region="us-east-1",
    # AWS credentials are optional: when omitted, the default boto3 provider
    # chain (e.g. the task/runtime IAM role) is used — no key file required.
    aws_access_key_id="...",
    aws_secret_access_key="...",
)

The DbtCloudRunSetup blob URIs use the backend's scheme (gs:// for GCS, s3:// for S3); the dbt-runs/{run_id}/... object-key layout is identical across backends. The runner container is backend-agnostic: it GETs inputs and PUTs output/logs against the presigned URLs regardless of backend.
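Because the blob URIs differ only in their scheme, downstream code can treat both backends uniformly. The sketch below splits a URI into backend, bucket, and object key; BlobLocation and parse_blob_uri are hypothetical helpers written for illustration, not names from the library.

```python
from typing import NamedTuple


class BlobLocation(NamedTuple):
    backend: str  # "gcs" or "s3"
    bucket: str
    key: str      # object key, e.g. "dbt-runs/<run_id>/profiles.yml"


# URI scheme -> backend label used in this sketch.
_SCHEME_TO_BACKEND = {"gs": "gcs", "s3": "s3"}


def parse_blob_uri(uri: str) -> BlobLocation:
    """Split a gs:// or s3:// URI into (backend, bucket, key)."""
    scheme, separator, rest = uri.partition("://")
    backend = _SCHEME_TO_BACKEND.get(scheme)
    bucket, _, key = rest.partition("/")
    if not separator or backend is None or not bucket:
        raise ValueError(f"unsupported blob URI: {uri!r}")
    return BlobLocation(backend, bucket, key)
```

For example, parse_blob_uri(setup.profiles_yml_blob) would yield the same key for either backend, with only the backend field changing.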

prepare_bigquery() Method

The prepare_bigquery() method prepares your dbt project for execution on Cloud Run. It performs several operations with side effects:

Side Effects

  1. Generates profiles.yml: Creates a BigQuery profile configuration file that will be used by dbt inside the container. By default the active dbt target is named dev; pass target_name to prepare_bigquery() to use a different outputs key (for example prod).

  2. Zips the dbt project: Packages your local dbt project directory into a zip file, automatically excluding the target/ directory (which contains compiled artifacts).

  3. Uploads to object storage: Uploads three files to Google Cloud Storage (or to Amazon S3 when storage_backend is StorageBackend.s3):

    • profiles.yml - The dbt profile configuration
    • dbt_project.zip - Your zipped dbt project
    • credentials.json - Your service account key (for BigQuery authentication)

  4. Generates pre-signed URLs: Creates time-limited signed URLs (default: 2 hours) that allow the Cloud Run container to:

    • Download the dbt project and profiles.yml
    • Upload the compiled output (target/ directory) and logs
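To make the zipping step above concrete, here is a rough sketch of packaging a project directory while skipping target/. It is an illustration only: zip_dbt_project is a hypothetical function, and it assumes that only a top-level target/ directory is excluded, which may differ from what the library does internally.

```python
import zipfile
from pathlib import Path


def zip_dbt_project(project_dir: str, zip_path: str, excluded_dirs=("target",)) -> list:
    """Zip project_dir into zip_path, skipping top-level excluded directories.

    Returns the archive member names that were written, in sorted order.
    """
    root = Path(project_dir).resolve()
    out = Path(zip_path).resolve()
    written = []
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            # Skip directories, and the archive itself if it lives in the project.
            if not path.is_file() or path == out:
                continue
            relative = path.relative_to(root)
            # Skip files under an excluded top-level directory such as target/.
            if len(relative.parts) > 1 and relative.parts[0] in excluded_dirs:
                continue
            archive.write(path, relative.as_posix())
            written.append(relative.as_posix())
    return written
```

Excluding target/ keeps previously compiled artifacts out of the upload, so the container starts from the project sources only.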

Return Value

Returns a DbtCloudRunSetup object containing:

  • Blob paths (in gs://bucket/path format for GCS, or s3://bucket/path for S3):

    • profiles_yml_blob - Location of the uploaded profiles.yml
    • dbt_project_blob - Location of the uploaded dbt project zip
    • credentials_blob - Location of the uploaded service account key
    • output_blob - Where the compiled dbt output will be stored
    • logs_blob - Where the execution logs will be stored
  • Pre-signed URLs:

    • profiles_yml_url - URL to download profiles.yml
    • dbt_project_url - URL to download the dbt project zip
    • credentials_url - URL to download the service account key
    • output_url - URL to upload the compiled output (PUT request)
    • logs_url - URL to upload execution logs (PUT request)
  • Image: The Docker image identifier to use for the Cloud Run job
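The download and upload URLs listed above are plain HTTP endpoints, so any HTTP client can use them. As a sketch of what the container side might look like (the function names here are hypothetical, and the actual runner image may work differently), the requests can be built with the standard library:

```python
import urllib.request


def build_download_request(signed_url: str) -> urllib.request.Request:
    """GET request for an input such as profiles_yml_url or dbt_project_url."""
    return urllib.request.Request(signed_url, method="GET")


def build_upload_request(signed_url: str, payload: bytes) -> urllib.request.Request:
    """PUT request for output_url or logs_url, carrying the file bytes as the body."""
    return urllib.request.Request(signed_url, data=payload, method="PUT")


def send(request: urllib.request.Request, timeout: float = 60.0) -> int:
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, for example
    when a signed URL has expired.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Depending on how a URL was signed, the storage service may also require specific headers (such as a matching Content-Type) on the PUT; that detail is not stated here and would need checking against the backend in use.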

Important Notes

  • URL Expiration: Pre-signed URLs expire after 2 hours by default (configurable via url_expiration_hours parameter). Make sure to call client.run(setup) before the URLs expire.

  • Storage location: Files are uploaded to gs://{bucket}/dbt-runs/{run_id}/ (or s3://{bucket}/dbt-runs/{run_id}/ with the S3 backend), where run_id is a unique identifier generated for each call to prepare_bigquery().

  • Repeated calls: Each call to prepare_bigquery() creates a new run with a unique ID, so calls never overwrite each other's files. This also means every call uploads a new set of objects rather than reusing an earlier run.
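The URL expiration note above can be checked in code before starting a run. This is a minimal sketch with hypothetical helper names; it assumes you record the time at which prepare_bigquery() returned. Since the output and log URLs are used for uploads at the end of the run, it seems prudent to leave a margin that covers the expected run duration as well, though the README does not spell this out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def url_deadline(prepared_at: datetime, url_expiration_hours: float = 2) -> datetime:
    """Approximate time at which URLs created at prepared_at stop working."""
    return prepared_at + timedelta(hours=url_expiration_hours)


def has_time_left(
    prepared_at: datetime,
    url_expiration_hours: float = 2,
    safety_margin_minutes: float = 10,
    now: Optional[datetime] = None,
) -> bool:
    """True if more than safety_margin_minutes remain before the URLs expire."""
    current = now if now is not None else datetime.now(timezone.utc)
    margin = timedelta(minutes=safety_margin_minutes)
    return current + margin < url_deadline(prepared_at, url_expiration_hours)
```

A caller might store prepared_at = datetime.now(timezone.utc) right after prepare_bigquery() and call prepare_bigquery() again if has_time_left(prepared_at) is False.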

Features

  • Automatic GCS setup: Uploads your dbt project and credentials to GCS with signed URLs
  • Cloud Run job management: Creates and manages Cloud Run jobs automatically
  • BigQuery integration: Generates profiles.yml for BigQuery targets
  • Status monitoring: Track execution status with polling or wait for completion
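For readers unfamiliar with what a generated BigQuery profiles.yml contains, the sketch below builds the equivalent structure as a Python dict. The field names follow dbt's BigQuery service-account profile format; the function name, the profile name, the keyfile path, and the threads value are assumptions for illustration, and the file the library actually writes may contain different or additional settings.

```python
def build_bigquery_profile(
    profile_name: str,
    target_project: str,
    target_dataset: str,
    keyfile_path: str,
    target_name: str = "dev",
    threads: int = 4,
) -> dict:
    """Return a dict mirroring a dbt profiles.yml for a BigQuery target."""
    return {
        profile_name: {
            # The active target; must match a key under "outputs".
            "target": target_name,
            "outputs": {
                target_name: {
                    "type": "bigquery",
                    "method": "service-account",
                    "project": target_project,
                    "dataset": target_dataset,
                    "keyfile": keyfile_path,
                    "threads": threads,
                }
            },
        }
    }
```

Passing target_name="prod" changes both the target value and the outputs key, which is the relationship the target_name parameter of prepare_bigquery() controls.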

Requirements

  • Python 3.9+
  • Google Cloud project with Cloud Run and GCS enabled
  • Two service accounts (can be the same, but often different):
    1. GCS/Cloud Run service account (passed to Client()):
      • Cloud Run Admin (roles/run.admin) in the project
      • Storage Admin (roles/storage.admin) on the GCS bucket
      • Must have a private key (for signing URLs)
    2. BigQuery service account (passed to prepare_bigquery()):
      • BigQuery access for the target project/dataset
      • This is the account that dbt will use to query BigQuery
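Since the Client service account must carry a private key for URL signing, it can help to check the key dict before constructing the client. The helper below is a hypothetical pre-flight check, not part of the library; it only looks for fields that a standard Google service account JSON key contains.

```python
# Fields of a service account JSON key that signing relies on.
REQUIRED_KEY_FIELDS = ("type", "client_email", "private_key")


def missing_key_fields(service_account_key: dict) -> list:
    """Return the required fields that are absent or empty in the key dict."""
    return [
        field for field in REQUIRED_KEY_FIELDS if not service_account_key.get(field)
    ]
```

An empty result means the key has the fields needed for signing; it does not verify that the account holds the Cloud Run Admin or Storage Admin roles listed above.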
