dbt-cloud-run-runner

A Python client library for running dbt projects on Google Cloud Run.

Installation

pip install dbt-cloud-run-runner

Usage

from dbt_cloud_run_runner import Client

# Service account for GCS and Cloud Run operations
# This service account needs:
# - Storage Admin on the GCS bucket
# - Cloud Run Admin in the project
gcp_service_account_key = {"type": "service_account", ...}

# Initialize the client
client = Client(
    gcp_project="your-gcp-project",
    gcs_bucket="your-gcs-bucket",
    service_account_key=gcp_service_account_key,  # Required: for GCS and Cloud Run
    region="us-central1",  # optional, defaults to us-central1
)

# Prepare a dbt project for BigQuery
# This method performs several side effects:
# 1. Generates a profiles.yml file for BigQuery
# 2. Zips your local dbt project (excluding target/ directory)
# 3. Uploads profiles.yml, the dbt project zip, and service account credentials to GCS
# 4. Generates pre-signed URLs (valid for 2 hours by default) for the Cloud Run job
#
# Returns: DbtCloudRunSetup object containing:
#   - GCS blob paths (gs://bucket/path format) for all uploaded files
#   - Pre-signed URLs for downloading inputs and uploading outputs
#   - The Docker image to use
#
# Note: The service_account_key here is different from the one passed to Client.
# This one is used for BigQuery access inside the dbt container.
setup = client.prepare_bigquery(
    service_account_key={"type": "service_account", ...},  # For BigQuery access
    target_project="your-bigquery-project",
    target_dataset="your_dataset",
    path_to_local_dbt_project="./path/to/dbt/project",
    image="us-docker.pkg.dev/delphiio-prod/public-images/dbt-runner:v0.1.1",
    url_expiration_hours=2,  # Optional: override default 2-hour URL expiration
)

# Run the dbt project on Cloud Run
execution_id = client.run(setup)
print(f"Execution started: {execution_id}")

# Wait for completion
status = client.wait_for_completion(execution_id)
print(f"Execution finished with state: {status.state.value}")

# Or poll status manually
status = client.get_status(execution_id)
print(f"Current state: {status.state.value}")
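If you prefer manual polling to wait_for_completion(), the loop below is a minimal sketch. Note that the terminal state names (SUCCEEDED, FAILED, CANCELLED) are illustrative assumptions, not confirmed values of the library's state enum; check them against the actual states returned by get_status().

```python
import time

# Assumed terminal states -- verify against the library's state enum.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def poll_until_done(client, execution_id, interval_seconds=30, timeout_seconds=3600):
    """Poll client.get_status() until the execution reaches a terminal
    state or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = client.get_status(execution_id)
        if status.state.value in TERMINAL_STATES:
            return status
        time.sleep(interval_seconds)
    raise TimeoutError(
        f"Execution {execution_id} did not finish within {timeout_seconds}s"
    )
```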

prepare_bigquery() Method

The prepare_bigquery() method stages your dbt project for execution on Cloud Run. It performs the following side-effecting operations:

Side Effects

  1. Generates profiles.yml: Creates a BigQuery profile configuration file that will be used by dbt inside the container.

  2. Zips the dbt project: Packages your local dbt project directory into a zip file, automatically excluding the target/ directory (which contains compiled artifacts).

  3. Uploads to GCS: Uploads three files to Google Cloud Storage:

    • profiles.yml - The dbt profile configuration
    • dbt_project.zip - Your zipped dbt project
    • credentials.json - Your service account key (for BigQuery authentication)

  4. Generates pre-signed URLs: Creates time-limited signed URLs (default: 2 hours) that allow the Cloud Run container to:

    • Download the dbt project and profiles.yml
    • Upload the compiled output (target/ directory) and logs
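Step 2 (zipping while excluding target/) can be sketched with the standard library alone. This is an illustrative reimplementation, not the library's actual code; it assumes only what the text above states, that the target/ directory is skipped.

```python
import os
import zipfile


def zip_dbt_project(project_dir: str, zip_path: str, exclude_dirs=("target",)) -> None:
    """Zip a dbt project directory, skipping excluded directories
    such as target/ (compiled artifacts)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(project_dir):
            # Prune excluded directories in place so os.walk never descends into them.
            dirs[:] = [d for d in dirs if d not in exclude_dirs]
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the project root.
                zf.write(full, os.path.relpath(full, project_dir))
```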

Return Value

Returns a DbtCloudRunSetup object containing:

  • Blob paths (in gs://bucket/path format):

    • profiles_yml_blob - Location of the uploaded profiles.yml
    • dbt_project_blob - Location of the uploaded dbt project zip
    • credentials_blob - Location of the uploaded service account key
    • output_blob - Where the compiled dbt output will be stored
    • logs_blob - Where the execution logs will be stored

  • Pre-signed URLs:

    • profiles_yml_url - URL to download profiles.yml
    • dbt_project_url - URL to download the dbt project zip
    • credentials_url - URL to download the service account key
    • output_url - URL to upload the compiled output (PUT request)
    • logs_url - URL to upload execution logs (PUT request)

  • Image: The Docker image identifier to use for the Cloud Run job
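Taken together, the return value can be pictured as a dataclass along these lines. The field names come from the list above; the class body is an illustrative shape only, and the real DbtCloudRunSetup definition may differ.

```python
from dataclasses import dataclass


@dataclass
class DbtCloudRunSetupShape:
    """Illustrative shape of the object returned by prepare_bigquery()."""
    profiles_yml_blob: str   # gs://bucket/... path of the uploaded profiles.yml
    dbt_project_blob: str    # gs://bucket/... path of the dbt project zip
    credentials_blob: str    # gs://bucket/... path of the service account key
    output_blob: str         # where the compiled output will be stored
    logs_blob: str           # where the execution logs will be stored
    profiles_yml_url: str    # pre-signed GET URL
    dbt_project_url: str     # pre-signed GET URL
    credentials_url: str     # pre-signed GET URL
    output_url: str          # pre-signed PUT URL
    logs_url: str            # pre-signed PUT URL
    image: str               # Docker image for the Cloud Run job
```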

Important Notes

  • URL Expiration: Pre-signed URLs expire after 2 hours by default (configurable via url_expiration_hours parameter). Make sure to call client.run(setup) before the URLs expire.

  • GCS Storage: Files are uploaded to gs://{bucket}/dbt-runs/{run_id}/ where run_id is a unique identifier generated for each call to prepare_bigquery().

  • Idempotency: Each call to prepare_bigquery() creates a new run with a unique ID, so you can safely call it multiple times without conflicts.
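The storage layout and per-call uniqueness described above can be sketched as follows. The gs://{bucket}/dbt-runs/{run_id}/ prefix and the three file names come from this document; the use of uuid4 for run_id generation is an assumption for illustration.

```python
import uuid
from typing import Optional


def build_run_paths(bucket: str, run_id: Optional[str] = None) -> dict:
    """Build the gs:// paths for one run, following the
    gs://{bucket}/dbt-runs/{run_id}/ layout described above."""
    # Assumption: a fresh UUID stands in for the library's unique run_id.
    run_id = run_id or uuid.uuid4().hex
    prefix = f"gs://{bucket}/dbt-runs/{run_id}"
    return {
        "run_id": run_id,
        "profiles_yml": f"{prefix}/profiles.yml",
        "dbt_project": f"{prefix}/dbt_project.zip",
        "credentials": f"{prefix}/credentials.json",
    }
```

Because every call mints a fresh run_id, two calls never write to the same prefix, which is what makes repeated calls to prepare_bigquery() conflict-free.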

Features

  • Automatic GCS setup: Uploads your dbt project and credentials to GCS with signed URLs
  • Cloud Run job management: Creates and manages Cloud Run jobs automatically
  • BigQuery integration: Generates profiles.yml for BigQuery targets
  • Status monitoring: Track execution status with polling or wait for completion

Requirements

  • Python 3.9+
  • Google Cloud project with Cloud Run and GCS enabled
  • Two service accounts (can be the same, but often different):
    1. GCS/Cloud Run service account (passed to Client()):
      • Cloud Run Admin (roles/run.admin) in the project
      • Storage Admin (roles/storage.admin) on the GCS bucket
      • Must have a private key (for signing URLs)
    2. BigQuery service account (passed to prepare_bigquery()):
      • BigQuery access for the target project/dataset
      • This is the account that dbt will use to query BigQuery
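For reference, a standard dbt BigQuery profile using the service-account method looks roughly like the sketch below (field names follow dbt's BigQuery profile documentation; the profile name, target name, keyfile path, and thread count are illustrative, and the file this library actually generates may differ):

```yaml
default:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: service-account
      project: your-bigquery-project
      dataset: your_dataset
      keyfile: /path/to/credentials.json  # the uploaded BigQuery key
      threads: 4
```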
