
dbt-cloud-run-runner

A Python client library for running dbt projects on Google Cloud Run.

Installation

pip install dbt-cloud-run-runner

Usage

from dbt_cloud_run_runner import Client

# Service account for GCS and Cloud Run operations
# This service account needs:
# - Storage Admin on the GCS bucket
# - Cloud Run Admin in the project
gcp_service_account_key = {"type": "service_account", ...}

# Initialize the client
client = Client(
    gcp_project="your-gcp-project",
    gcs_bucket="your-gcs-bucket",
    service_account_key=gcp_service_account_key,  # Required: for GCS and Cloud Run
    region="us-central1",  # optional, defaults to us-central1
)

# Prepare a dbt project for BigQuery
# This method performs several side effects:
# 1. Generates a profiles.yml file for BigQuery
# 2. Zips your local dbt project (excluding target/ directory)
# 3. Uploads profiles.yml, the dbt project zip, and service account credentials to GCS
# 4. Generates pre-signed URLs (valid for 2 hours by default) for the Cloud Run job
#
# Returns: DbtCloudRunSetup object containing:
#   - GCS blob paths (gs://bucket/path format) for all uploaded files
#   - Pre-signed URLs for downloading inputs and uploading outputs
#   - The Docker image to use
#
# Note: The service_account_key here is different from the one passed to Client.
# This one is used for BigQuery access inside the dbt container.
setup = client.prepare_bigquery(
    service_account_key={"type": "service_account", ...},  # For BigQuery access
    target_project="your-bigquery-project",
    target_dataset="your_dataset",
    path_to_local_dbt_project="./path/to/dbt/project",
    image="us-docker.pkg.dev/delphiio-prod/public-images/dbt-runner:v0.1.1",
    url_expiration_hours=2,  # Optional: override default 2-hour URL expiration
)

# Run the dbt project on Cloud Run
execution_id = client.run(setup)
print(f"Execution started: {execution_id}")

# Wait for completion
status = client.wait_for_completion(execution_id)
print(f"Execution finished with state: {status.state.value}")

# Or poll status manually
status = client.get_status(execution_id)
print(f"Current state: {status.state.value}")

prepare_bigquery() Method

The prepare_bigquery() method prepares your dbt project for execution on Cloud Run. It performs several operations with side effects:

Side Effects

  1. Generates profiles.yml: Creates a BigQuery profile configuration file that will be used by dbt inside the container. By default the active dbt target is named dev; pass target_name to prepare_bigquery() to use a different outputs key (for example prod); see the sketch after this list for the file's rough shape.

  2. Zips the dbt project: Packages your local dbt project directory into a zip file, automatically excluding the target/ directory (which contains compiled artifacts).

  3. Uploads to GCS: Uploads three files to Google Cloud Storage:

    • profiles.yml - The dbt profile configuration
    • dbt_project.zip - Your zipped dbt project
    • credentials.json - Your service account key (for BigQuery authentication)
  4. Generates pre-signed URLs: Creates time-limited signed URLs (default: 2 hours) that allow the Cloud Run container to:

    • Download the dbt project and profiles.yml
    • Upload the compiled output (target/ directory) and logs
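
Conceptually, the four steps above amount to something like the following sketch, written here with PyYAML and google-cloud-storage. The profile name ("default"), the in-container keyfile path, and the <run_id> placeholder are illustrative assumptions, not the library's exact output:

import zipfile
from datetime import timedelta
from pathlib import Path

import yaml  # PyYAML
from google.cloud import storage

# Step 1: a BigQuery profile in the shape dbt expects. The exact fields the
# library emits are an assumption; this is the standard dbt-bigquery layout.
profiles = {
    "default": {
        "target": "dev",  # or whatever target_name you pass to prepare_bigquery()
        "outputs": {
            "dev": {
                "type": "bigquery",
                "method": "service-account",
                "keyfile": "/workspace/credentials.json",  # illustrative container path
                "project": "your-bigquery-project",
                "dataset": "your_dataset",
            }
        },
    }
}
Path("profiles.yml").write_text(yaml.safe_dump(profiles))

# Step 2: zip the project, skipping the top-level target/ directory.
project_dir = Path("./path/to/dbt/project")
with zipfile.ZipFile("dbt_project.zip", "w") as zf:
    for path in project_dir.rglob("*"):
        rel = path.relative_to(project_dir)
        if rel.parts[0] == "target":
            continue
        zf.write(path, rel)

# Steps 3-4: upload under the documented dbt-runs/{run_id}/ prefix and sign.
bucket = storage.Client().bucket("your-gcs-bucket")
blob = bucket.blob("dbt-runs/<run_id>/dbt_project.zip")  # <run_id> left as a placeholder
blob.upload_from_filename("dbt_project.zip")
signed_url = blob.generate_signed_url(version="v4", expiration=timedelta(hours=2))

The signing step is also why the Client service account must have a private key: generate_signed_url() signs the URL locally with it.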

Return Value

Returns a DbtCloudRunSetup object containing:

  • Blob paths (in gs://bucket/path format):

    • profiles_yml_blob - Location of the uploaded profiles.yml
    • dbt_project_blob - Location of the uploaded dbt project zip
    • credentials_blob - Location of the uploaded service account key
    • output_blob - Where the compiled dbt output will be stored
    • logs_blob - Where the execution logs will be stored
  • Pre-signed URLs:

    • profiles_yml_url - URL to download profiles.yml
    • dbt_project_url - URL to download the dbt project zip
    • credentials_url - URL to download the service account key
    • output_url - URL to upload the compiled output (PUT request)
    • logs_url - URL to upload execution logs (PUT request)
  • Image: The Docker image identifier to use for the Cloud Run job
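
For example, once prepare_bigquery() returns, you can inspect these fields directly (names as documented above):

# Inspect the prepared run before launching it.
print(setup.image)             # Docker image for the Cloud Run job
print(setup.dbt_project_blob)  # e.g. gs://your-gcs-bucket/dbt-runs/<run_id>/dbt_project.zip
print(setup.output_url)        # pre-signed PUT URL for the compiled target/ output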

Important Notes

  • URL Expiration: Pre-signed URLs expire after 2 hours by default (configurable via url_expiration_hours parameter). Make sure to call client.run(setup) before the URLs expire.

  • GCS Storage: Files are uploaded to gs://{bucket}/dbt-runs/{run_id}/ where run_id is a unique identifier generated for each call to prepare_bigquery().

  • Run isolation: Each call to prepare_bigquery() creates a new run with a unique ID, so repeated calls never conflict with one another (see the sketch below).
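
For instance, preparing the same project twice yields two fully independent runs. bq_key below is a hypothetical variable holding the BigQuery service account key from the usage example:

kwargs = dict(
    service_account_key=bq_key,  # hypothetical name for the BigQuery key
    target_project="your-bigquery-project",
    target_dataset="your_dataset",
    path_to_local_dbt_project="./path/to/dbt/project",
    image="us-docker.pkg.dev/delphiio-prod/public-images/dbt-runner:v0.1.1",
)
# Each call gets its own run_id, so the uploads land under separate
# dbt-runs/{run_id}/ prefixes and the executions never collide.
setups = [client.prepare_bigquery(**kwargs) for _ in range(2)]
executions = [client.run(s) for s in setups]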

Features

  • Automatic GCS setup: Uploads your dbt project and credentials to GCS with signed URLs
  • Cloud Run job management: Creates and manages Cloud Run jobs automatically
  • BigQuery integration: Generates profiles.yml for BigQuery targets
  • Status monitoring: Track execution status with polling or wait for completion

Requirements

  • Python 3.9+
  • Google Cloud project with Cloud Run and GCS enabled
  • Two service accounts (can be the same, but often different):
    1. GCS/Cloud Run service account (passed to Client()):
      • Cloud Run Admin (roles/run.admin) in the project
      • Storage Admin (roles/storage.admin) on the GCS bucket
      • Must have a private key (for signing URLs)
    2. BigQuery service account (passed to prepare_bigquery()):
      • BigQuery access for the target project/dataset
      • This is the account that dbt will use to query BigQuery
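
Wiring the two keys to the right places looks like the sketch below. The key file paths are illustrative, and both may point at the same service account if it holds every role above:

import json

from dbt_cloud_run_runner import Client

# Key 1: signs URLs and manages GCS/Cloud Run (needs a private key).
with open("gcs-cloud-run-key.json") as f:
    infra_key = json.load(f)

# Key 2: used by dbt inside the container to query BigQuery.
with open("bigquery-key.json") as f:
    bq_key = json.load(f)

client = Client(
    gcp_project="your-gcp-project",
    gcs_bucket="your-gcs-bucket",
    service_account_key=infra_key,
)

setup = client.prepare_bigquery(
    service_account_key=bq_key,
    target_project="your-bigquery-project",
    target_dataset="your_dataset",
    path_to_local_dbt_project="./path/to/dbt/project",
    image="us-docker.pkg.dev/delphiio-prod/public-images/dbt-runner:v0.1.1",
)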

Download files

Download the file for your platform.

Source Distribution

dbt_cloud_run_runner-0.10.0.tar.gz (15.6 kB)


Built Distribution


dbt_cloud_run_runner-0.10.0-py3-none-any.whl (16.6 kB)


File details

Details for the file dbt_cloud_run_runner-0.10.0.tar.gz.

File metadata

  • Download URL: dbt_cloud_run_runner-0.10.0.tar.gz
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for dbt_cloud_run_runner-0.10.0.tar.gz
Algorithm Hash digest
SHA256 aee4a6ff3922a837fd6804b8f2f9ef2ad81b74ea71a46126293564a0cdd56254
MD5 4613a4694a55850671cca913b420c7a0
BLAKE2b-256 cb6f7deace3c9e3c15180f16cb13ee08a714a8ec4aff9bbbaf01d834a38c228c


File details

Details for the file dbt_cloud_run_runner-0.10.0-py3-none-any.whl.


File hashes

Hashes for dbt_cloud_run_runner-0.10.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d91207eff032847ca86672e60ebfbd3146aa1ed8e56ec95b00ad9658582c1550
MD5 3c87d076e2e61ec8dd7e81386965e00a
BLAKE2b-256 9003036259533d2765ca70b362661f75b5c114a8a2c95bd56087b61fcc8de916

