dbt-pybridge

Run dbt Python models locally and materialize results back to Postgres.

dbt-pybridge runs dbt Python models against Postgres by executing the Python locally or in CI, then writing the results back to Postgres.
It works by:

- compiling .py models through dbt
- executing Python locally (developer laptop or CI runner)
- loading dbt.ref()/dbt.source() data into pandas/polars
- writing the returned dataframe back into Postgres
Status
MVP scope for Python table + incremental + view materializations is implemented.
- Supported: materialized='table'
- Supported: materialized='incremental' (strategies: append, merge, delete+insert)
- Supported: materialized='view' (implemented as a managed backing table + SQL view)
- Supported DAG: sql -> python -> sql
- Supported return types: pandas DataFrame, polars DataFrame, or an iterable/generator of dataframes
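The three incremental strategies map onto plain Postgres DML. The sketch below shows one plausible rendering (a hypothetical helper, not the adapter's actual templates; in particular, `merge` is shown via Postgres's `INSERT ... ON CONFLICT` idiom, which a real implementation may do differently):

```python
def incremental_sql(strategy, target, source, unique_key=None, columns=None):
    """Render illustrative Postgres DML for one incremental strategy."""
    if strategy == "append":
        return f"INSERT INTO {target} SELECT * FROM {source}"
    if strategy == "delete+insert":
        # Delete matching keys first, then insert the fresh rows.
        return (
            f"DELETE FROM {target} USING {source} "
            f"WHERE {target}.{unique_key} = {source}.{unique_key}; "
            f"INSERT INTO {target} SELECT * FROM {source}"
        )
    if strategy == "merge":
        # Upsert on the unique key, updating the listed columns on conflict.
        sets = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns)
        return (
            f"INSERT INTO {target} SELECT * FROM {source} "
            f"ON CONFLICT ({unique_key}) DO UPDATE SET {sets}"
        )
    raise ValueError(f"unknown strategy: {strategy}")
```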
Install
```shell
pip install -e .
```
Use a supported Python version (3.11/3.12 recommended).
Profile
Set your profile type to pybridge:
```yaml
my_profile:
  target: dev
  outputs:
    dev:
      type: pybridge
      host: localhost
      user: postgres
      password: postgres
      port: 5432
      dbname: analytics
      schema: public
      threads: 1
```
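Under the hood these keys presumably become an ordinary Postgres connection; a sketch of how the profile target might translate into a libpq-style keyword/value DSN (hypothetical helper, not the adapter's real connection code):

```python
def profile_to_dsn(profile):
    """Build a libpq keyword/value DSN string from a pybridge profile target."""
    keys = ["host", "port", "user", "password", "dbname"]
    # Only emit keys that are actually present in the profile dict.
    return " ".join(f"{k}={profile[k]}" for k in keys if k in profile)
```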
Example model
```python
def model(dbt, session):
    df = dbt.ref("stg_orders")
    df["double_amount"] = df["amount"] * 2
    return df
```
Safer projection pattern:
```python
def model(dbt, session):
    df = dbt.ref("stg_orders").select("order_id, amount, customer_id")
    return df
```
How to create Python models
- Create models/<name>_python.py.
- Define exactly one callable entrypoint: def model(dbt, session): ....
- Set the materialization inside the function: dbt.config(materialized="table").
- Read upstream inputs using standalone ref/source assignments (important for the dbt parser):
  orders = dbt.ref("stg_orders")
  raw_orders = dbt.source("raw", "orders")
- Return one of:
  - pandas DataFrame
  - polars DataFrame
  - an iterable/generator that yields pandas/polars DataFrames
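All three return shapes can be funneled through a single batch iterator; a duck-typed sketch of that normalization (illustrative only, not the adapter's actual code):

```python
def iter_result_batches(result):
    """Yield dataframe batches whether the model returned one frame or many.

    A single pandas/polars frame exposes a .shape attribute; generators and
    other iterables of frames do not, so they are iterated through directly.
    """
    if hasattr(result, "shape"):
        yield result
    else:
        yield from result
```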
Parser-safe pattern:
```python
def model(dbt, session):
    dbt.config(materialized="table")
    orders = dbt.ref("stg_orders")
    result = orders.copy()
    result["double_amount"] = result["amount"] * 2
    return result
```
Chunked mode:
```python
def model(dbt, session):
    for batch in dbt.ref("stg_orders").iter_batches(batch_size=100_000):
        yield transform(batch)
```
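The batching contract itself is simple: walk the upstream rows in fixed-size chunks and flush a trailing partial chunk. A generic sketch over any row iterable (`transform` in the example above is a user-supplied function; this chunker is illustrative, not pybridge's implementation):

```python
def chunked(rows, batch_size=100_000):
    """Yield lists of at most batch_size rows from any iterable."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the trailing partial batch
        yield batch
```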
Runtime logging includes progress messages such as:
```text
[pybridge] Loading "transform"."stg_orders" (2,300,000 rows, 120.0 MB)
[pybridge] Processing batch 1, rows=100000
[pybridge] Writing batch 1, rows=100000
```
Runtime configs
Set model-level configs via dbt.config(...) in your python model:
- pybridge_dataframe_backend: pandas (default) or polars
- pybridge_max_rows: hard row limit before failure (default 1_000_000)
- pybridge_warn_rows: warning threshold (default 200_000)
- pybridge_max_bytes: hard estimated table-size limit before failure (default 536870912, 512 MB)
- pybridge_warn_bytes: estimated table-size warning threshold (default 134217728, 128 MB)
- pybridge_allow_large_tables: bypass the hard row limit (default false)
- pybridge_chunked_mode: allow oversized input only when using iter_batches (default false)
- pybridge_batch_size: default batch size for iter_batches (default 100_000)
- pybridge_column_types: optional explicit type map for created target tables, for example: {"id": "numeric(18,0)", "created_at": "timestamp", "payload": "jsonb"}
- pybridge_categorical_types: optional categorical-column enum type map, for example: {"status": "status_enum", "tier": "tier_enum"}
Legacy localpy_* keys are still accepted for backward compatibility.
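Taken together, the size guards suggest checking behavior like the following sketch (hypothetical function; names and defaults copied from the config list above):

```python
import warnings

def enforce_limits(n_rows, est_bytes, *,
                   max_rows=1_000_000, warn_rows=200_000,
                   max_bytes=536_870_912, warn_bytes=134_217_728,
                   allow_large_tables=False, chunked_mode=False):
    """Warn or fail before loading an oversized upstream table into memory."""
    if chunked_mode:
        return  # iter_batches keeps memory bounded, so limits are skipped
    if not allow_large_tables and (n_rows > max_rows or est_bytes > max_bytes):
        raise RuntimeError(
            f"table too large: {n_rows:,} rows, {est_bytes:,} bytes")
    if n_rows > warn_rows or est_bytes > warn_bytes:
        warnings.warn(f"large table: {n_rows:,} rows, {est_bytes:,} bytes")
```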
Type inference details
Default inferred target types now include:
- Numeric widths: smallint/integer/bigint, numeric (for wide unsigned integers), real/double precision
- Temporal: date, time, timetz, timestamp, timestamptz, interval
- Structured / special: uuid, bytea, jsonb
- Arrays (homogeneous scalar list/tuple object columns): boolean[], bigint[], double precision[], text[], uuid[], date[], time[], timetz[], timestamp[], timestamptz[], numeric[]
- Mixed or nested list structures fall back to jsonb
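These rules amount to a value-to-type mapping over sampled column values; a simplified sketch (illustrative only; real inference also inspects dataframe dtypes and chooses wider array element types than shown here):

```python
import datetime
import uuid
from decimal import Decimal

def infer_pg_type(sample):
    """Map one sampled Python value to a Postgres column type (simplified)."""
    if isinstance(sample, bool):          # bool before int: bool is an int subclass
        return "boolean"
    if isinstance(sample, int):
        return "bigint" if abs(sample) > 2**31 - 1 else "integer"
    if isinstance(sample, float):
        return "double precision"
    if isinstance(sample, Decimal):
        return "numeric"
    if isinstance(sample, datetime.datetime):  # datetime before date (subclass)
        return "timestamptz" if sample.tzinfo else "timestamp"
    if isinstance(sample, datetime.date):
        return "date"
    if isinstance(sample, uuid.UUID):
        return "uuid"
    if isinstance(sample, bytes):
        return "bytea"
    if isinstance(sample, (list, tuple)):
        inner = {infer_pg_type(v) for v in sample}
        # Homogeneous scalar lists become arrays; mixed ones fall back to jsonb.
        return f"{inner.pop()}[]" if len(inner) == 1 else "jsonb"
    return "text"
```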
Notes:
- Decimal object columns infer numeric(precision,scale) from sampled values.
- Empty or ambiguous object columns fall back to text (or jsonb for ambiguous list structures).
- You can always override with pybridge_column_types.
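For Decimal columns, a precision and scale covering all sampled values can be derived from `Decimal.as_tuple()`; a hedged sketch of that inference (illustrative, not pybridge's actual code):

```python
from decimal import Decimal

def infer_numeric(samples):
    """Infer a numeric(precision,scale) type covering all sampled Decimals."""
    int_digits = scale = 0
    for d in samples:
        _, digits, exponent = d.as_tuple()
        frac = max(0, -exponent)                # digits after the decimal point
        whole = max(0, len(digits) + exponent)  # digits before the decimal point
        scale = max(scale, frac)
        int_digits = max(int_digits, whole)
    return f"numeric({int_digits + scale},{scale})"
```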
Honest limitations
- Not Snowpark
- Not Spark
- Python runs on the local machine / CI runner
- Not intended for huge tables
- Best for small/medium transforms
- Not a replacement for warehouse-scale computation
- For large tables, use filtering, incremental models, or chunked execution
First milestone command
```shell
dbt run -s customer_features
```
More examples
The examples/mvp_project/ directory has runnable models for each major
feature:
- customer_features.py — minimal pandas table model
- orders_polars.py — polars backend (pybridge_dataframe_backend='polars')
- daily_revenue_incremental.py — incremental + merge strategy with unique_key
- orders_with_jsonb.py — pybridge_column_types overrides for jsonb, text[], and numeric(18,4)
```shell
cd examples/mvp_project
dbt run -s orders_polars
dbt run -s daily_revenue_incremental
dbt run -s daily_revenue_incremental                # second run exercises merge
dbt run -s daily_revenue_incremental --full-refresh # rebuild from scratch
dbt run -s orders_with_jsonb
```
File details

Details for the file dbt_pybridge-0.1.1.tar.gz.

File metadata

- Download URL: dbt_pybridge-0.1.1.tar.gz
- Size: 28.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | b0cfa930e542a502ffaca9b6b31756b88b247513aa5d3f84902e14c5454f2b60 |
| MD5 | 93b1f892217d0a4a6689db4b2d970255 |
| BLAKE2b-256 | 186e0b963bdb4acf7ef0a83aebda74a7f5dc5c5fdb31dd9dc6ed7c94bd69234c |
Provenance

The following attestation bundles were made for dbt_pybridge-0.1.1.tar.gz:

Publisher: release.yml on kraftaa/dbt-pybridge

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dbt_pybridge-0.1.1.tar.gz
- Subject digest: b0cfa930e542a502ffaca9b6b31756b88b247513aa5d3f84902e14c5454f2b60
- Sigstore transparency entry: 1424741768
- Permalink: kraftaa/dbt-pybridge@f75071df94cbf49bf02d9fe0fd31cd538475473b
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/kraftaa
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f75071df94cbf49bf02d9fe0fd31cd538475473b
- Trigger Event: push
File details

Details for the file dbt_pybridge-0.1.1-py3-none-any.whl.

File metadata

- Download URL: dbt_pybridge-0.1.1-py3-none-any.whl
- Size: 23.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 281db758fd01a6b8bdefae46a84efb0d8381f598931ea79697e6fc12825cf175 |
| MD5 | 2d5461f2cf764c0c26f5d7caa9904c30 |
| BLAKE2b-256 | d0bc1dca6c605cd299bc14a9bd4393f45041f0bcaf540b057e705b453f0b5438 |
Provenance

The following attestation bundles were made for dbt_pybridge-0.1.1-py3-none-any.whl:

Publisher: release.yml on kraftaa/dbt-pybridge

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dbt_pybridge-0.1.1-py3-none-any.whl
- Subject digest: 281db758fd01a6b8bdefae46a84efb0d8381f598931ea79697e6fc12825cf175
- Sigstore transparency entry: 1424742086
- Permalink: kraftaa/dbt-pybridge@f75071df94cbf49bf02d9fe0fd31cd538475473b
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/kraftaa
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f75071df94cbf49bf02d9fe0fd31cd538475473b
- Trigger Event: push