Run dbt Python models locally and materialize results back to Postgres
Project description
dbt-pybridge
Run dbt Python models on Postgres.
dbt-pybridge executes Python locally or in CI, reads inputs from Postgres via dbt.ref() / dbt.source(), then writes results back to Postgres.
It works by:
- compiling
.pymodels through dbt - executing Python locally (developer laptop or CI runner)
- loading
dbt.ref()/dbt.source()data into pandas/polars - writing the returned dataframe back into Postgres
This is useful when you want Python transforms in dbt without a warehouse-native Python runtime (for example Spark/Snowpark).
Status
MVP scope for Python table + incremental + view materializations is implemented.
- Supported:
materialized='table' - Supported:
materialized='incremental'(strategies:append,merge,delete+insert) - Supported:
materialized='view'(implemented as a managed backing table + SQL view) - Supported DAG:
sql -> python -> sql - Supported return types: pandas DataFrame, polars DataFrame, or iterable/generator of dataframes
Install
pip install dbt-pybridge
Requires:
- Python
>=3.11(3.11/3.12 recommended) - Postgres
>=10(CASCADE drops,force_nullwith column lists, partitioned-table relkind support) dbt-core >=1.10,<1.12
Optional extras:
- Machine-learning examples (
customers_kmeans,orders_anomaly_isoforest,products_similarity_tfidf) requirescikit-learn:
pip install "dbt-pybridge[examples]"
Trust model
dbt-pybridge runs your Python model code in the same process as the
adapter via exec(). The model has full access to the database connection
(read AND write), the local filesystem, environment variables, and any
network the adapter can reach.
Practical implications:
- Treat
.pymodels the same as any other application code — only run models from sources you trust. - The
.select(...)/.where(...)/.join(..., on=...)helpers accept raw SQL fragments. The lightweight;and"guards are there to catch accidental typos, not to sandbox untrusted SQL. If you're in a context where authors of the python models can't be trusted, dbt-pybridge isn't the right tool. - Your DB role's grants are the only real boundary. Run the dbt user with the least privileges needed for the models in scope.
Docs
- Scaling Python models — SQL vs Python decision rules, chunking/incremental patterns, and debugging checklist.
Profile
Set your profile type to pybridge:
my_profile:
target: dev
outputs:
dev:
type: pybridge
host: localhost
user: postgres
password: postgres
port: 5432
dbname: analytics
schema: public
threads: 1
Example model
def model(dbt, session):
df = dbt.ref("stg_orders")
df["double_amount"] = df["amount"] * 2
return df
Safer projection pattern:
def model(dbt, session):
df = dbt.ref("stg_orders").select("order_id, amount, customer_id")
return df
SQL pushdown helpers
dbt.ref(...) returns a RelationFrame that you can chain SQL clauses onto
before the data is pulled into Python. Each helper wraps the previous SQL in
a subquery, so they compose. Use them when you only need a slice of upstream:
def model(dbt, session):
# Column projection — pushed into Postgres
cols = dbt.ref("stg_orders").select("order_id, amount, customer_id")
# Row filter — pushed into Postgres
big = dbt.ref("stg_orders").where("amount > 100")
# Join two refs in Postgres before loading
enriched = (
dbt.ref("stg_orders")
.join(dbt.ref("stg_customers"), on="customer_id", how="left")
)
# Multi-key joins:
pairs = dbt.ref("a").join(dbt.ref("b"), on=["customer_id", "site_id"])
# Cross join:
cartesian = dbt.ref("a").join(dbt.ref("b"), on=None, how="cross")
return enriched.as_dataframe()
Notes:
- All three reject
;and (for joins)"to keep accidental SQL injection out of model code; user code is already trusted (it runs viaexec), but these guards catch typos. .join()uses Postgres'USING (col, ...)so the join column appears once in the output (matches pandas/polars merge semantics).- The full DataFrame still loads into Python on
as_dataframe()/iter_batches; the pushdowns just shrink what's pulled.
Schema evolution for incremental models
If your incremental model adds/removes columns over time, set
on_schema_change so dbt-pybridge reconciles the target table automatically:
def model(dbt, session):
dbt.config(
materialized="incremental",
unique_key="order_id",
on_schema_change="append_new_columns", # default is "ignore"
)
...
Supported values:
| value | behavior |
|---|---|
ignore (default) |
fail on column mismatch (preserves existing target schema) |
fail |
explicit fail with details on what drifted |
append_new_columns |
ALTER TABLE ... ADD COLUMN for any new dataframe column |
sync_all_columns |
append new columns AND drop columns missing from the model |
When ignore raises, the error message lists exactly which columns are
missing or new and points you at append_new_columns or --full-refresh.
By default sync_all_columns drops columns with plain ALTER TABLE ... DROP COLUMN (PostgreSQL's default RESTRICT behavior). If a removed column has
dependents (views, materialized views, indexes, generated columns), Postgres
will refuse the drop and you'll get a clear error. To resolve, either:
- drop the dependents yourself and rerun,
- run
dbt run --full-refreshfor a clean rebuild, or - opt into cascading drops with
pybridge_sync_drop_cascade=True:
dbt.config(
materialized="incremental",
on_schema_change="sync_all_columns",
pybridge_sync_drop_cascade=True, # ⚠️ silently drops dependents
)
⚠️
pybridge_sync_drop_cascade=Trueis destructive. It usesDROP COLUMN ... CASCADE, which silently removes any dependent objects — including ones not managed by dbt. Only enable it when you're sure that's what you want.
How to create Python models
- Create
models/<name>_python.py. - Define exactly one callable entrypoint:
def model(dbt, session): .... - Set materialization inside the function:
dbt.config(materialized="table")
- Read upstream inputs using standalone ref/source assignments (important for dbt parser):
orders = dbt.ref("stg_orders")raw_orders = dbt.source("raw", "orders")
- Return one of:
- pandas DataFrame
- polars DataFrame
- iterable/generator that yields pandas/polars DataFrames
Parser-safe pattern:
def model(dbt, session):
dbt.config(materialized="table")
orders = dbt.ref("stg_orders")
result = orders.copy()
result["double_amount"] = result["amount"] * 2
return result
Chunked mode:
def model(dbt, session):
for batch in dbt.ref("stg_orders").iter_batches(batch_size=100_000):
yield transform(batch)
Runtime logging includes progress messages such as:
[pybridge] Loading "transform"."stg_orders" (2,300,000 rows, 120.0 MB)[pybridge] Processing batch 1, rows=100000[pybridge] Writing batch 1, rows=100000
Runtime configs
Set model-level configs via dbt.config(...) in your python model:
pybridge_dataframe_backend:pandas(default) orpolarspybridge_max_rows: hard limit before failure (default1_000_000)pybridge_warn_rows: warning threshold (default200_000)pybridge_max_bytes: hard estimated table-size limit before failure (default536870912, 512MB)pybridge_warn_bytes: warning estimated table-size threshold (default134217728, 128MB)pybridge_allow_large_tables: bypass hard row limit (defaultfalse)pybridge_chunked_mode: allow oversized input only when usingiter_batches(defaultfalse)pybridge_batch_size: default batch size foriter_batches(default100_000)pybridge_column_types: optional explicit type map for created target tables, for example:{"id": "numeric(18,0)", "created_at": "timestamp", "payload": "jsonb"}
pybridge_categorical_types: optional categorical-column enum type map, for example:{"status": "status_enum", "tier": "tier_enum"}
pybridge_sync_drop_cascade: whenon_schema_change='sync_all_columns', controls whether dropped columns useDROP COLUMN ... CASCADE(defaultfalse— drops fail if dependents exist; opt in to silently drop dependents)
Legacy localpy_* keys are still accepted for backward compatibility.
Type inference details
Default inferred target types now include:
- Numeric widths:
smallint/integer/bigint/numeric(for wide unsigned integers)real/double precision
- Temporal:
date,time,timetz,timestamp,timestamptz,interval
- Structured / special:
uuid,bytea,jsonb
- Arrays (homogeneous scalar list/tuple object columns):
boolean[],bigint[],double precision[],text[],uuid[],date[],time[],timetz[],timestamp[],timestamptz[],numeric[]- mixed or nested list structures fall back to
jsonb
Notes:
Decimalobject columns infernumeric(precision,scale)from sampled values.- Empty or ambiguous object columns fall back to
text(orjsonbfor ambiguous list structures). - You can always override with
pybridge_column_types.
Honest limitations
- Not Snowpark
- Not Spark
- Python runs on local machine / CI runner
- Not intended for huge tables
- Best for small/medium transforms
- Not a replacement for warehouse-scale computation
- For large tables, use filtering, incremental models, or chunked execution
First milestone command
dbt run -s customer_features
More examples
The examples/mvp_project/ directory has runnable models for each major
feature:
customer_features.py— minimal pandas table modelorders_polars.py— polars backend (pybridge_dataframe_backend='polars')daily_revenue_incremental.py— incremental +mergestrategy withunique_keyorders_chunked_incremental.py— chunked incremental pattern for bounded memory usageorders_with_jsonb.py—pybridge_column_typesoverrides forjsonb,text[], andnumeric(18,4)
cd examples/mvp_project
dbt run -s orders_polars
dbt run -s daily_revenue_incremental
dbt run -s daily_revenue_incremental # second run exercises merge
dbt run -s daily_revenue_incremental --full-refresh # rebuild from scratch
dbt run -s orders_chunked_incremental
dbt run -s orders_with_jsonb
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file dbt_pybridge-0.2.0.tar.gz.
File metadata
- Download URL: dbt_pybridge-0.2.0.tar.gz
- Upload date:
- Size: 36.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6b998fe05c000e2bf0f2476de42b26cedf2f8ca71f2c41ef36ccc68b250be165
|
|
| MD5 |
c9b9f5436ce563eda229381f3edbf61f
|
|
| BLAKE2b-256 |
a441c87b815216d350044b2bd44071bd23f2324ae20f96671239c3e4e59b2ba8
|
Provenance
The following attestation bundles were made for dbt_pybridge-0.2.0.tar.gz:
Publisher:
release.yml on kraftaa/dbt-pybridge
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
dbt_pybridge-0.2.0.tar.gz -
Subject digest:
6b998fe05c000e2bf0f2476de42b26cedf2f8ca71f2c41ef36ccc68b250be165 - Sigstore transparency entry: 1438905322
- Sigstore integration time:
-
Permalink:
kraftaa/dbt-pybridge@0c7cb711743a2b8a7f70d83c449d98dbf525df9b -
Branch / Tag:
refs/tags/v0.2.0 - Owner: https://github.com/kraftaa
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@0c7cb711743a2b8a7f70d83c449d98dbf525df9b -
Trigger Event:
push
-
Statement type:
File details
Details for the file dbt_pybridge-0.2.0-py3-none-any.whl.
File metadata
- Download URL: dbt_pybridge-0.2.0-py3-none-any.whl
- Upload date:
- Size: 27.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
626924329f92e16d23a28a3c18941e9479ceef3e509089b4465ee1d6af5bfb9e
|
|
| MD5 |
b6ab2376a5c9c302eab62f41742c3d29
|
|
| BLAKE2b-256 |
6553b7d77dba406284dcf8b9f14345a895950d2281b0697ee675210f8286e51d
|
Provenance
The following attestation bundles were made for dbt_pybridge-0.2.0-py3-none-any.whl:
Publisher:
release.yml on kraftaa/dbt-pybridge
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
dbt_pybridge-0.2.0-py3-none-any.whl -
Subject digest:
626924329f92e16d23a28a3c18941e9479ceef3e509089b4465ee1d6af5bfb9e - Sigstore transparency entry: 1438905332
- Sigstore integration time:
-
Permalink:
kraftaa/dbt-pybridge@0c7cb711743a2b8a7f70d83c449d98dbf525df9b -
Branch / Tag:
refs/tags/v0.2.0 - Owner: https://github.com/kraftaa
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@0c7cb711743a2b8a7f70d83c449d98dbf525df9b -
Trigger Event:
push
-
Statement type: