Polars-like lazy access to a database.
Project description
LazyBear
LazyBear is a polars-flavored lazy SQL query builder on top of sqlalchemy. It lets you compose familiar
transformations like select, filter, with_columns, join, group_by().agg(...), and sort/order_by against a
SQL database, and then materialize results to polars or arrow (or stream them in batches).
The purpose of this library is to provide lazy, polars-like access to a single sql database server, providing multi-site (and multi-server) stable code while pushing most memory-intensive operations to the remote server.
- Familiar, chainable API similar to polars
- Backed by sqlalchemy for broad database support
- Zero data is loaded until you call
collect()/to_arrow()/writers - Convenient I/O helpers for CSV and Parquet (using polars)
Installation
- LazyBear targets Python 3.10+.
- Required dependencies:
sqlalchemypolars
- Optional:
pyarrowforto_arrow()
Install from PyPI:
pip install lazybear-polars
Install from source using pip:
# after git clone
pip install -e .
# straight from repo
pip install git+https://github.com/kpwhri/lazybear.git@master
Usage
Quickstart
Below is an end-to-end walkthrough using an in-memory sqlite database. The same api should work for other databases supported by sqlalchemy.
Create Playgruond
First, let's setup the backend play data:
import sqlalchemy as sa
import polars as pl
# create a sqlalchemy engine
eng = sa.create_engine('sqlite:///:memory:')
# prepare some tables for the demo
meta = sa.MetaData()
users = sa.Table(
'users', meta,
sa.Column('id', sa.Integer, primary_key=True),
sa.Column('name', sa.String),
sa.Column('age', sa.Integer),
)
orders = sa.Table(
'orders', meta,
sa.Column('id', sa.Integer, primary_key=True),
sa.Column('user_id', sa.Integer),
sa.Column('amount', sa.Float),
)
meta.create_all(eng)
with eng.begin() as conn:
conn.execute(users.insert(), [
{'id': 1, 'name': 'Ahti', 'age': 30},
{'id': 2, 'name': 'Kalma', 'age': 28},
{'id': 3, 'name': 'Tellervo', 'age': 41},
{'id': 4, 'name': 'Ukko', 'age': 41},
])
conn.execute(orders.insert(), [
{'id': 10, 'user_id': 1, 'amount': 12.5},
{'id': 11, 'user_id': 1, 'amount': 7.5},
{'id': 12, 'user_id': 2, 'amount': 99.0},
])
Implemented SQL Operations
Now, let's see what we can do:
from lazybear import scan_table, scan_sql_query, col
# scan tables lazily
lf_users = scan_table('users', eng) # columns exposed lowercase by default
lf_orders = scan_table('orders', eng)
# basic select / filter / collect → returns a polars DataFrame
out_df = (
lf_users
.filter(col('age') > 30)
.select('id', 'name', 'age')
.collect()
)
print(out_df)
# with_columns — add or replace columns
with_cols_df = (
lf_users
.with_columns(('age2', col('age') * 2), decade=col('age') / 10 * 10)
.select('id', 'age2', 'decade')
.collect()
)
print(with_cols_df)
# order and limit (keeps ordering stable even across subqueries)
ordered_df = lf_users.order_by('-age', 'name').limit(2).collect()
print(ordered_df)
# polars-style sort api
sorted_df = lf_users.sort(by='age', descending=True).collect()
print(sorted_df)
# joins (left columns keep names; right overlaps get suffixed with `_y` by default)
joined_df = (
lf_users
.join(lf_orders, on={'id': 'user_id'}, how='left')
.select('id', 'name', 'age', ('amount_y', col('amount')))
.collect()
)
print(joined_df)
# group_by + aggregations
agg_df = (
lf_users
.group_by('age')
.agg(n=('id', 'count'), min_id=('id', 'min'))
.collect()
)
print(agg_df)
# expressions: membership, null checks, and string helpers
expr_df = (
lf_users
.filter(
(col('name').str.contains('k', literal=True)) |
(col('name').startswith('A')) |
(col('age').is_in([28, 41]))
)
.collect()
)
print(expr_df)
Uploading Temp Tables (Beta)
You can use a local polars DataFrame in lazy sql operations by creating a temporary table. This is useful for joining local data with database tables.
Be aware that the table will be inserted when collect is called. After the collection is complete, a best effort attempt to delete the temp table will be completed.
import polars as pl
from lazybear import scan_df, scan_table, col
# local polars dataframe
df_local = pl.DataFrame({
'user_id': [1, 3],
'status': ['active', 'inactive']
})
# create a temp table frame
lf_temp = scan_df(df_local, eng, table_name='user_status')
# join with a database table
lf_users = scan_table('users', eng)
result = (
lf_users
.join(lf_temp, on={'id': 'user_id'})
.select('name', 'status')
.collect()
)
Limitations & Behavior:
- Temp tables are created and data is inserted only when
collect()(or another execution method) is called. - Best-effort cleanup (DROP TABLE) is performed after the data is fetched.
- Currently offers beta support for sqlite, PostgreSQL, SQL Server (MSSQL), Oracle, and Teradata. A warning is issued for other dialects.
- For certain dialects, will attempt bulk insert to speed up processing
Exporting Data
Finally, we'll
# iterating rows
for row in lf_users.order_by('id').iter_rows():
# row is a tuple by default
print(row)
for row in lf_users.order_by('id').iter_rows(named=True):
# named=True yields dicts
print(row)
# streaming in batches
for batch_df in lf_users.order_by('id').collect_batches(chunk_size=2):
print('Batch:', batch_df)
# scan a raw SQL query (must be SELECT)
q_df = (
scan_sql_query('select name, age from users where age > 30', eng)
.filter(col('age') < 60)
.select('name')
.collect()
)
print(q_df)
# materialize to arrow (requires optional pyarrow)
arrow_tbl = lf_users.to_arrow()
# explain shows the composed SQL (with literal binds where possible)
print(lf_users.filter(col('age') > 30).explain())
I/O helpers
- CSV
# single file
lf_users.order_by('id').write_csv('users.csv', include_header=True)
# append in chunks to one file
lf_users.order_by('id').write_csv('users_chunked.csv', chunk_size=2, include_header=True)
- Parquet
# single file
lf_users.collect().write_parquet('users.parquet')
# chunked parts with a common prefix like users-00000.parquet, users-00001.parquet, ...
lf_users.order_by('id').write_parquet('users.parquet', chunk_size=2)
Notes:
write_csv/write_parquetuse polars under the hood. For chunked Parquet, files are created with a numeric suffix.to_arrow()requirespyarrowto be installed.
Advanced
- case sensitivity:
scan_table(..., lowercase=True)exposes columns as lowercase labels by default. Setlowercase=Falseto preserve database-reflected casing. to_select()returns the current SQLAlchemySelectif you need to interop with SQLAlchemy APIs directly.explain()returns the rendered SQL string; if supported, literal binds are inlined.
Minimal example
import sqlalchemy as sa
from lazybear import scan_table, col
engine = sa.create_engine('sqlite:///:memory:')
# ... create a table `users` with columns id, name, age ...
lf = scan_table('users', engine)
df = lf.filter(col('age') >= 30).select('id', 'name').collect()
print(df)
License
This project is licensed under the MIT License. See LICENSE.txt or https://kpwhri.mit-license.org for the full text.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file lazybear_polars-0.2.0.tar.gz.
File metadata
- Download URL: lazybear_polars-0.2.0.tar.gz
- Upload date:
- Size: 25.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
eaa66c8298f012caf9aa309196e2e66cba03ecfaa828cb80bd30a5df33d16f64
|
|
| MD5 |
a2b4e834441ea2d4899dd405ecb0e61b
|
|
| BLAKE2b-256 |
b834f6108a52787bfdc21fc102069b2a797d00faedfaa96a93169c185d857044
|
Provenance
The following attestation bundles were made for lazybear_polars-0.2.0.tar.gz:
Publisher:
release.yml on kpwhri/lazybear
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
lazybear_polars-0.2.0.tar.gz -
Subject digest:
eaa66c8298f012caf9aa309196e2e66cba03ecfaa828cb80bd30a5df33d16f64 - Sigstore transparency entry: 858482474
- Sigstore integration time:
-
Permalink:
kpwhri/lazybear@1a5c6892146ecf421ae3d8a19c90bd93c3a69519 -
Branch / Tag:
refs/tags/v0.2.0 - Owner: https://github.com/kpwhri
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@1a5c6892146ecf421ae3d8a19c90bd93c3a69519 -
Trigger Event:
push
-
Statement type:
File details
Details for the file lazybear_polars-0.2.0-py3-none-any.whl.
File metadata
- Download URL: lazybear_polars-0.2.0-py3-none-any.whl
- Upload date:
- Size: 23.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ef8bbff5e0159a9c8b629c0ab243e07358ac79e049e717d4b4ff9d9594b49101
|
|
| MD5 |
052c4ecc0327e21830e932608b620799
|
|
| BLAKE2b-256 |
1349abe1b7c98cbe4c49f3a9df20cfd502125ce1f11bff95935a11046ea4c510
|
Provenance
The following attestation bundles were made for lazybear_polars-0.2.0-py3-none-any.whl:
Publisher:
release.yml on kpwhri/lazybear
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
lazybear_polars-0.2.0-py3-none-any.whl -
Subject digest:
ef8bbff5e0159a9c8b629c0ab243e07358ac79e049e717d4b4ff9d9594b49101 - Sigstore transparency entry: 858482534
- Sigstore integration time:
-
Permalink:
kpwhri/lazybear@1a5c6892146ecf421ae3d8a19c90bd93c3a69519 -
Branch / Tag:
refs/tags/v0.2.0 - Owner: https://github.com/kpwhri
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@1a5c6892146ecf421ae3d8a19c90bd93c3a69519 -
Trigger Event:
push
-
Statement type: