Polars-like lazy access to a database.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

These details have not been verified by PyPI

Project description

LazyBear

LazyBear is a polars-flavored lazy SQL query builder on top of sqlalchemy. It lets you compose familiar transformations like select, filter, with_columns, join, group_by().agg(...), and sort/order_by against a SQL database, and then materialize results to polars or arrow (or stream them in batches).

The purpose of this library is to provide lazy, polars-like access to a single sql database server, providing multi-site (and multi-server) stable code while pushing most memory-intensive operations to the remote server.

Familiar, chainable API similar to polars
Backed by sqlalchemy for broad database support
Zero data is loaded until you call collect()/to_arrow()/writers
Convenient I/O helpers for CSV and Parquet (using polars)

Full API documentation:

Installation

LazyBear targets Python 3.11+.
Required dependencies:
- sqlalchemy
- polars
Optional:
- pyarrow for to_arrow()

Install from PyPI:

pip install lazybear-polars

Install from source using pip:

# after git clone
pip install -e .
# straight from repo
pip install git+https://github.com/kpwhri/lazybear.git@master

Usage

Quickstart

Below is an end-to-end walkthrough using an in-memory sqlite database. The same api should work for other databases supported by sqlalchemy.

Create Playgruond

First, let's setup the backend play data:

import sqlalchemy as sa
import polars as pl

# create a sqlalchemy engine
eng = sa.create_engine('sqlite:///:memory:')

# prepare some tables for the demo
meta = sa.MetaData()
users = sa.Table(
    'users', meta,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('name', sa.String),
    sa.Column('age', sa.Integer),
)
orders = sa.Table(
    'orders', meta,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('user_id', sa.Integer),
    sa.Column('amount', sa.Float),
)
meta.create_all(eng)
with eng.begin() as conn:
    conn.execute(users.insert(), [
        {'id': 1, 'name': 'Ahti', 'age': 30},
        {'id': 2, 'name': 'Kalma', 'age': 28},
        {'id': 3, 'name': 'Tellervo', 'age': 41},
        {'id': 4, 'name': 'Ukko', 'age': 41},
    ])
    conn.execute(orders.insert(), [
        {'id': 10, 'user_id': 1, 'amount': 12.5},
        {'id': 11, 'user_id': 1, 'amount': 7.5},
        {'id': 12, 'user_id': 2, 'amount': 99.0},
    ])

Implemented SQL Operations

Now, let's see what we can do:

from lazybear import scan_table, scan_sql_query, col

# scan tables lazily
lf_users = scan_table('users', eng)  # columns exposed lowercase by default
lf_orders = scan_table('orders', eng)

# basic select / filter / collect → returns a polars DataFrame
out_df = (
    lf_users
    .filter(col('age') > 30)
    .select('id', 'name', 'age')
    .collect()
)
print(out_df)

# with_columns — add or replace columns
with_cols_df = (
    lf_users
    .with_columns(('age2', col('age') * 2), decade=col('age') / 10 * 10)
    .select('id', 'age2', 'decade')
    .collect()
)
print(with_cols_df)

# order and limit (keeps ordering stable even across subqueries)
ordered_df = lf_users.order_by('-age', 'name').limit(2).collect()
print(ordered_df)

# polars-style sort api
sorted_df = lf_users.sort(by='age', descending=True).collect()
print(sorted_df)

# joins
joined_df = (
    lf_users
    .join(lf_orders, left_on='id', right_on='user_id', how='left')
    .select('id', 'name', 'age', 'amount')
    .collect()
)
print(joined_df)

# rename all right-side columns with a prefix
prefixed_join_df = (
    lf_users
    .join(lf_orders, left_on='id', right_on='user_id', how='left', prefix='order_')
    .select('id', 'name', 'order_id', 'order_amount')
    .collect()
)
print(prefixed_join_df)

# joins (left columns keep names; right overlaps get suffixed with `_y` by default)
joined_df = (
    lf_users
    .join(lf_orders, on={'id': 'user_id'}, how='left')
    .select('id', 'name', 'age', ('amount_y', col('amount')))
    .collect()
)
print(joined_df)

# when chaining multiple joins that overlap on the same column names,
# provide a different suffixes=... value for later joins to avoid label collisions.
chained_join_df = (
    lf_users
    .join(lf_orders, on={'id': 'user_id'}, how='left')
    .join(lf_orders, on={'id': 'user_id'}, how='left', suffixes=('_x2', '_y2'))
    .select('id', 'name', 'amount', 'amount_y2')
    .order_by('id', 'amount', 'amount_y2')
    .collect()
)
print(chained_join_df)

# group_by + aggregations
agg_df = (
    lf_users
    .group_by('age')
    .agg(n=('id', 'count'), min_id=('id', 'min'))
    .collect()
)
print(agg_df)

# expressions: membership, null checks, and string helpers
expr_df = (
    lf_users
    .filter(
        (col('name').str.contains('k', literal=True)) |
        (col('name').startswith('A')) |
        (col('age').is_in([28, 41]))
    )
    .collect()
)
print(expr_df)

Uploading Temp Tables (Beta)

You can use a local polars DataFrame in lazy sql operations by creating a temporary table. This is useful for joining local data with database tables.

Be aware that the table will be inserted when collect is called. After the collection is complete, a best effort attempt to delete the temp table will be completed.

import polars as pl
from lazybear import scan_df, scan_table, col

# local polars dataframe
df_local = pl.DataFrame({
    'user_id': [1, 3],
    'status': ['active', 'inactive']
})

# create a temp table frame
lf_temp = scan_df(df_local, eng, table_name='user_status')

# join with a database table
lf_users = scan_table('users', eng)
result = (
    lf_users
    .join(lf_temp, on={'id': 'user_id'})
    .select('name', 'status')
    .collect()
)

Limitations & Behavior:

Temp tables are created and data is inserted only when collect() (or another execution method) is called.
Best-effort cleanup (DROP TABLE) is performed after the data is fetched.
Currently offers beta support for sqlite, PostgreSQL, SQL Server (MSSQL), Oracle, and Teradata. A warning is issued for other dialects.
For certain dialects, will attempt bulk insert to speed up processing

Exporting Data

Finally, we'll

# iterating rows
for row in lf_users.order_by('id').iter_rows():
    # row is a tuple by default
    print(row)
for row in lf_users.order_by('id').iter_rows(named=True):
    # named=True yields dicts
    print(row)

# streaming in batches
for batch_df in lf_users.order_by('id').collect_batches(chunk_size=2):
    print('Batch:', batch_df)

# scan a raw SQL query (must be SELECT)
q_df = (
    scan_sql_query('select name, age from users where age > 30', eng)
    .filter(col('age') < 60)
    .select('name')
    .collect()
)
print(q_df)

# materialize to arrow (requires optional pyarrow)
arrow_tbl = lf_users.to_arrow()

# explain shows the composed SQL (with literal binds where possible)
print(lf_users.filter(col('age') > 30).explain())

I/O helpers

# single file
lf_users.order_by('id').write_csv('users.csv', include_header=True)

# append in chunks to one file
lf_users.order_by('id').write_csv('users_chunked.csv', chunk_size=2, include_header=True)

Parquet

# single file
lf_users.collect().write_parquet('users.parquet')

# chunked parts with a common prefix like users-00000.parquet, users-00001.parquet, ...
lf_users.order_by('id').write_parquet('users.parquet', chunk_size=2)

Notes:

write_csv/write_parquet use polars under the hood. For chunked Parquet, files are created with a numeric suffix.
to_arrow() requires pyarrow to be installed.

API

Properties

columns: Returns a list of column names in the frame.
engine: Returns the SQLAlchemy Engine that the frame is bound to.

Joins

join keeps left-side columns unchanged and then appends columns from the right-side frame.

Option	Default	Behavior
`how`	`'inner'`	Join type: `'inner'`, `'left'`, `'right'`, or `'full'`.
`on`	`None`	Join key or keys. Use a string/list when key names match, or a mapping like `{'left_id': 'right_id'}` when key names differ.
`left_on`, `right_on`	`None`	Explicit left and right join keys. Use these instead of `on`.
`prefix`	`None`	Prefix for right-side renamed columns. Takes precedence over `suffix` and deprecated `suffixes`.
`suffix`	`None`	Suffix for right-side renamed columns. Used only when `prefix` is not supplied.
`apply_to_all`	`True`	When using `prefix` or `suffix`, apply it to every right-side column. Set to `False` to rename only right-side columns that overlap with left-side names.
`duplicate_columns`	`'rename'`	Use `'rename'` to rename overlapping right-side columns, or `'drop'` to omit them.
`suffixes`	`None`	Deprecated. Use `suffix` or `prefix` instead. If supplied, `suffixes[-1]` is used for right-side column renaming.

Right-side naming precedence is:

prefix
suffix
suffixes[-1]
generated suffix such as _right or _right2

prefix and suffix are not combined. If both are supplied, prefix wins.

Examples:

# Default: duplicate right-side names are renamed with a generated suffix.
joined = lf_users.join(lf_orders, on={'id': 'user_id'}, how='left')
# columns: id, name, age, id_right, user_id, amount

# Prefix every right-side column.
joined = lf_users.join(lf_orders, on={'id': 'user_id'}, prefix='order_')
# columns: id, name, age, order_id, order_user_id, order_amount

# Suffix every right-side column.
joined = lf_users.join(lf_orders, on={'id': 'user_id'}, suffix='_order')
# columns: id, name, age, id_order, user_id_order, amount_order

# Suffix only overlapping right-side columns.
joined = lf_users.join(
    lf_orders,
    on={'id': 'user_id'},
    suffix='_order',
    apply_to_all=False,
)
# columns: id, name, age, id_order, user_id, amount

# Drop overlapping right-side columns.
joined = lf_users.join(
    lf_orders,
    on={'id': 'user_id'},
    duplicate_columns='drop',
)
# columns: id, name, age, user_id, amount

Materialization & Execution

collect(limit=None, infer_schema_length=200): Materializes the lazy query into a polars DataFrame.
to_arrow(limit=None): Materializes the query into a pyarrow.Table.
collect_batches(chunk_size=10_000): Returns an iterator of polars DataFrame chunks.
iter_rows(named=False, chunk_size=10_000): Returns an iterator of row tuples (or dicts if named=True).
explain(): Returns the rendered SQL string for the query.

I/O Helpers

write_parquet(file, chunk_size=None, start_index=0, **kwargs): Writes the result to one or more Parquet files.
write_csv(file, chunk_size=None, **kwargs): Writes the result to a CSV file.

Advanced

immutability: every transform (select, filter, with_columns, join, sort, limit, group_by) returns a new LazyBearFrame.
to_select() returns the current SQLAlchemy Select if you need to interop with SQLAlchemy APIs directly.
join column naming: overlapping right-side columns are suffixed with _y by default; If you chain multiple joins that would reuse the same labels, pass custom suffixes on later joins to keep names unique.
case sensitivity: scan_table(..., lowercase=True) exposes columns as lowercase labels by default. Set lowercase=False to preserve database-reflected casing.
explain() returns the rendered SQL string; if supported, literal binds are inlined.

Database-specific result cleaning

lazybear applies a small database-specific cleaning step after SQL results are materialized into a polars DataFrame.

Currently, this is used for Teradata connections, including connections using the teradatasql driver. Teradata character datatypes may return values padded with trailing whitespace. When lazybear detects a Teradata connection, trailing whitespace is stripped from string columns in collected results.

This applies to result materialization methods such as:

collect()
collect_batches()
iter_rows(), through collect_batches()

For example, a Teradata value like:

python
"ABC   "

is returned as:

python
"ABC"

This cleaning is intentionally implemented through a dispatch layer so that future database-specific cleanup behavior can be added in one place. For example, other dialects could eventually normalize driver-specific string padding, timestamp quirks, binary values, or other result-formatting issues before returning a polars DataFrame.

Notes:

The Teradata cleanup only affects string columns.
It strips trailing whitespace, not leading whitespace.
Non-Teradata connections currently return results unchanged by this cleaning step.

Minimal example

import sqlalchemy as sa
from lazybear import scan_table, col

engine = sa.create_engine('sqlite:///:memory:')
# ... create a table `users` with columns id, name, age ...
lf = scan_table('users', engine)
df = lf.filter(col('age') >= 30).select('id', 'name').collect()
print(df)

License

This project is licensed under the MIT License. See LICENSE.txt or https://kpwhri.mit-license.org for the full text.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

dcronkite

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.3.1

May 26, 2026

0.3.0

May 7, 2026

0.2.1

Apr 8, 2026

0.2.0

Jan 27, 2026

0.1.1

Jan 14, 2026

0.1.0

Jan 14, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lazybear_polars-0.3.1.tar.gz (35.0 kB view details)

Uploaded May 26, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

lazybear_polars-0.3.1-py3-none-any.whl (29.8 kB view details)

Uploaded May 26, 2026 Python 3

File details

Details for the file lazybear_polars-0.3.1.tar.gz.

File metadata

Download URL: lazybear_polars-0.3.1.tar.gz
Upload date: May 26, 2026
Size: 35.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lazybear_polars-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`26e203e4eca1cc90b521f8d8961c74e9dcf32957b31ac63512831c99c95eb74d`
MD5	`2a3d11988235dbce4eab42d8d3f0b4d5`
BLAKE2b-256	`eac9e5166263b0132453a401695840e0e7e15d45593df2d48280e591c92c5e1d`

See more details on using hashes here.

Provenance

The following attestation bundles were made for lazybear_polars-0.3.1.tar.gz:

Publisher: release.yml on kpwhri/lazybear

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lazybear_polars-0.3.1.tar.gz
- Subject digest: 26e203e4eca1cc90b521f8d8961c74e9dcf32957b31ac63512831c99c95eb74d
- Sigstore transparency entry: 1635461400
- Sigstore integration time: May 26, 2026
Source repository:
- Permalink: kpwhri/lazybear@701dbfdcaa360e6437bc3b47c59b91eb8324c1f3
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/kpwhri
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@701dbfdcaa360e6437bc3b47c59b91eb8324c1f3
- Trigger Event: push

File details

Details for the file lazybear_polars-0.3.1-py3-none-any.whl.

File metadata

Download URL: lazybear_polars-0.3.1-py3-none-any.whl
Upload date: May 26, 2026
Size: 29.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lazybear_polars-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dfa690aa5a06b5944df31263d7ab0db47826f881f2ec6713ac43226e1f475629`
MD5	`f8c7c9f913f87e302be5506f8792ba39`
BLAKE2b-256	`7e27977ad9a11f318802b389631454facc0dd188e87eeea27f00d4fa49dd6580`

See more details on using hashes here.

Provenance

The following attestation bundles were made for lazybear_polars-0.3.1-py3-none-any.whl:

Publisher: release.yml on kpwhri/lazybear

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lazybear_polars-0.3.1-py3-none-any.whl
- Subject digest: dfa690aa5a06b5944df31263d7ab0db47826f881f2ec6713ac43226e1f475629
- Sigstore transparency entry: 1635461420
- Sigstore integration time: May 26, 2026
Source repository:
- Permalink: kpwhri/lazybear@701dbfdcaa360e6437bc3b47c59b91eb8324c1f3
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/kpwhri
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@701dbfdcaa360e6437bc3b47c59b91eb8324c1f3
- Trigger Event: push

lazybear-polars 0.3.1

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

LazyBear

Installation

Usage

Quickstart

Create Playgruond

Implemented SQL Operations

Uploading Temp Tables (Beta)

Exporting Data

I/O helpers

API

Properties

Joins

Materialization & Execution

I/O Helpers

Advanced

Database-specific result cleaning

Minimal example

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance