Skip to main content

Polars-like lazy access to a database.

Project description

LazyBear

LazyBear is a polars-flavored lazy SQL query builder on top of sqlalchemy. It lets you compose familiar transformations like select, filter, with_columns, join, group_by().agg(...), and sort/order_by against a SQL database, and then materialize results to polars or arrow (or stream them in batches).

The purpose of this library is to provide lazy, polars-like access to a single sql database server, providing multi-site (and multi-server) stable code while pushing most memory-intensive operations to the remote server.

  • Familiar, chainable API similar to polars
  • Backed by sqlalchemy for broad database support
  • Zero data is loaded until you call collect()/to_arrow()/writers
  • Convenient I/O helpers for CSV and Parquet (using polars)

Installation

  • LazyBear targets Python 3.10+.
  • Required dependencies:
    • sqlalchemy
    • polars
  • Optional:
    • pyarrow for to_arrow()

Install from source using pip:

# after git clone
pip install -e .
# straight from repo
pip install git+https://github.com/kpwhri/lazybear.git@master

Usage

Quickstart

Below is an end-to-end walkthrough using an in-memory sqlite database. The same api should work for other databases supported by sqlalchemy.

Create Playgruond

First, let's setup the backend play data:

import sqlalchemy as sa
import polars as pl

# create a sqlalchemy engine
eng = sa.create_engine('sqlite:///:memory:')

# prepare some tables for the demo
meta = sa.MetaData()
users = sa.Table(
    'users', meta,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('name', sa.String),
    sa.Column('age', sa.Integer),
)
orders = sa.Table(
    'orders', meta,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('user_id', sa.Integer),
    sa.Column('amount', sa.Float),
)
meta.create_all(eng)
with eng.begin() as conn:
    conn.execute(users.insert(), [
        {'id': 1, 'name': 'Ahti', 'age': 30},
        {'id': 2, 'name': 'Kalma', 'age': 28},
        {'id': 3, 'name': 'Tellervo', 'age': 41},
        {'id': 4, 'name': 'Ukko', 'age': 41},
    ])
    conn.execute(orders.insert(), [
        {'id': 10, 'user_id': 1, 'amount': 12.5},
        {'id': 11, 'user_id': 1, 'amount': 7.5},
        {'id': 12, 'user_id': 2, 'amount': 99.0},
    ])

Implemented SQL Operations

Now, let's see what we can do:

# import lazybear
from lazybear import scan_table, scan_sql_query, col

# scan tables lazily
lf_users = scan_table('users', eng)  # columns exposed lowercase by default
lf_orders = scan_table('orders', eng)

# basic select / filter / collect → returns a polars DataFrame
out_df = (
    lf_users
    .filter(col('age') > 30)
    .select('id', 'name', 'age')
    .collect()
)
print(out_df)

# with_columns — add or replace columns
with_cols_df = (
    lf_users
    .with_columns(('age2', col('age') * 2), decade=col('age') / 10 * 10)
    .select('id', 'age2', 'decade')
    .collect()
)
print(with_cols_df)

# order and limit (keeps ordering stable even across subqueries)
ordered_df = lf_users.order_by('-age', 'name').limit(2).collect()
print(ordered_df)

# polars-style sort api
sorted_df = lf_users.sort(by='age', descending=True).collect()
print(sorted_df)

# joins (left columns keep names; right overlaps get suffixed with `_y` by default)
joined_df = (
    lf_users
    .join(lf_orders, on={'id': 'user_id'}, how='left')
    .select('id', 'name', 'age', ('amount_y', col('amount')))
    .collect()
)
print(joined_df)

# group_by + aggregations
agg_df = (
    lf_users
    .group_by('age')
    .agg(n=('id', 'count'), min_id=('id', 'min'))
    .collect()
)
print(agg_df)

# expressions: membership, null checks, and string helpers
expr_df = (
    lf_users
    .filter(
        (col('name').str.contains('k', literal=True)) |
        (col('name').startswith('A')) |
        (col('age').is_in([28, 41]))
    )
    .collect()
)
print(expr_df)

Exporting Data

Finally, we'll

# iterating rows
for row in lf_users.order_by('id').iter_rows():
    # row is a tuple by default
    print(row)
for row in lf_users.order_by('id').iter_rows(named=True):
    # named=True yields dicts
    print(row)

# streaming in batches
for batch_df in lf_users.order_by('id').collect_batches(chunk_size=2):
    print('Batch:', batch_df)

# scan a raw SQL query (must be SELECT)
q_df = (
    scan_sql_query('select name, age from users where age > 30', eng)
    .filter(col('age') < 60)
    .select('name')
    .collect()
)
print(q_df)

# materialize to arrow (requires optional pyarrow)
arrow_tbl = lf_users.to_arrow()

# explain shows the composed SQL (with literal binds where possible)
print(lf_users.filter(col('age') > 30).explain())

I/O helpers

  • CSV
# single file
lf_users.order_by('id').write_csv('users.csv', include_header=True)

# append in chunks to one file
lf_users.order_by('id').write_csv('users_chunked.csv', chunk_size=2, include_header=True)
  • Parquet
# single file
lf_users.collect().write_parquet('users.parquet')

# chunked parts with a common prefix like users-00000.parquet, users-00001.parquet, ...
lf_users.order_by('id').write_parquet('users.parquet', chunk_size=2)

Notes:

  • write_csv/write_parquet use polars under the hood. For chunked Parquet, files are created with a numeric suffix.
  • to_arrow() requires pyarrow to be installed.

Advanced

  • case sensitivity: scan_table(..., lowercase=True) exposes columns as lowercase labels by default. Set lowercase=False to preserve database-reflected casing.
  • to_select() returns the current SQLAlchemy Select if you need to interop with SQLAlchemy APIs directly.
  • explain() returns the rendered SQL string; if supported, literal binds are inlined.

Minimal example

import sqlalchemy as sa
from lazybear import scan_table, col

engine = sa.create_engine('sqlite:///:memory:')
# ... create a table `users` with columns id, name, age ...
lf = scan_table('users', engine)
df = lf.filter(col('age') >= 30).select('id', 'name').collect()
print(df)

License

This project is licensed under the MIT License. See LICENSE.txt or https://kpwhri.mit-license.org for the full text.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lazybear_polars-0.1.1.tar.gz (18.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

lazybear_polars-0.1.1-py3-none-any.whl (16.4 kB view details)

Uploaded Python 3

File details

Details for the file lazybear_polars-0.1.1.tar.gz.

File metadata

  • Download URL: lazybear_polars-0.1.1.tar.gz
  • Upload date:
  • Size: 18.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lazybear_polars-0.1.1.tar.gz
Algorithm Hash digest
SHA256 ca4aa285818225b7e83e9ab64d6d6fc5c05a7ce47df350ca6f96df6deb729106
MD5 bd604feb6976f20f836c0040734f7d51
BLAKE2b-256 5f5abb0b28c9a945e96a6852f83b864de60e6beae739b3a93e242a426abcaee1

See more details on using hashes here.

Provenance

The following attestation bundles were made for lazybear_polars-0.1.1.tar.gz:

Publisher: release.yml on kpwhri/lazybear

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lazybear_polars-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for lazybear_polars-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 93b4e8d7459f068aeeb892b908380b2dced13927f54a9fc3a45f9b37cd999ddf
MD5 13d4acb56ca3b8adad78a389f2690e6e
BLAKE2b-256 ded11022c2e8913a9fd02f292a79f795f635400eb75ddd87dd056197daba06e3

See more details on using hashes here.

Provenance

The following attestation bundles were made for lazybear_polars-0.1.1-py3-none-any.whl:

Publisher: release.yml on kpwhri/lazybear

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page