Custom parsing extensions for lazy polars
polars io tools
Overview
polars-io-tools extends Polars lazy execution with custom I/O
sources that push filters and column projections all the way down into the systems
that hold your data — SQL databases, ClickHouse, Datadog, and Delta Lake — instead of
loading everything and filtering in memory. It also adds lazy-friendly operations
(joins, multi-source composition, time-series windows, caching, distributed execution)
that keep predicate pushdown working where vanilla Polars would otherwise give up and
materialize the whole frame.
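The library's own translation layer is not shown in this README. As a hedged illustration of the general idea only, here is a minimal stdlib sketch that assumes a filter has already been reduced to simple column comparisons joined by AND, and turns it plus a column list into one parameterized SQL query. `Comparison`, `to_sql`, the `trades` table, and its columns are all hypothetical names, not part of polars-io-tools.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical, simplified predicate model: one `column <op> value` comparison.
# Real Polars expressions are far richer; this only illustrates the idea.
SUPPORTED_OPS = {"=", "<", "<=", ">", ">="}


@dataclass(frozen=True)
class Comparison:
    column: str
    op: str
    value: Any


def to_sql(
    table: str, columns: list[str], predicates: list[Comparison]
) -> tuple[str, list[Any]]:
    """Build a parameterized SELECT that fetches only the needed columns and rows.

    Predicates are combined with AND. Table and column names are assumed to be
    trusted identifiers; only the comparison values become bind parameters.
    """
    # Projection pushdown: list the columns explicitly instead of SELECT *.
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params: list[Any] = []
    clauses: list[str] = []
    for pred in predicates:
        if pred.op not in SUPPORTED_OPS:
            # This sketch simply rejects what it cannot translate; a fuller
            # version might leave such filters to be applied in memory.
            raise ValueError(f"unsupported operator: {pred.op}")
        clauses.append(f"{pred.column} {pred.op} ?")
        params.append(pred.value)
    # Predicate pushdown: the filter runs inside the database, not in Python.
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = to_sql(
    "trades",
    ["symbol", "price"],
    [Comparison("date", ">=", "2024-01-01"), Comparison("symbol", "=", "AAPL")],
)
print(sql)     # SELECT symbol, price FROM trades WHERE date >= ? AND symbol = ?
print(params)  # ['2024-01-01', 'AAPL']
```

Keeping values as bind parameters rather than splicing them into the string is a deliberate choice in this sketch: it avoids quoting bugs and SQL injection while still letting the database do the filtering.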
Everything is exposed through the piot LazyFrame namespace and a handful of
top-level scan_* / sink_* functions, so it composes naturally with the Polars
API you already use.
Who is this for
Reach for polars-io-tools when you want Polars' lazy API over data that lives in an
external store, and you care about not fetching rows or columns you will immediately
throw away. It is most valuable for large, partitioned, or remote datasets where a
filter on a date or key column should translate into a smaller query against the
source. If your data already fits comfortably in memory or lives in local Parquet/CSV,
plain Polars is the simpler choice.
Installation
pip install polars-io-tools
polars-io-tools requires Python 3.11 or newer. See the Installation guide for conda and source builds.
Quickstart
Importing the package registers the piot namespace on every Polars LazyFrame:
import polars as pl
import polars_io_tools # registers the .piot namespace
left = pl.LazyFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
right = pl.LazyFrame({"x": [-1, -2, 3], "z": [7, 8, 9]})
# An inner join where the keys present on the left are pushed down as a
# filter on the right frame *before* the join runs.
result = left.piot.filtered_join(right, on="x").collect()
print(result)
# shape: (1, 3)
# ┌─────┬─────┬─────┐
# │ x   ┆ y   ┆ z   │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ 3   ┆ 6   ┆ 9   │
# └─────┴─────┴─────┘
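To make the semantics of filtered_join concrete, here is a plain-Python sketch on lists of dicts using the same data as the Quickstart. It is not the library's implementation: `filtered_inner_join` is a hypothetical helper, and in polars-io-tools the key filter would be pushed into the right-hand source rather than evaluated in Python. The `IN (...)` translation mentioned in the comments is likewise an assumption about what such a filter could look like against a SQL source.

```python
from typing import Any

Row = dict[str, Any]


def filtered_inner_join(left: list[Row], right: list[Row], on: str) -> list[Row]:
    # 1. Collect the distinct join keys that actually occur on the left side.
    left_keys = {row[on] for row in left}
    # 2. Push those keys down as a filter on the right side, so rows that can
    #    never match are dropped *before* the join. Against a database this
    #    step could become something like `WHERE x IN (...)` in the source query.
    right_filtered = [row for row in right if row[on] in left_keys]
    # 3. Ordinary hash join on the (now smaller) right side.
    index: dict[Any, list[Row]] = {}
    for row in right_filtered:
        index.setdefault(row[on], []).append(row)
    return [{**l_row, **r_row} for l_row in left for r_row in index.get(l_row[on], [])]


left = [{"x": 1, "y": 4}, {"x": 2, "y": 5}, {"x": 3, "y": 6}]
right = [{"x": -1, "z": 7}, {"x": -2, "z": 8}, {"x": 3, "z": 9}]
print(filtered_inner_join(left, right, on="x"))  # [{'x': 3, 'y': 6, 'z': 9}]
```

The result matches the inner join above; the benefit is that step 2 shrinks the right side before any join work happens, which matters most when the right side is a large remote table.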
For a guided walkthrough, start with the Getting Started tutorial.
What's included
- Lazy I/O sources with predicate & projection pushdown: scan_db (any ODBC database), scan_clickhouse, scan_datadog, scan_delta, and from_narwhals. Filters on the resulting LazyFrame are translated into the source's own query language (SQL WHERE, Datadog time ranges, Delta partition pruning) so only the matching rows and columns are fetched.
- Lazy writers: sink_delta and sink_clickhouse write a LazyFrame directly to Delta Lake or ClickHouse, including streaming/chunked writes and transparent handling of types the target store cannot represent natively.
- Pushdown-preserving query building: filtered_join, filtered_join_asof, join_between, multi_source, concat_named, and ts_with_columns express joins, multi-source composition, and rolling/lookback time-series logic without blocking the filter pushdown that those operations normally defeat.
- Caching: cache keeps an in-memory, column- and partition-level cache for iterative work; cache_parquet materializes date-partitioned Parquet on local disk or S3, fetching only the partitions a query needs.
- Distributed execution: execute_on_ray splits a LazyFrame by calendar period and runs the partitions across an existing Ray cluster.
- Ergonomics: iter_rows for memory-efficient row iteration, debug to inspect what Polars pushes into a source, and disable_optimizations to compare against plain Polars.
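execute_on_ray is described as splitting a LazyFrame by calendar period, but this README does not say how the periods are chosen. As a hedged illustration of that splitting step only, the following stdlib sketch cuts a half-open date range into calendar-month chunks; each chunk could then become its own filtered query (for example `date >= chunk_start AND date < chunk_end`) handed to a worker. Monthly granularity, the half-open convention, and the name `split_by_month` are assumptions made for this example.

```python
from datetime import date


def split_by_month(start: date, end: date) -> list[tuple[date, date]]:
    """Split the half-open range [start, end) into calendar-month chunks."""
    chunks: list[tuple[date, date]] = []
    current = start
    while current < end:
        # First day of the month after `current`, rolling over the year in December.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # The last chunk is clipped to the requested end date.
        chunk_end = min(next_month, end)
        chunks.append((current, chunk_end))
        current = chunk_end
    return chunks


for chunk_start, chunk_end in split_by_month(date(2024, 1, 15), date(2024, 3, 10)):
    print(chunk_start, chunk_end)
# 2024-01-15 2024-02-01
# 2024-02-01 2024-03-01
# 2024-03-01 2024-03-10
```

Half-open ranges make adjacent chunks share a boundary without overlapping, so no row is processed twice and no row falls between chunks.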
Documentation
Full documentation lives in the project wiki:
- Getting Started: a guided tutorial.
- Reading and Writing Data: recipes for each I/O source and sink.
- Query Optimization: joins, composition, time-series, and caching.
- Distributed Execution: running on Ray.
- API Reference: the public functions and the piot namespace.
- Concepts: how predicate pushdown into custom sources works.
Contributing
Contributions are welcome. See the Contributing guide and Local Development Setup to get started.
License
polars-io-tools is licensed under the Apache 2.0 license.
Download files
File details
Details for the file polars_io_tools-0.1.0.tar.gz.
File metadata
- Download URL: polars_io_tools-0.1.0.tar.gz
- Upload date:
- Size: 309.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d0970aac506f0e3d5a8ea23565d336fbaa6c4507caa83b8ef155dc9a293855a0 |
| MD5 | d538bccaf4dc22bae2c438fff89890a0 |
| BLAKE2b-256 | ca9fdc92d329da2ac33e4e474412f0e93d196c5451206facc7d4f6e1e23bcb9b |
File details
Details for the file polars_io_tools-0.1.0-cp311-abi3-win_amd64.whl.
File metadata
- Download URL: polars_io_tools-0.1.0-cp311-abi3-win_amd64.whl
- Upload date:
- Size: 439.6 kB
- Tags: CPython 3.11+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4509ce820dbd90e1f671a6188f4c34ce8b4d28fb3ac0bc37b7e8d281c18b9a89 |
| MD5 | d0c6085aa7836ef6eb254a8e312786b9 |
| BLAKE2b-256 | dc85cb7f3506a4f09353845e63e18193391f751377e332ff9d6f5744f07fd8bc |
File details
Details for the file polars_io_tools-0.1.0-cp311-abi3-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: polars_io_tools-0.1.0-cp311-abi3-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 557.4 kB
- Tags: CPython 3.11+, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ac2ebd0949321e0a990a7703afbfe6499b250cd958a787b77321c4b6a61be8e |
| MD5 | 18f8cf73a3acc120e57e66892b90f565 |
| BLAKE2b-256 | f1537d0e61f5820a57d0265fe8887b8b461c7265bb82316502697fa66d6c0af0 |
File details
Details for the file polars_io_tools-0.1.0-cp311-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: polars_io_tools-0.1.0-cp311-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 535.2 kB
- Tags: CPython 3.11+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c3e562e124594c6d6c50d9dfa9c481165dc1c711ae20c1b5bfbc73e5743c949b |
| MD5 | eefb784010cb598af099453005b293bb |
| BLAKE2b-256 | 969a8893cd2e6300f368c973b870c89d7f6094debec1689d4812dbb7c49532a3 |