polars io tools

Custom parsing extensions for lazy polars

Overview

polars-io-tools extends Polars lazy execution with custom I/O sources that push filters and column projections all the way down into the systems that hold your data — SQL databases, ClickHouse, Datadog, and Delta Lake — instead of loading everything and filtering in memory. It also adds lazy-friendly operations (joins, multi-source composition, time-series windows, caching, distributed execution) that keep predicate pushdown working where vanilla Polars would otherwise give up and materialize the whole frame.

Everything is exposed through the piot LazyFrame namespace and a handful of top-level scan_* / sink_* functions, so it composes naturally with the Polars API you already use.
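
To make the pushdown idea concrete, here is a minimal sketch that uses only Python's standard-library sqlite3 module and none of the polars-io-tools API. The events table and its columns are invented for illustration. It contrasts fetching everything and filtering in memory with sending the filter and the column selection to the source as part of the query.

```python
import sqlite3

# Hypothetical in-memory table standing in for an external store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, user_id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2024-01-01", 1, "a"),
        ("2024-01-01", 2, "b"),
        ("2024-01-02", 1, "c"),
        ("2024-01-03", 3, "d"),
    ],
)

# Without pushdown: fetch every row and every column, then filter in Python.
all_rows = conn.execute("SELECT * FROM events").fetchall()
in_memory = sorted(
    (day, user_id) for day, user_id, _payload in all_rows if day == "2024-01-01"
)

# With pushdown: the filter becomes a WHERE clause and the projection becomes
# the SELECT list, so the source returns only the rows and columns needed.
pushed_down = conn.execute(
    "SELECT day, user_id FROM events WHERE day = ? ORDER BY user_id",
    ("2024-01-01",),
).fetchall()

print(len(all_rows), "rows fetched without pushdown")
print(len(pushed_down), "rows fetched with pushdown")
```

Both approaches produce the same answer. The difference is how much data leaves the source, which is what matters for large or remote tables.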

Who is this for

Reach for polars-io-tools when you want Polars' lazy API over data that lives in an external store, and you care about not fetching rows or columns you will immediately throw away. It is most valuable for large, partitioned, or remote datasets where a filter on a date or key column should translate into a smaller query against the source. If your data already fits comfortably in memory or lives in local Parquet/CSV, plain Polars is the simpler choice.

Installation

pip install polars-io-tools

polars-io-tools requires Python 3.11 or newer. See the Installation guide for conda and source builds.

Quickstart

Importing the package registers the piot namespace on every Polars LazyFrame:

import polars as pl
import polars_io_tools  # registers the .piot namespace

left = pl.LazyFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
right = pl.LazyFrame({"x": [-1, -2, 3], "z": [7, 8, 9]})

# An inner join where the keys present on the left are pushed down as a
# filter on the right frame *before* the join runs.
result = left.piot.filtered_join(right, on="x").collect()
print(result)
# shape: (1, 3)
# ┌─────┬─────┬─────┐
# │ x   ┆ y   ┆ z   │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ 3   ┆ 6   ┆ 9   │
# └─────┴─────┴─────┘

For a guided walkthrough, start with the Getting Started tutorial.

What's included

  • Lazy I/O sources with predicate & projection pushdown: scan_db (any ODBC database), scan_clickhouse, scan_datadog, scan_delta, and from_narwhals. Filters on the resulting LazyFrame are translated into the source's own query language (SQL WHERE, Datadog time ranges, Delta partition pruning) so only the matching rows and columns are fetched.
  • Lazy writers: sink_delta and sink_clickhouse write a LazyFrame directly to Delta Lake or ClickHouse, including streaming/chunked writes and transparent handling of types the target store cannot represent natively.
  • Pushdown-preserving query building: filtered_join, filtered_join_asof, join_between, multi_source, concat_named, and ts_with_columns express joins, multi-source composition, and rolling/lookback time-series logic without blocking the filter pushdown that those operations normally defeat.
  • Caching: cache keeps an in-memory, column- and partition-level cache for iterative work; cache_parquet materializes date-partitioned Parquet on local disk or S3, fetching only the partitions a query needs.
  • Distributed execution: execute_on_ray splits a LazyFrame by calendar period and runs the partitions across an existing Ray cluster.
  • Ergonomics: iter_rows for memory-efficient row iteration, debug to inspect what Polars pushes into a source, and disable_optimizations to compare against plain Polars.
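
As an illustration of what splitting work by calendar period can look like, here is a standard-library sketch that cuts a date range into per-month chunks. The month_periods helper is hypothetical and is not part of polars-io-tools. How execute_on_ray actually chooses and represents its periods is not specified here.

```python
from datetime import date


def month_periods(start: date, end: date) -> list[tuple[date, date]]:
    """Split the half-open range [start, end) into per-calendar-month chunks."""
    periods = []
    cursor = start
    while cursor < end:
        # First day of the month following `cursor`.
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        # Clip the final chunk so it never runs past `end`.
        chunk_end = min(next_month, end)
        periods.append((cursor, chunk_end))
        cursor = chunk_end
    return periods


periods = month_periods(date(2024, 11, 15), date(2025, 2, 10))
for lo, hi in periods:
    print(lo, "->", hi)
```

Each chunk can then be applied as a date filter on the same lazy query. Because the chunks are contiguous and do not overlap, concatenating the per-chunk results covers the original range exactly once.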

Documentation

Full documentation lives in the project wiki.

Contributing

Contributions are welcome. See the Contributing guide and Local Development Setup to get started.

License

polars-io-tools is licensed under the Apache 2.0 license.
