Facade to collect rows one-by-one into a Polars DataFrame (in the least-bad way)

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

DeflateAwning

These details have not been verified by PyPI

Development Status
- 4 - Beta
Intended Audience
- Developers
Operating System
- OS Independent
Programming Language
Typing
- Typed

Project description

polars-row-collector

Facade to collect rows one-by-one into a Polars DataFrame (in the least-bad way)

Docs: https://DeflateAwning.github.io/polars-row-collector
GitHub: https://github.com/DeflateAwning/polars-row-collector
PyPI: https://pypi.org/project/polars-row-collector/

Getting Started Example

Add the library to your dependencies: uv add polars_row_collector

import polars as pl
from polars_row_collector import PolarsRowCollector

collector = PolarsRowCollector(
    # Note: Schema is optional, but recommended.
    schema={"col1": pl.Int64, "col2": pl.Float64}
)

# `items` stands in for any iterable of your own objects.
for item in items:
    row = {
        "col1": item.value1,
        "col2": item.value2,
    }
    collector.add_row(row)

df = collector.to_df()

You can think of collector as filling the same niche as the following alternatives:
- list_of_dfs: list[pl.DataFrame]
- list_of_dicts: list[dict[str, Any]], then pl.from_dicts(list_of_dicts)

Features

- Highly performant and memory-optimized.
  - 93% lower memory usage compared to a list-of-dicts approach.
- Optionally supply a schema for the incoming rows.
- Thread-safe (when the GIL is enabled, which is the default in Python <= 3.15).
- Configuration arguments for safety vs. performance tradeoffs:
  - Behaviour if there are missing columns: Enforce all columns present or allow missing columns.
  - Behaviour if there are extra columns: Drop silently or raise.
  - Maintain insertion order.
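To make the missing-column and extra-column options concrete, here is a minimal pure-Python sketch of what such checks could look like for a single row. The function `normalize_row` and its parameters `allow_missing` and `on_extra` are hypothetical names chosen for this illustration; the actual PolarsRowCollector argument names are not listed here and may differ, so check the docs before relying on any of them.

```python
from typing import Any


def normalize_row(
    row: dict[str, Any],
    columns: list[str],
    *,
    allow_missing: bool = False,  # hypothetical option name
    on_extra: str = "raise",      # hypothetical option: "raise" or "drop"
) -> dict[str, Any]:
    """Validate one row against the expected columns (illustration only)."""
    missing = [c for c in columns if c not in row]
    if missing and not allow_missing:
        raise KeyError(f"missing columns: {missing}")

    extra = [k for k in row if k not in columns]
    if extra and on_extra == "raise":
        raise KeyError(f"unexpected columns: {extra}")

    # Missing columns are filled with None; extra columns are dropped.
    return {c: row.get(c) for c in columns}


strict_ok = normalize_row({"col1": 1, "col2": 2.0}, ["col1", "col2"])
lenient = normalize_row(
    {"col1": 1, "debug": "x"},
    ["col1", "col2"],
    allow_missing=True,
    on_extra="drop",
)
```

The stricter settings catch malformed rows early at the cost of an extra check per row, which is the safety vs. performance tradeoff the list above refers to.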

Example Applications

- Gathering data in a web scraping/parsing tool.
- Gathering/batching incoming log messages or event logs before writing in bulk to some destination.
- Gathering data in a markup/document parsing pipeline (e.g., XML with lots of conditionals).

Benchmarks

Benchmark: Collecting 50M rows. Each row has 3 columns.
- Average Speed: 0.42µs/row for both (consistent).
  - Conclusion: No additional elapsed runtime overhead.
- Peak memory usage: 93% decrease compared to a naive implementation.
  - Baseline (list-of-dicts): 26,011.93 MiB
  - PolarsRowCollector: 1,860.16 MiB
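The headline figures can be reproduced from the numbers reported above. The short calculation below uses only those reported values; the per-row byte counts are a rough derived estimate, since peak RSS also includes the interpreter and library baseline.

```python
MIB = 1024 * 1024
rows = 50_000_000

baseline_mib = 26_011.93   # list-of-dicts peak RSS
collector_mib = 1_860.16   # PolarsRowCollector peak RSS
seconds_per_row = 0.42e-6  # reported for both modes

reduction = 1 - collector_mib / baseline_mib
baseline_bytes_per_row = baseline_mib * MIB / rows
collector_bytes_per_row = collector_mib * MIB / rows
total_seconds = seconds_per_row * rows

print(f"Memory reduction: {reduction:.1%}")                          # 92.8%
print(f"Baseline:  {baseline_bytes_per_row:.1f} bytes/row of RSS")   # 545.5
print(f"Collector: {collector_bytes_per_row:.1f} bytes/row of RSS")  # 39.0
print(f"Total collection time: {total_seconds:.1f} s")               # 21.0
```

The 92.8% result rounds to the 93% quoted in the summary.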

Baseline (list-of-dicts)

> COLLECT_MODE=dicts uv run perf_scripts/perf_test_script.py

Collected DataFrame. Current RSS: 26,011.93 MiB | Peak RSS: 26,011.93 MiB
Final overall time per row: 0.42µs/row

`PolarsRowCollector`

> COLLECT_MODE=prc uv run perf_scripts/perf_test_script.py

Collected DataFrame. Current RSS: 1,860.16 MiB | Peak RSS: 1,860.16 MiB
Final overall time per row: 0.42µs/row

Future Features

- Intermediate to-disk storage in temporary parquet files, to support larger-than-memory collections.
- Further optimize appending many rows at once.
- Read the dataframe so far, in the middle of gathering rows.
- Documentation.

Disclaimer

As the project's description says, this is the "least-bad way" to accomplish this pattern.

If you can implement your code in such a way that you're not collecting individual rows of a dataframe, you are likely better off doing it that way (e.g., collecting a list[pl.DataFrame]).

However, there are always exceptions to the best practices. In those cases, this library is an ideal choice, and is significantly more memory-efficient than collecting into a list[dict[str, Any]] then converting to a DataFrame later.
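To see why a list[dict[str, Any]] is expensive in general, the sketch below compares the container overhead of per-row dicts with plain typed columns from the standard library. It is a CPython-specific illustration of row-wise vs. columnar storage and is not a description of how this library stores rows internally; the column names and row count are arbitrary.

```python
import sys
from array import array

n_rows = 10_000

# Row-wise: one dict object per row. Only the list and dict containers are
# counted here, so this undercounts the true cost (values are excluded).
rows = [{"col1": i, "col2": i * 0.5, "col3": i + 1} for i in range(n_rows)]
dict_bytes = sys.getsizeof(rows) + sum(sys.getsizeof(r) for r in rows)

# Columnar: one typed, contiguous buffer per column (8 bytes per value).
col1 = array("q", range(n_rows))
col2 = array("d", (i * 0.5 for i in range(n_rows)))
col3 = array("q", (i + 1 for i in range(n_rows)))
columnar_bytes = sum(sys.getsizeof(c) for c in (col1, col2, col3))

print(f"row-wise containers: {dict_bytes / n_rows:.0f} bytes/row")
print(f"columnar buffers:    {columnar_bytes / n_rows:.0f} bytes/row")
```

The exact numbers vary by Python version, but the per-row dict overhead alone is several times larger than the columnar data.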

Release history

- 0.3.0: Feb 3, 2026
- 0.2.3: Feb 3, 2026 (this version)
- 0.2.2: Feb 2, 2026
- 0.2.1: Jan 26, 2026
- 0.2.0: Jan 26, 2026
- 0.1.0: Jan 26, 2026

Download files

Source Distribution

polars_row_collector-0.2.3.tar.gz (23.9 kB)

Uploaded Feb 3, 2026 Source

Built Distribution

polars_row_collector-0.2.3-py3-none-any.whl (8.1 kB)

Uploaded Feb 3, 2026 Python 3

File details

Details for the file polars_row_collector-0.2.3.tar.gz.

File metadata

Download URL: polars_row_collector-0.2.3.tar.gz
Upload date: Feb 3, 2026
Size: 23.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.9.25

File hashes

Hashes for polars_row_collector-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`0c36e39b56d53dd3d4565867dff1ac8b135a303ca5496bdbfe898c9fb39ab79d`
MD5	`bd56c0326017927eb23027de6fd5a1c0`
BLAKE2b-256	`86bf6c2febc2d5fe22404f49b2493e11f87537fab58d4e116e28be15e2cff9ef`

File details

Details for the file polars_row_collector-0.2.3-py3-none-any.whl.

File metadata

Download URL: polars_row_collector-0.2.3-py3-none-any.whl
Upload date: Feb 3, 2026
Size: 8.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.9.25

File hashes

Hashes for polars_row_collector-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2e9806dbadfa2d809c5fe300ecbaabb58ac7feed28940203699b439e861f045d`
MD5	`1dd7dacc768b9ef9be35d67584f73724`
BLAKE2b-256	`757a524673f0c538b0ad18ad836f4e1f24990107d398aa189566250ec28b6b24`
