Facade to collect rows one-by-one into a Polars DataFrame (in the least-bad way)

polars-row-collector

Getting Started Example

import polars as pl
from polars_row_collector import PolarsRowCollector

collector = PolarsRowCollector(
    # Note: Schema is optional, but recommended.
    schema={"col1": pl.Int64, "col2": pl.Float64}
)

# `items` is a placeholder for any iterable of your own records.
for item in items:
    row = {
        "col1": item.value1,
        "col2": item.value2,
    }
    collector.add_row(row)

df = collector.to_df()

You can think of the collector as filling the same niche as a list_of_dfs: list[pl.DataFrame].

Features

  • Highly performant and memory-optimized.
    • Much more so than collecting into a list[dict[str, Any]] or concatenating one-row dataframes.
  • Optionally supply a schema for the incoming rows.
  • Thread-safe when the GIL is enabled (the default in Python <= 3.15).
  • Configuration arguments for safety vs. performance tradeoffs:
    • Behaviour if columns are missing: require all columns to be present, or allow missing columns.
    • Behaviour if there are extra columns: drop them silently, or raise.
    • Maintain insertion order.
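
To make the missing-column and extra-column tradeoffs above concrete, here is a minimal, dependency-free sketch of what such policies could look like. The function normalize_row and its parameters allow_missing and on_extra are hypothetical names invented for this illustration; they are not the library's actual configuration arguments, which are not listed here.

```python
from typing import Any


def normalize_row(
    row: dict[str, Any],
    columns: list[str],
    *,
    allow_missing: bool = False,  # hypothetical policy flag
    on_extra: str = "raise",      # hypothetical policy: "raise" or "drop"
) -> dict[str, Any]:
    """Return a row holding exactly `columns`, in schema order."""
    if on_extra not in ("raise", "drop"):
        raise ValueError(f"unknown on_extra policy: {on_extra!r}")

    missing = [name for name in columns if name not in row]
    if missing and not allow_missing:
        raise KeyError(f"missing columns: {missing}")

    extra = [name for name in row if name not in columns]
    if extra and on_extra == "raise":
        raise KeyError(f"unexpected columns: {extra}")

    # Missing values become None; extra keys are dropped silently.
    return {name: row.get(name) for name in columns}


columns = ["col1", "col2"]

# Strict defaults: every column must be present and nothing else.
strict = normalize_row({"col1": 1, "col2": 2.0}, columns)

# Lenient settings: fill gaps with None and ignore unknown keys.
lenient = normalize_row(
    {"col1": 1, "junk": "x"}, columns, allow_missing=True, on_extra="drop"
)
```

The strict settings surface data problems early at the cost of a check per row; the lenient settings accept messier input, which may suit scraping or parsing code where fields are often absent.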

Example Applications

  • Gathering data in a web scraping/parsing tool.
  • Gathering/batching incoming log messages or event logs before writing in bulk to some destination.
  • Gathering data in a document parsing pipeline (e.g., XML with lots of conditionals).
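
As a rough illustration of the log-batching use case, the sketch below gathers rows from several threads and hands them to a destination in bulk. LockedBatcher, its sink callback, and batch_size are hypothetical names made up for this example. The plain list buffer is only a dependency-free stand-in for whatever gathers the rows; in real use that role would be played by the collector. The explicit lock is an assumption about how one might stay safe regardless of GIL settings, not a description of the library's internals.

```python
import threading
from typing import Any, Callable

Row = dict[str, Any]


class LockedBatcher:
    """Hypothetical helper: buffers rows under a lock, flushes them in bulk."""

    def __init__(self, sink: Callable[[list[Row]], None], batch_size: int) -> None:
        self._sink = sink
        self._batch_size = batch_size
        self._lock = threading.Lock()
        self._rows: list[Row] = []

    def add_row(self, row: Row) -> None:
        with self._lock:
            self._rows.append(row)
            if len(self._rows) >= self._batch_size:
                self._flush_locked()

    def close(self) -> None:
        """Flush whatever is left once all producers are done."""
        with self._lock:
            if self._rows:
                self._flush_locked()

    def _flush_locked(self) -> None:
        # Called with the lock held, so writes to the sink are serialized.
        batch, self._rows = self._rows, []
        self._sink(batch)


# Stand-in destination: each bulk write is recorded as one list of rows.
written: list[list[Row]] = []
batcher = LockedBatcher(sink=written.append, batch_size=100)


def worker(worker_id: int) -> None:
    for n in range(250):
        batcher.add_row({"worker": worker_id, "message": f"event {n}"})


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
batcher.close()
```

Holding the lock during the flush keeps the sketch simple; a destination with slow writes might instead swap the buffer under the lock and write outside it.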

Future Features

  • Intermediate to-disk storage in temporary Parquet files, to support larger-than-memory collections.
  • Further optimize appending many rows at once.
  • Read the dataframe collected so far while rows are still being gathered.
  • Documentation.

Disclaimer

As the project's description says, this is the "least-bad way" to accomplish this pattern.

If you can implement your code in such a way that you're not collecting individual rows of a dataframe, you are likely better off doing it that way (e.g., collecting a list[pl.DataFrame]).

However, there are always exceptions to the best practices, and this library is significantly more efficient (performance and memory) than collecting into a list[dict[str, Any]].
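
To give a feel for why a list[dict[str, Any]] is memory-hungry, the rough, standard-library-only sketch below compares it with typed column buffers. It is not a benchmark of this library and makes no claim about its internals; it only illustrates the general gap between per-row Python objects and columnar storage. The sizes come from sys.getsizeof, which is approximate and ignores object sharing.

```python
import sys
from array import array

N = 10_000

# Row-oriented: one dict per row, one Python object per value.
rows_as_dicts = [{"col1": i, "col2": i * 0.5} for i in range(N)]

# Column-oriented: one packed buffer per column.
col1 = array("q", range(N))                      # 64-bit signed integers
col2 = array("d", (i * 0.5 for i in range(N)))   # 64-bit floats


def dict_rows_size(rows: list[dict]) -> int:
    """Approximate bytes held by the list, its dicts, and their values."""
    total = sys.getsizeof(rows)
    for row in rows:
        total += sys.getsizeof(row)
        total += sum(sys.getsizeof(value) for value in row.values())
    return total


dict_bytes = dict_rows_size(rows_as_dicts)
column_bytes = sys.getsizeof(col1) + sys.getsizeof(col2)

print(f"list of dicts:  ~{dict_bytes:,} bytes")
print(f"column buffers: ~{column_bytes:,} bytes")
```

The exact numbers vary by Python version and platform, but the column buffers should come out well below the list of dicts.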
