Facade to collect rows one-by-one into a Polars DataFrame (in the least-bad way)
polars-row-collector
Getting Started Example
import polars as pl
from polars_row_collector import PolarsRowCollector

collector = PolarsRowCollector(
    # Note: Schema is optional, but recommended.
    schema={"col1": pl.Int64, "col2": pl.Float64}
)

for item in items:
    row = {
        "col1": item.value1,
        "col2": item.value2,
    }
    collector.add_row(row)

df = collector.to_df()
You can think of collector as filling the same niche as a list_of_dfs: list[pl.DataFrame].
Features
- Highly performant and memory-optimized.
  - Much more so than collecting into a list[dict[str, Any]] or concatenating one-row dataframes.
- Optionally supply a schema for the incoming rows.
- Thread-safe (when the GIL is enabled, which is the default in Python <= 3.15).
- Configuration arguments for safety vs. performance tradeoffs:
  - Behaviour if there are missing columns: Enforce all columns present or allow missing columns.
  - Behaviour if there are extra columns: Drop silently or raise.
- Maintain insertion order.
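To make the missing/extra column behaviours concrete, here is a minimal pure-Python sketch of such a policy. The function `normalize_row` and its parameters `allow_missing` and `drop_extra` are hypothetical names chosen for illustration; they are not the library's actual configuration arguments.

```python
from typing import Any


def normalize_row(
    row: dict[str, Any],
    columns: list[str],
    *,
    allow_missing: bool = True,
    drop_extra: bool = True,
) -> dict[str, Any]:
    """Apply missing/extra column policies to one incoming row (hypothetical helper)."""
    missing = [name for name in columns if name not in row]
    if missing and not allow_missing:
        raise KeyError(f"missing columns: {missing}")

    extra = [name for name in row if name not in columns]
    if extra and not drop_extra:
        raise KeyError(f"unexpected columns: {extra}")

    # Keep schema order; absent columns become None (null in Polars).
    return {name: row.get(name) for name in columns}


print(normalize_row({"col2": 1.5, "other": "x"}, ["col1", "col2"]))
# prints {'col1': None, 'col2': 1.5}
```

The strict settings (`allow_missing=False`, `drop_extra=False`) cost extra checks per row but surface malformed rows early, which is the safety vs. performance tradeoff in question.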
Example Applications
- Gathering data in a web scraping/parsing tool.
- Gathering/batching incoming log messages or event logs before writing in bulk to some destination.
- Gathering data in a document parsing pipeline (e.g., XML with lots of conditionals).
Future Features
- Intermediate to-disk storage in temporary Parquet files, to support larger-than-memory collections.
- Further optimize appending many rows at once.
- Read the dataframe collected so far while still gathering rows.
- Documentation.
Disclaimer
As the project's description says, this is the "least-bad way" to accomplish this pattern.
If you can structure your code so that it does not collect individual rows of a dataframe, you are likely better off doing so (e.g., by collecting a list[pl.DataFrame]).
However, there are always exceptions to the best practices, and this library is significantly more efficient (performance and memory) than collecting into a list[dict[str, Any]].