Zero-allocation lazy query pipelines for Python, powered by Rust
Project description
ZPyFlow
Alpha - v0.1.x is an early release for testing and feedback.
Lazy query pipelines for Python, powered by Rust.
ZPyFlow is for Python data that is already in memory: lists, NumPy arrays, generators, dict records, dataclasses, and similar objects. It runs lazy pipelines without building intermediate Python lists, and numeric DSL paths execute in a SIMD-accelerated Rust kernel without constructing Python objects per element.
- Lazy and fused: chained operations run in one pass.
- Numeric fast paths: float/int/bool arrays execute in Rust.
- Expression DSL:
col > 5andcol * 2avoid Python callbacks. - Object-friendly: lambdas still work for dicts, dataclasses, and models.
- Parallel option:
.parallel()uses Rayon for large numeric workloads.
ZPyFlow is not trying to replace Polars or pandas. Use those when your data is already table-shaped or you need joins, windows, or multi-column analytics. Use ZPyFlow when you just want a fast pipeline inside ordinary Python code.
Installation
pip install zpyflow
For source builds and contributor setup, see docs/contributing.md.
Quick Start
from zpyflow import Query, col
data = [1.5, -2.3, 0.7, 4.1, -0.5, 3.8, -1.1, 2.2]
result = (
Query(data)
.filter(col > 0)
.map(col * 2.0)
.take(4)
.to_list()
)
assert result == [3.0, 1.4, 8.2, 7.6]
Aggregations stay inside the Rust numeric path:
positive = Query(data).filter(col > 0)
assert positive.count() == 5
assert positive.sum() == 12.3
assert Query(data).max() == 4.1
Expression DSL vs Lambdas
Use the expression DSL for numeric hot paths:
from zpyflow import Query, col
scores = [0.2, 0.9, 0.4, 0.95, 0.7]
top = (
Query(scores)
.filter(col >= 0.7)
.map(col * 100)
.to_list()
)
assert top == [90.0, 95.0, 70.0]
Use lambdas when you need arbitrary Python logic:
records = [
{"name": "alice", "score": 91},
{"name": "bob", "score": 64},
{"name": "carol", "score": 88},
]
names = (
Query(records)
.filter(lambda r: r["score"] >= 80)
.map(lambda r: r["name"])
.to_list()
)
assert names == ["alice", "carol"]
Lambdas are flexible, but they run as Python callbacks. For speed, prefer col
or field() expressions where they fit.
NumPy
import numpy as np
from zpyflow import from_numpy, col
arr = np.random.default_rng(42).standard_normal(1_000_000)
count = (
from_numpy(arr)
.filter(col > 0)
.take(10_000)
.count()
)
from_numpy() supports common 1-D numeric dtypes, including float64,
float32, int64, bool, and uint8. See the full
NumPy integration guide.
Dict Records and Field DSL
For dict-like records, field() covers common filters and aggregations without
writing a Python callback for every element:
from zpyflow import Query, field
logs = [
{"path": "/api", "status": 200, "latency_ms": 42.0},
{"path": "/api", "status": 500, "latency_ms": 310.0},
{"path": "/health", "status": 200, "latency_ms": 8.0},
]
slow = (
Query(logs)
.filter(field("latency_ms") > 100)
.to_list()
)
assert slow == [{"path": "/api", "status": 500, "latency_ms": 310.0}]
See Object Field DSL and Adapters for JSON Lines, CSV, and Arrow input.
Grouping
from zpyflow import Query, agg_count, agg_sum, field
orders = [
{"user": "alice", "amount": 120.0},
{"user": "bob", "amount": 45.0},
{"user": "alice", "amount": 80.0},
]
summary = Query(orders).group_agg(
field("user"),
count=agg_count(),
total=agg_sum(field("amount")),
)
assert summary == [
{"_key": "alice", "count": 2, "total": 200.0},
{"_key": "bob", "count": 1, "total": 45.0},
]
See GroupBy & group_agg for more grouping examples.
When To Use It
Good fits:
- Data is already in Python sequences or NumPy arrays.
- Early stopping matters, for example
filter(col > threshold).take(k). - You want a lazy pipeline API around ordinary Python data.
- You can use
colorfield()instead of a Python callback. - You want Arrow or NumPy inputs to stay on a typed path.
- You have a sync web endpoint (FastAPI, Flask) filtering or aggregating a large numeric sequence — ZPyFlow's lower per-request CPU time translates directly to higher RPS under concurrent load.
Use another tool when:
- You need joins, window functions, or SQL-style analytics.
- The work spans many columns in a table.
- The main operation is dense vectorized math that NumPy already expresses well.
- You need full-array
filter,map,sum, orcountand NumPy already has the data. - Your data is small enough that readability matters more than execution path.
More detail: Performance Guide, Benchmark Results, and When Is It Actually Fast?.
API Overview
Common imports:
from zpyflow import (
Query, col, field,
from_numpy, from_arrow, from_csv, from_json_lines, from_generator,
agg_count, agg_sum, agg_mean, agg_max, agg_min,
)
Core pipeline methods:
q.filter(pred).map(func).skip(n).take(n).parallel()
q.to_list()
q.count()
q.sum()
q.min()
q.max()
q.first()
q.last()
q.stats()
See site/docs/api.md for the complete API reference.
Documentation
- API Reference
- Examples
- Performance Guide
- Benchmark Results
- When Is It Actually Fast?
- Design Notes
- Architecture
- Contributing
Examples
Runnable examples are in examples/:
01_basic_numeric.py02_numpy_integration.py03_pandas_integration.py04_dataclasses.py05_log_processing.py06_etl_pipeline.py07_ai_embeddings.py08_parallel_and_performance.py
See examples/README.md for the full list.
License
MIT. See LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file zpyflow-0.1.1.tar.gz.
File metadata
- Download URL: zpyflow-0.1.1.tar.gz
- Upload date:
- Size: 293.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.13.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
779f47115c4500cad039b5f50baf3f996023aed3b7d4c857a41a9d8ad264555f
|
|
| MD5 |
602a1c9765da3d05216a2545296eefde
|
|
| BLAKE2b-256 |
d953c20fb0367ea3d46e4692f8a5bde410899be41aebd0a35b50aabdd7e831e2
|
File details
Details for the file zpyflow-0.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: zpyflow-0.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 756.2 kB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.13.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
62068c95334e527afccfa7805f35d0793d62a46ff223599e420a9fa2cbd9dc07
|
|
| MD5 |
9754cfb940376fe649ffa3be7ecf2cfd
|
|
| BLAKE2b-256 |
7adaebe2b389fdeacbbd1cce7a1adc1e86a634423ca290c3b0843d3c1c8649cf
|
File details
Details for the file zpyflow-0.1.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: zpyflow-0.1.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 710.2 kB
- Tags: CPython 3.10+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.13.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fcb0809c264de31b157fa08a3bdbdc74e1281495c701f1cbacb1edd5cee8d2e5
|
|
| MD5 |
2f0fd3c75a844ce30adb71cd186f1212
|
|
| BLAKE2b-256 |
1b57e393c08ae936099c1b2cc9efdcf58ce40015c962d08c877acfdf9ff6534f
|
File details
Details for the file zpyflow-0.1.1-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.
File metadata
- Download URL: zpyflow-0.1.1-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
- Upload date:
- Size: 1.3 MB
- Tags: CPython 3.10+, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.13.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
46a9864e2a5919c86046b52edd84e3f5f57d2357ad3357f2f7e08d04bcffa64a
|
|
| MD5 |
c40f981c19f0a8fa4ac2f86ba0381ce1
|
|
| BLAKE2b-256 |
d4a4f4e95b4f28797ca33e0ce9ec9cb90508e499bfa5a3f7cb91d2cf74e81df1
|