Skip to main content

Zero-allocation lazy query pipelines for Python, powered by Rust

Project description

ZPyFlow

Alpha - v0.1.x is an early release for testing and feedback.

Lazy query pipelines for Python, powered by Rust.

ZPyFlow is for Python data that is already in memory: lists, NumPy arrays, generators, dict records, dataclasses, and similar objects. It runs lazy pipelines without building intermediate Python lists, and numeric DSL paths execute in a SIMD-accelerated Rust kernel without constructing Python objects per element.

  • Lazy and fused: chained operations run in one pass.
  • Numeric fast paths: float/int/bool arrays execute in Rust.
  • Expression DSL: col > 5 and col * 2 avoid Python callbacks.
  • Object-friendly: lambdas still work for dicts, dataclasses, and models.
  • Parallel option: .parallel() uses Rayon for large numeric workloads.

ZPyFlow is not trying to replace Polars or pandas. Use those when your data is already table-shaped or you need joins, windows, or multi-column analytics. Use ZPyFlow when you just want a fast pipeline inside ordinary Python code.

Installation

pip install zpyflow

For source builds and contributor setup, see docs/contributing.md.

Quick Start

from zpyflow import Query, col

data = [1.5, -2.3, 0.7, 4.1, -0.5, 3.8, -1.1, 2.2]

result = (
    Query(data)
        .filter(col > 0)
        .map(col * 2.0)
        .take(4)
        .to_list()
)

assert result == [3.0, 1.4, 8.2, 7.6]

Aggregations stay inside the Rust numeric path:

positive = Query(data).filter(col > 0)

assert positive.count() == 5
assert positive.sum() == 12.3
assert Query(data).max() == 4.1

Expression DSL vs Lambdas

Use the expression DSL for numeric hot paths:

from zpyflow import Query, col

scores = [0.2, 0.9, 0.4, 0.95, 0.7]

top = (
    Query(scores)
        .filter(col >= 0.7)
        .map(col * 100)
        .to_list()
)

assert top == [90.0, 95.0, 70.0]

Use lambdas when you need arbitrary Python logic:

records = [
    {"name": "alice", "score": 91},
    {"name": "bob", "score": 64},
    {"name": "carol", "score": 88},
]

names = (
    Query(records)
        .filter(lambda r: r["score"] >= 80)
        .map(lambda r: r["name"])
        .to_list()
)

assert names == ["alice", "carol"]

Lambdas are flexible, but they run as Python callbacks. For speed, prefer col or field() expressions where they fit.

NumPy

import numpy as np
from zpyflow import from_numpy, col

arr = np.random.default_rng(42).standard_normal(1_000_000)

count = (
    from_numpy(arr)
        .filter(col > 0)
        .take(10_000)
        .count()
)

from_numpy() supports common 1-D numeric dtypes, including float64, float32, int64, bool, and uint8. See the full NumPy integration guide.

Dict Records and Field DSL

For dict-like records, field() covers common filters and aggregations without writing a Python callback for every element:

from zpyflow import Query, field

logs = [
    {"path": "/api", "status": 200, "latency_ms": 42.0},
    {"path": "/api", "status": 500, "latency_ms": 310.0},
    {"path": "/health", "status": 200, "latency_ms": 8.0},
]

slow = (
    Query(logs)
        .filter(field("latency_ms") > 100)
        .to_list()
)

assert slow == [{"path": "/api", "status": 500, "latency_ms": 310.0}]

See Object Field DSL and Adapters for JSON Lines, CSV, and Arrow input.

Grouping

from zpyflow import Query, agg_count, agg_sum, field

orders = [
    {"user": "alice", "amount": 120.0},
    {"user": "bob", "amount": 45.0},
    {"user": "alice", "amount": 80.0},
]

summary = Query(orders).group_agg(
    field("user"),
    count=agg_count(),
    total=agg_sum(field("amount")),
)

assert summary == [
    {"_key": "alice", "count": 2, "total": 200.0},
    {"_key": "bob", "count": 1, "total": 45.0},
]

See GroupBy & group_agg for more grouping examples.

When To Use It

Good fits:

  • Data is already in Python sequences or NumPy arrays.
  • Early stopping matters, for example filter(col > threshold).take(k).
  • You want a lazy pipeline API around ordinary Python data.
  • You can use col or field() instead of a Python callback.
  • You want Arrow or NumPy inputs to stay on a typed path.
  • You have a sync web endpoint (FastAPI, Flask) filtering or aggregating a large numeric sequence — ZPyFlow's lower per-request CPU time translates directly to higher RPS under concurrent load.

Use another tool when:

  • You need joins, window functions, or SQL-style analytics.
  • The work spans many columns in a table.
  • The main operation is dense vectorized math that NumPy already expresses well.
  • You need full-array filter, map, sum, or count and NumPy already has the data.
  • Your data is small enough that readability matters more than execution path.

More detail: Performance Guide, Benchmark Results, and When Is It Actually Fast?.

API Overview

Common imports:

from zpyflow import (
    Query, col, field,
    from_numpy, from_arrow, from_csv, from_json_lines, from_generator,
    agg_count, agg_sum, agg_mean, agg_max, agg_min,
)

Core pipeline methods:

q.filter(pred).map(func).skip(n).take(n).parallel()
q.to_list()
q.count()
q.sum()
q.min()
q.max()
q.first()
q.last()
q.stats()

See site/docs/api.md for the complete API reference.

Documentation

Examples

Runnable examples are in examples/:

  • 01_basic_numeric.py
  • 02_numpy_integration.py
  • 03_pandas_integration.py
  • 04_dataclasses.py
  • 05_log_processing.py
  • 06_etl_pipeline.py
  • 07_ai_embeddings.py
  • 08_parallel_and_performance.py

See examples/README.md for the full list.

License

MIT. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

zpyflow-0.1.1.tar.gz (293.8 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

zpyflow-0.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (756.2 kB view details)

Uploaded CPython 3.10+manylinux: glibc 2.17+ x86-64

zpyflow-0.1.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (710.2 kB view details)

Uploaded CPython 3.10+manylinux: glibc 2.17+ ARM64

zpyflow-0.1.1-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (1.3 MB view details)

Uploaded CPython 3.10+macOS 10.12+ universal2 (ARM64, x86-64)macOS 10.12+ x86-64macOS 11.0+ ARM64

File details

Details for the file zpyflow-0.1.1.tar.gz.

File metadata

  • Download URL: zpyflow-0.1.1.tar.gz
  • Upload date:
  • Size: 293.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.13.3

File hashes

Hashes for zpyflow-0.1.1.tar.gz
Algorithm Hash digest
SHA256 779f47115c4500cad039b5f50baf3f996023aed3b7d4c857a41a9d8ad264555f
MD5 602a1c9765da3d05216a2545296eefde
BLAKE2b-256 d953c20fb0367ea3d46e4692f8a5bde410899be41aebd0a35b50aabdd7e831e2

See more details on using hashes here.

File details

Details for the file zpyflow-0.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for zpyflow-0.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 62068c95334e527afccfa7805f35d0793d62a46ff223599e420a9fa2cbd9dc07
MD5 9754cfb940376fe649ffa3be7ecf2cfd
BLAKE2b-256 7adaebe2b389fdeacbbd1cce7a1adc1e86a634423ca290c3b0843d3c1c8649cf

See more details on using hashes here.

File details

Details for the file zpyflow-0.1.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for zpyflow-0.1.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 fcb0809c264de31b157fa08a3bdbdc74e1281495c701f1cbacb1edd5cee8d2e5
MD5 2f0fd3c75a844ce30adb71cd186f1212
BLAKE2b-256 1b57e393c08ae936099c1b2cc9efdcf58ce40015c962d08c877acfdf9ff6534f

See more details on using hashes here.

File details

Details for the file zpyflow-0.1.1-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File metadata

File hashes

Hashes for zpyflow-0.1.1-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 46a9864e2a5919c86046b52edd84e3f5f57d2357ad3357f2f7e08d04bcffa64a
MD5 c40f981c19f0a8fa4ac2f86ba0381ce1
BLAKE2b-256 d4a4f4e95b4f28797ca33e0ce9ec9cb90508e499bfa5a3f7cb91d2cf74e81df1

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page