ZooPipe is a data processing framework that allows you to process data in a declarative way.

ZooPipe

ZooPipe is a lean, ultra-high-performance data processing engine for Python. It leverages a 100% Rust core to handle I/O and orchestration, while keeping the flexibility of Python for schema validation (via Pydantic) and custom data enrichment (via Hooks).


✨ Key Features

  • 🚀 100% Native Rust Engine: The core execution loop, including CSV and JSON parsing/writing, is implemented in Rust for maximum throughput.
  • 🔍 Declarative Validation: Use Pydantic models to define and validate your data structures naturally.
  • 🪝 Python Hooks: Transform and enrich data at any stage using standard Python functions or classes.
  • 🚨 Automated Error Routing: Native support for routing failed records to a dedicated error output.
  • 📊 Multiple Format Support: Optimized readers/writers for CSV, JSONL, and SQL databases.
  • 🔧 Two-Tier Parallelism: Orchestrate across processes or clusters with Engines (Local, Ray), and scale throughput at the node level with Rust Executors.
  • ☁️ Cloud Native: Native S3 support and zero-config distributed execution on Ray clusters.

⚡ Performance & Benchmarks

Why ZooPipe? Because vectorization isn't always the answer.

Tools like Pandas and Polars are incredible for analytical workloads (groupby, sum, joins) where operations can be vectorized in C/Rust. However, real-world data engineering often involves "chaotic ETL": messy custom rules, per-row API calls, hashing, conditional cleanup, and complex normalization that inevitably fall back to Python loops.

In these "Heavy ETL" scenarios, ZooPipe outperforms Vectorized DataFrames by 3x-8x.
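To make "chaotic ETL" concrete, here is a minimal sketch of the kind of per-row logic described above: normalization, a conditional filter, hashing, and a small enrichment step. The field names (`user_id`, `email`) and the rules are hypothetical and only illustrate why such work tends to run row by row instead of as a vectorized expression.

```python
import hashlib
from typing import Optional


def clean_row(row: dict) -> Optional[dict]:
    """Return a cleaned copy of `row`, or None if the row should be dropped."""
    # Normalization: trim whitespace and lowercase the address.
    email = row.get("email", "").strip().lower()
    # Filtering: a conditional rule that discards malformed rows.
    if "@" not in email:
        return None
    return {
        "user_id": row["user_id"].strip(),
        "email": email,
        # Hashing: one SHA-256 digest per row.
        "email_sha256": hashlib.sha256(email.encode("utf-8")).hexdigest(),
        # Enrichment: a field derived from the cleaned value.
        "domain": email.split("@", 1)[1],
    }


rows = [
    {"user_id": " 1 ", "email": "  Ada@Example.COM "},
    {"user_id": "2", "email": "not-an-email"},
]
cleaned = [out for row in rows if (out := clean_row(row)) is not None]
print(cleaned)
```

Each step depends on the result of the previous one for the same row, which is the pattern that pushes DataFrame libraries into Python UDFs.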

Benchmark: Heavy ETL (15M+ Rows, 10GB CSV)

Scenario: SHA256 Hashing, Normalization, Filtering, Enrichment per row.

System: MacBook Pro M1, 2020 (8 GB RAM).

| Tool | Time (s) | Speed (rows/s) | Peak RAM (MB) |
| --- | --- | --- | --- |
| ZooPipe (4 workers) | ~45 | ~356k | ~85 |
| ZooPipe (1 worker)* | ~89 | ~180k | ~34 |
| Pure Python | ~145 | ~110k | ~25 |
| Pydantic | ~180 | ~89k | ~31 |
| Polars | ~370 | ~43k | ~2500 |
| Pandas | ~1830 | ~9k | ~3400 |

*The ZooPipe (1 worker) run used a lighter workload (timestamp-only validation) and serves as a baseline for raw throughput.

Key Takeaway: ZooPipe's "Python-First Architecture" with parallel streaming (PipeManager) avoids the serialization overhead that cripples Polars/Pandas when using Python UDFs (map_elements/apply), and uses about 97% less RAM (~85 MB versus ~2,500 MB for Polars and ~3,400 MB for Pandas in the table above).
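The headline ratios can be re-derived from the benchmark table. The short script below is only a worked calculation over the approximate figures quoted above; it does not rerun the benchmark. The 1-worker row is left out because it ran a different workload.

```python
# Approximate figures copied from the benchmark table above.
time_s = {
    "ZooPipe (4 workers)": 45,
    "Pure Python": 145,
    "Pydantic": 180,
    "Polars": 370,
    "Pandas": 1830,
}
peak_ram_mb = {"ZooPipe (4 workers)": 85, "Polars": 2500, "Pandas": 3400}
baseline = "ZooPipe (4 workers)"

# How many times longer each tool took than the 4-worker ZooPipe run.
speedup = {
    tool: round(seconds / time_s[baseline], 1)
    for tool, seconds in time_s.items()
    if tool != baseline
}

# Percentage of peak RAM saved relative to each DataFrame library.
ram_saving_pct = {
    tool: round(100 * (1 - peak_ram_mb[baseline] / mb), 1)
    for tool, mb in peak_ram_mb.items()
    if tool != baseline
}

for tool, factor in speedup.items():
    print(f"{tool}: {factor}x the runtime of {baseline}")
for tool, pct in ram_saving_pct.items():
    print(f"{baseline} uses {pct}% less peak RAM than {tool}")
```

With these inputs the runtime ratios come out at roughly 3.2x (pure Python), 4.0x (Pydantic), 8.2x (Polars), and 40.7x (Pandas), and the peak-RAM reduction at about 96.6% versus Polars and 97.5% versus Pandas.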


🚀 Quick Start

Installation

pip install zoopipe

Or using uv:

uv add zoopipe

Or from source (uv recommended):

uv build
uv run maturin develop --release

Simple Example

from pydantic import BaseModel, ConfigDict
from zoopipe import CSVInputAdapter, CSVOutputAdapter, Pipe


class UserSchema(BaseModel):
    model_config = ConfigDict(extra="ignore")
    user_id: str
    username: str
    email: str


pipe = Pipe(
    input_adapter=CSVInputAdapter("users.csv"),
    output_adapter=CSVOutputAdapter("processed_users.csv"),
    error_output_adapter=CSVOutputAdapter("errors.csv"),
    schema_model=UserSchema,
)

pipe.start()
pipe.wait()


print(f"Finished! Processed {pipe.report.total_processed} items.")
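To try the example above you need a `users.csv` whose columns match `UserSchema`. The helper below is a convenience sketch using only the standard library; the sample values and the extra `signup_source` column are made up. The extra column is there to exercise `extra="ignore"`, which tells Pydantic to ignore fields that are not declared on the model.

```python
import csv
import tempfile
from pathlib import Path


def write_sample_users(path) -> Path:
    """Write a tiny CSV with the UserSchema columns plus one undeclared column."""
    rows = [
        {"user_id": "1", "username": "ada", "email": "ada@example.com", "signup_source": "web"},
        {"user_id": "2", "username": "linus", "email": "linus@example.com", "signup_source": "api"},
    ]
    target = Path(path)
    with target.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return target


# Demo: write into a temporary directory and read the header back.
with tempfile.TemporaryDirectory() as tmp:
    sample = write_sample_users(Path(tmp) / "users.csv")
    with sample.open(newline="", encoding="utf-8") as fh:
        header = next(csv.reader(fh))
print(header)
```

Calling `write_sample_users("users.csv")` in your working directory before running the pipeline gives the Simple Example something to read.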

Automatically split large files or manage multiple independent workflows:

from zoopipe import MultiProcessEngine, Pipe, PipeManager

# Create your pipe as usual (Pipe is purely declarative);
# fill in the adapters and schema as in the Simple Example
pipe = Pipe(...)

# Automatically parallelize across 4 workers
# MultiProcessEngine() for local, RayEngine() for clusters
manager = PipeManager.parallelize_pipe(
    pipe,
    workers=4,
    engine=MultiProcessEngine(),
)
manager.start()
manager.wait()
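This README does not describe how `parallelize_pipe` shards its input, so the following is not ZooPipe's implementation. It is a generic sketch of one common way to split a line-oriented file between workers: cut the file into byte ranges and move each cut forward to the next line break so that no row is split. The function name `split_byte_ranges` is hypothetical, and the sketch ignores CSV headers and quoted fields that contain newlines.

```python
import os
import tempfile


def split_byte_ranges(path: str, workers: int) -> list[tuple[int, int]]:
    """Split a file into at most `workers` (start, end) byte ranges ending on line breaks."""
    size = os.path.getsize(path)
    ranges = []
    start = 0
    with open(path, "rb") as fh:
        for i in range(1, workers + 1):
            if start >= size:
                break
            if i == workers:
                end = size  # the last worker takes whatever is left
            else:
                # Jump to the ideal cut point, then finish the line we landed in.
                fh.seek(max(start, size * i // workers))
                fh.readline()
                end = fh.tell()
            if end > start:
                ranges.append((start, end))
                start = end
    return ranges


# Demo: ten short rows split between four workers.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rows.txt")
    with open(path, "wb") as fh:
        fh.writelines(b"row-%d\n" % i for i in range(10))
    ranges = split_byte_ranges(path, workers=4)
    with open(path, "rb") as fh:
        data = fh.read()

chunks = [data[s:e] for s, e in ranges]
print(ranges)
```

Every chunk ends on a newline and the chunks concatenate back to the original file, so each worker can parse its range independently.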

📚 Documentation

Core Concepts

Hooks

Hooks are Python classes that allow you to intercept, transform, and enrich data at different stages of the pipeline.

📘 Read the full Hooks Guide to learn about lifecycle methods (setup, execute, teardown), state management, and advanced patterns like cursor pagination.

Quick Example

from zoopipe import BaseHook

class MyHook(BaseHook):
    def execute(self, entries, store):
        for entry in entries:
            entry["raw_data"]["checked"] = True
        return entries

Important: If you are using a schema_model, the pipeline outputs the contents of validated_data for successful records.

  • To modify data before validation, use pre_validation_hooks and modify entry["raw_data"].
  • To modify data after validation (and ensure it reaches the output), use post_validation_hooks and modify entry["validated_data"].
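The two rules above can be sketched with a pair of hooks. This is an illustration under assumptions, not the documented API: it assumes `validated_data` behaves like a dict, uses a plain dict as a stand-in for `store`, and fakes the validation step so the hooks can be called directly. The `try`/`except` only lets the sketch run where zoopipe is not installed. In a real pipeline the hooks would be handed to the pipe through `pre_validation_hooks` and `post_validation_hooks`; check the Hooks Guide for the exact signature.

```python
try:
    from zoopipe import BaseHook
except ImportError:
    class BaseHook:  # stand-in so the sketch runs without zoopipe installed
        pass


class NormalizeEmailHook(BaseHook):
    """Meant as a pre-validation hook: edits raw_data before the schema sees it."""

    def execute(self, entries, store):
        for entry in entries:
            raw = entry["raw_data"]
            raw["email"] = raw.get("email", "").strip().lower()
        return entries


class CapitalizeUsernameHook(BaseHook):
    """Meant as a post-validation hook: edits validated_data, which reaches the output."""

    def execute(self, entries, store):
        for entry in entries:
            validated = entry["validated_data"]  # assumed to be dict-like
            validated["username"] = validated["username"].capitalize()
        return entries


# Direct calls, outside any pipeline, to show which key each hook touches.
entries = [{"raw_data": {"user_id": "1", "username": "ada", "email": " Ada@Example.COM "}}]
store = {}  # assumption: a simple mapping stands in for the real store object

entries = NormalizeEmailHook().execute(entries, store)
for entry in entries:
    # Stand-in for the schema validation the pipeline runs between the two stages.
    entry["validated_data"] = dict(entry["raw_data"])
entries = CapitalizeUsernameHook().execute(entries, store)

print(entries[0]["validated_data"])
```

A change made to `raw_data` after validation would not show up in the output, which is why the second hook writes to `validated_data`.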

Input/Output Adapters

File Formats

Databases

  • SQL Adapters - Read from and write to SQL databases with batch optimization
  • SQL Pagination - High-performance cursor-style pagination for large tables
  • DuckDB Adapters - Analytical database for OLAP workloads
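For readers unfamiliar with cursor-style pagination, here is a generic sketch of the technique using the standard-library `sqlite3` module. It is not the ZooPipe SQL adapter; the `users` table and the `iter_pages` helper are made up for illustration. Instead of `OFFSET`, each query resumes from the last primary key seen, so the database never rescans rows it has already returned.

```python
import sqlite3


def iter_pages(conn, page_size):
    """Yield lists of (id, name) rows, using the last seen id as the cursor."""
    last_id = 0  # assumes ids are positive integers
    while True:
        page = conn.execute(
            "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not page:
            return
        yield page
        last_id = page[-1][0]  # advance the cursor to the last row of this page


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"user-{i}") for i in range(1, 11)],
)
pages = list(iter_pages(conn, page_size=4))
conn.close()

print([len(page) for page in pages])
```

The cost of each page stays roughly constant on an indexed key, whereas `OFFSET`-based paging gets slower the deeper it goes.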

Messaging Systems

Advanced


🛠 Architecture

ZooPipe is designed as a thin Python wrapper around a powerful Rust core, featuring a two-tier parallel architecture:

  1. Orchestration Tier (Python Engines):
    • Manages distribution across processes or nodes (e.g., MultiProcessEngine).
    • Handles data sharding, process lifecycle, and metrics aggregation.
  2. Execution Tier (Rust BatchExecutors):
    • Internal Throughput: High-speed processing within a single process.
    • Adapters: Native CSV/JSON/SQL Readers and Writers.
    • NativePipe: Orchestrates the loop, fetching chunks and routing result batches.
    • Executors: Multi-threaded Rust strategies to bypass the GIL within a node.
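The loop that NativePipe is described as running (fetch a chunk, process it, route the results) can be pictured with a conceptual Python sketch. The real loop lives in Rust and validates with Pydantic; everything below, including `chunked`, `validate`, and `run_pipe`, is a hypothetical stand-in meant only to show the control flow and the error-routing idea.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of up to `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def validate(row):
    """Hypothetical validator: require a non-empty user_id and an '@' in email."""
    if not row.get("user_id"):
        raise ValueError("missing user_id")
    if "@" not in row.get("email", ""):
        raise ValueError("invalid email")
    return row


def run_pipe(rows, chunk_size=2):
    """Process rows chunk by chunk, routing failures to a separate error list."""
    output, errors = [], []
    for chunk in chunked(rows, chunk_size):
        for row in chunk:
            try:
                output.append(validate(row))
            except ValueError as exc:
                errors.append({"row": row, "error": str(exc)})
    return output, errors


rows = [
    {"user_id": "1", "email": "ada@example.com"},
    {"user_id": "2", "email": "broken"},
    {"user_id": "3", "email": "linus@example.com"},
]
output, errors = run_pipe(rows)
print(len(output), "ok,", len(errors), "routed to errors")
```

A failed record does not stop the run; it is collected separately, which mirrors the role of `error_output_adapter` in the Simple Example.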

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


