
ZooPipe is a data processing framework that allows you to process data in a declarative way.

Project description


ZooPipe is a lean, ultra-high-performance data processing engine for Python. It leverages a 100% Rust core to handle I/O and orchestration, while keeping the flexibility of Python for schema validation (via Pydantic) and custom data enrichment (via Hooks).

Python 3.10+ · License: MIT


Read the docs for more information.

✨ Key Features

  • 🚀 100% Native Rust Engine: The core execution loop, including CSV and JSON parsing/writing, is implemented in Rust for maximum throughput.
  • 🔍 Declarative Validation: Use Pydantic models to define and validate your data structures naturally.
  • 🪝 Python Hooks: Transform and enrich data at any stage using standard Python functions or classes.
  • 🚨 Automated Error Routing: Native support for routing failed records to a dedicated error output.
  • 📊 Multiple Format Support: Optimized readers/writers for CSV, JSONL, and SQL databases.
  • 🔧 Two-Tier Parallelism: Orchestrate across processes or clusters with Engines (Local, Ray, Dask), and scale throughput at the node level with Rust Executors.
  • ☁️ Cloud Native: Native S3, GCS, and Azure support, plus zero-config distributed execution on Ray or Dask clusters.

⚡ Performance & Benchmarks

Why ZooPipe? Because vectorization isn't always the answer.

Tools like Pandas and Polars are incredible for analytical workloads (groupby, sum, joins) where operations can be vectorized in C/Rust. However, real-world Data Engineering often involves "chaotic ETL": messy custom rules, per-row API calls, hashing, conditional cleanup, and complex normalization that force you to drop down to Python loops.

In these "Heavy ETL" scenarios, the project's benchmarks show ZooPipe running 3x-8x faster than DataFrame libraries that execute the same logic as Python UDFs.

Benchmark Chart

Key Takeaway: ZooPipe's "Python-First Architecture" with parallel streaming (PipeManager) avoids the serialization overhead that cripples Polars/Pandas when they run Python UDFs (map_elements/apply), and in this benchmark it uses 97% less RAM.

⚖️ Is this unfair to Pandas/Polars?

Yes and No.

  • Unfair: If your workload is purely analytical (e.g., GROUP BY, SUM, JOIN), Polars and Pandas will likely destroy ZooPipe because they can use vectorized C/Rust operations on whole columns at once.
  • Fair: In real-world Data Engineering, many pipelines are "chaotic". They require custom hashing, API calls per row, conditional normalization, or complex Pydantic validation. In these "Python-UDF heavy" scenarios, vectorization breaks down, and ZooPipe shines by orchestrating parallel Python execution efficiently without the DataFrame overhead.
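To make "chaotic ETL" concrete, here is a small, stdlib-only sketch of the kind of per-row logic described above: conditional dropping, normalization that depends on each row's own values, and hashing. The function `clean_row` and the field names are hypothetical and are not part of ZooPipe's API; the point is only that every branch depends on per-row conditions, which is awkward to express as one vectorized column expression.

```python
import hashlib
from typing import Optional


def clean_row(row: dict) -> Optional[dict]:
    """Hypothetical per-row transform: return a cleaned row, or None to drop it."""
    email = (row.get("email") or "").strip().lower()
    # Conditional cleanup: rows without a usable email are dropped entirely.
    if "@" not in email:
        return None
    # Normalization rule that depends on the row's own content.
    username = row.get("username") or email.split("@", 1)[0]
    # Hash the normalized email so the raw address is not passed downstream.
    email_hash = hashlib.sha256(email.encode("utf-8")).hexdigest()
    return {
        "user_id": str(row["user_id"]).strip(),
        "username": username.strip(),
        "email_hash": email_hash,
    }


rows = [
    {"user_id": " 1 ", "username": "", "email": "Ada@Example.com "},
    {"user_id": "2", "username": "bob", "email": "not-an-email"},
]
cleaned = [r for r in (clean_row(row) for row in rows) if r is not None]
```

Logic like this runs once per row in the Python interpreter whichever tool drives it, which is why the comparison above centers on how efficiently each tool orchestrates Python execution.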

❓ When to use what?

| Use ZooPipe When... | Use Pandas / Polars When... |
|---|---|
| 🏗️ You have complex, custom Python logic per row (hash, clean, validate). | 🧮 You are doing aggregations (SUM, AVG) or Relational Algebra (JOIN, GROUP BY). |
| 🔄 You are processing streaming data or files larger than RAM. | 💾 Your dataset fits comfortably in RAM (or use LazyFrames). |
| 🛡️ You need strict schema validation (Pydantic) and error handling. | 🔬 You are doing data exploration or statistical analysis. |
| 🚀 You want to mix Rust I/O performance with Python flexibility. | ⚡ Your entire pipeline can be expressed in vectorized expressions. |

🚀 Quick Start

Installation

pip install zoopipe

Or using uv:

uv add zoopipe

Or from source (uv recommended):

uv build
uv run maturin develop --release

Simple Example

from pydantic import BaseModel, ConfigDict
from zoopipe import CSVInputAdapter, CSVOutputAdapter, Pipe


class UserSchema(BaseModel):
    # Ignore CSV columns that are not declared in the schema
    model_config = ConfigDict(extra="ignore")
    user_id: str
    username: str
    email: str


pipe = Pipe(
    input_adapter=CSVInputAdapter("users.csv"),
    output_adapter=CSVOutputAdapter("processed_users.csv"),
    # Records that fail validation against UserSchema are routed here
    error_output_adapter=CSVOutputAdapter("errors.csv"),
    schema_model=UserSchema,
)

# Start the pipeline, then block until it finishes
pipe.start()
pipe.wait()

print(f"Finished! Processed {pipe.report.total_processed} items.")

Automatically split large files or manage multiple independent workflows:

from zoopipe import MultiProcessEngine, Pipe, PipeManager

# Create your pipe as usual (Pipe is purely declarative)
pipe = Pipe(...)

# Automatically parallelize across 4 workers
# MultiProcessEngine() for local, RayEngine() or DaskEngine() for clusters
manager = PipeManager.parallelize_pipe(
    pipe,
    workers=4,
    engine=MultiProcessEngine(),
)
manager.start()
manager.wait()
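The README does not spell out how `parallelize_pipe` splits a large file. One common technique, shown here only as a hedged sketch and not as ZooPipe's actual implementation, is to cut the file into roughly equal byte ranges and move each cut point forward to the next newline so that no record is split between two workers. The helper `shard_byte_ranges` is hypothetical, and it assumes one record per line (JSONL, or CSV without quoted newlines).

```python
import os
import tempfile


def shard_byte_ranges(path: str, workers: int) -> list[tuple[int, int]]:
    """Split a line-oriented file into at most `workers` (start, end) byte ranges.

    Assumes one record per line. Each cut point is moved forward to just
    after the next newline, so no record is split between two shards.
    """
    size = os.path.getsize(path)
    if size == 0 or workers < 1:
        return []
    target = -(-size // workers)  # ceiling division: bytes per shard
    ranges = []
    start = 0
    with open(path, "rb") as f:
        while start < size:
            end = min(start + target, size)
            if end < size:
                # Re-read from the byte before the cut; if it is already a
                # newline, readline() returns just that byte and `end` is unchanged.
                f.seek(end - 1)
                f.readline()
                end = f.tell()
            ranges.append((start, end))
            start = end
    return ranges


# Demo: 100 JSONL records split across 4 hypothetical workers.
with tempfile.NamedTemporaryFile("wb", suffix=".jsonl", delete=False) as tmp:
    tmp.write(b"".join(b'{"id": %d}\n' % i for i in range(100)))
    path = tmp.name

shards = shard_byte_ranges(path, workers=4)
with open(path, "rb") as f:
    data = f.read()
os.remove(path)

# Each worker would seek to `start` and read `end - start` bytes.
chunks = [data[start:end] for start, end in shards]
```

Because every shard except the last is at least the target size, the number of shards never exceeds `workers`; the trade-off is that shards can be slightly uneven when a cut lands mid-line.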

📚 Documentation

Core Concepts

Hooks

Hooks are Python classes that allow you to intercept, transform, and enrich data at different stages of the pipeline.

📘 Read the full Hooks Guide to learn about lifecycle methods (setup, execute, teardown), state management, and advanced patterns like cursor pagination.

Quick Example

from zoopipe import BaseHook

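class MyHook(BaseHook):
    def execute(self, entries, store):
        # With a schema_model, edits to raw_data only take effect before
        # validation, so register this as a pre-validation hook
        for entry in entries:
            entry["raw_data"]["checked"] = True
        return entries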

Important: If you are using a schema_model, the pipeline will output the contents of validated_data for successful records.

  • To modify data before validation, use pre_validation_hooks and modify entry["raw_data"].
  • To modify data after validation (and ensure it reaches the output), use post_validation_hooks and modify entry["validated_data"].
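The rule above can be illustrated with a plain-Python simulation of the stage order: pre-validation hooks edit raw_data, validation produces validated_data, post-validation hooks edit validated_data, and only validated_data is emitted for successful records. This is a hypothetical model, not ZooPipe's code: `run_entries`, `validate`, and the hook functions are invented for illustration, a hand-written check stands in for a Pydantic schema_model, and each entry is assumed to be a dict with `raw_data` and `validated_data` keys as in the hook example. Failed records are collected separately, mirroring the error output adapter.

```python
from typing import Optional

SCHEMA_FIELDS = ("user_id", "username", "email")


def validate(raw: dict) -> Optional[dict]:
    """Stand-in for a schema_model: keep schema fields, reject missing ones."""
    if any(not raw.get(name) for name in SCHEMA_FIELDS):
        return None
    return {name: str(raw[name]) for name in SCHEMA_FIELDS}  # extras ignored


def run_entries(entries: list, pre_hooks: list, post_hooks: list):
    """Hypothetical stage order: pre hooks -> validation -> post hooks -> output."""
    for hook in pre_hooks:
        entries = hook(entries)
    ok, errors = [], []
    for entry in entries:
        entry["validated_data"] = validate(entry["raw_data"])
        if entry["validated_data"] is None:
            errors.append(entry)
        else:
            ok.append(entry)
    for hook in post_hooks:
        ok = hook(ok)
    # Successful records emit validated_data; failures keep their raw_data.
    return [e["validated_data"] for e in ok], [e["raw_data"] for e in errors]


def fill_username(entries):
    # Pre-validation: repair raw input so it can pass validation.
    for entry in entries:
        raw = entry["raw_data"]
        if not raw.get("username"):
            raw["username"] = raw.get("email", "").split("@")[0]
    return entries


def tag_checked(entries):
    # Post-validation edit to validated_data: reaches the output.
    for entry in entries:
        entry["validated_data"]["checked"] = True
    return entries


def mark_raw(entries):
    # Post-validation edit to raw_data: never reaches the output.
    for entry in entries:
        entry["raw_data"]["seen"] = True
    return entries


entries = [
    {"raw_data": {"user_id": "1", "email": "ada@example.com", "extra": "x"}},
    {"raw_data": {"user_id": "2", "username": "bob"}},  # no email: fails
]
output, error_rows = run_entries(entries, [fill_username], [tag_checked, mark_raw])
```

In this model the `checked` flag survives because it was written to validated_data, while the `seen` flag written to raw_data after validation is silently lost.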

Executors

Executors control how ZooPipe scales up within a single node using Rust-managed threads. They are the mechanism under the hood that drives high throughput.

📘 Read the full Executors Guide to understand the difference between SingleThreadExecutor (debug/ordered) and MultiThreadExecutor (high-throughput).
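ZooPipe's executors run on Rust-managed threads, so the following is only a standard-library analogy and not the library's API. It shows why a one-batch-at-a-time mode is the natural choice for debugging or whenever order matters: sequential processing returns results in input order, while a thread pool that collects results as they finish can return them in completion order. For CPU-bound Python code the GIL limits what Python threads gain, which is the motivation the README gives for doing this work in Rust threads. The names `process_batch`, `run_ordered`, and `run_unordered` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_batch(batch_id: int, batch: list[int]) -> tuple[int, list[int]]:
    # Simulate I/O-bound work (e.g. an API call); time.sleep releases the GIL.
    time.sleep(0.01 * (len(batch) % 3))
    return batch_id, [x * 2 for x in batch]


def run_ordered(batches: list[list[int]]) -> list[tuple[int, list[int]]]:
    # One batch at a time: results come back in input order.
    return [process_batch(i, b) for i, b in enumerate(batches)]


def run_unordered(batches: list[list[int]], workers: int = 4):
    # Several batches in flight: results are collected in completion order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_batch, i, b) for i, b in enumerate(batches)]
        return [f.result() for f in as_completed(futures)]


batches = [[1, 2], [3], [4, 5, 6], [7, 8]]
ordered = run_ordered(batches)
unordered = run_unordered(batches)
# If order matters downstream, carry a batch id and re-sort by it.
restored = sorted(unordered)
```

Tagging each batch with its id is a cheap way to keep the throughput of concurrent processing while still being able to restore input order afterwards.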

Input/Output Adapters

File Formats

Databases

  • SQL Adapters - Read from and write to SQL databases with batch optimization
  • SQL Pagination - High-performance cursor-style pagination for large tables
  • DuckDB Adapters - Analytical database for OLAP workloads
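The README does not describe how ZooPipe's SQL pagination works internally. As a generic illustration of cursor-style (keyset) pagination, the sketch below uses Python's built-in `sqlite3`: instead of `OFFSET`, each page requests rows whose key is greater than the last key already seen, so the database can seek through the primary-key index rather than scanning and discarding skipped rows. The `users` table, its columns, and the `iter_pages` helper are hypothetical.

```python
import sqlite3
from typing import Iterator


def iter_pages(conn: sqlite3.Connection, page_size: int) -> Iterator[list[tuple]]:
    """Yield pages of (id, name) rows using keyset pagination on `id`."""
    last_id = None
    while True:
        if last_id is None:
            rows = conn.execute(
                "SELECT id, name FROM users ORDER BY id LIMIT ?", (page_size,)
            ).fetchall()
        else:
            rows = conn.execute(
                "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, page_size),
            ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # the cursor: highest key in this page


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 11)],
)
pages = list(iter_pages(conn, page_size=4))
```

The cost of each page stays roughly constant however deep into the table the cursor is, which is what makes this pattern suitable for large tables; it does require a unique, ordered key column.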

Messaging Systems

Advanced


🛠 Architecture

ZooPipe is designed as a thin Python wrapper around a powerful Rust core, featuring a two-tier parallel architecture:

  1. Orchestration Tier (Python Engines):
    • Manages distribution across processes or nodes (e.g., MultiProcessEngine).
    • Handles data sharding, process lifecycle, and metrics aggregation.
  2. Execution Tier (Rust BatchExecutors):
    • Internal Throughput: High-speed processing within a single process.
    • Adapters: Native CSV/JSON/SQL Readers and Writers.
    • NativePipe: Orchestrates the loop, fetching chunks and routing result batches.
    • Executors: Multi-threaded Rust strategies to bypass the GIL within a node.
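As a rough mental model of the NativePipe loop described above (fetch a chunk, process it, route the resulting batch), here is a stdlib-only Python sketch. It is not ZooPipe's implementation, which lives in Rust; `batched`, `run_pipeline`, and `parse_age` are hypothetical. The reader stays lazy and only one batch of input rows is held at a time, which is the property that makes streaming over files larger than RAM possible. In a real pipeline the two sinks would be writers that flush each batch; here they are plain lists so the result can be inspected.

```python
import csv
import io
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional


def batched(rows: Iterable, size: int) -> Iterator[list]:
    """Lazily group an iterable into lists of at most `size` items."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def run_pipeline(
    source: Iterable[str],
    transform: Callable[[dict], Optional[dict]],
    size: int,
    out: list,
    err: list,
) -> int:
    """Stream CSV rows in batches; route each result to `out` or `err`."""
    processed = 0
    for batch in batched(csv.DictReader(source), size):
        # Only this batch of input rows is materialized at a time.
        for row in batch:
            result = transform(row)
            if result is None:
                err.append(row)  # stand-in for the error output adapter
            else:
                out.append(result)  # stand-in for the main output adapter
            processed += 1
    return processed


def parse_age(row: dict) -> Optional[dict]:
    # Hypothetical per-row transform: fails on non-numeric ages.
    try:
        return {**row, "age": int(row["age"])}
    except ValueError:
        return None


source = io.StringIO("name,age\nada,36\nbob,n/a\ncy,41\n")
ok, failed = [], []
total = run_pipeline(source, parse_age, size=2, out=ok, err=failed)
```

Batching amortizes per-call overhead (for example, crossing from Rust into Python) over many rows, while the batch size caps how much input is in flight at once.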

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.



Download files


Source Distribution

zoopipe-2026.1.27.tar.gz (234.8 kB)

Uploaded Source

Built Distributions


zoopipe-2026.1.27-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl (22.4 MB)

Uploaded PyPy, manylinux: glibc 2.28+ x86-64

zoopipe-2026.1.27-cp310-abi3-win_amd64.whl (17.8 MB)

Uploaded CPython 3.10+, Windows x86-64

zoopipe-2026.1.27-cp310-abi3-manylinux_2_28_x86_64.whl (22.4 MB)

Uploaded CPython 3.10+, manylinux: glibc 2.28+ x86-64

zoopipe-2026.1.27-cp310-abi3-manylinux_2_28_aarch64.whl (19.6 MB)

Uploaded CPython 3.10+, manylinux: glibc 2.28+ ARM64

zoopipe-2026.1.27-cp310-abi3-macosx_11_0_arm64.whl (17.6 MB)

Uploaded CPython 3.10+, macOS 11.0+ ARM64

zoopipe-2026.1.27-cp310-abi3-macosx_10_12_x86_64.whl (19.5 MB)

Uploaded CPython 3.10+, macOS 10.12+ x86-64

File details

Details for the file zoopipe-2026.1.27.tar.gz.

File metadata

  • Download URL: zoopipe-2026.1.27.tar.gz
  • Upload date:
  • Size: 234.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for zoopipe-2026.1.27.tar.gz
Algorithm Hash digest
SHA256 66f172b4ac43905790dc24bc74a3bf5386c49c57ed1241bbb0fb581df454c7ed
MD5 06c378193ad156a1cbd5fcb8006a8fd3
BLAKE2b-256 a3d888d1d45f22f9939b840eedebac2c9517af305dc14ad27ce18d219b4335b0


Provenance

The following attestation bundles were made for zoopipe-2026.1.27.tar.gz:

Publisher: release.yml on albertobadia/zoopipe

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file zoopipe-2026.1.27-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl.


File hashes

Hashes for zoopipe-2026.1.27-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 9bec647d9d6c66c123c4c00f8a9f5b3f4eddeaf9e904ed1b66a69a01314d684f
MD5 d77aa2a7e0b5bd9bc42226c8f9ef990b
BLAKE2b-256 43b9c3664ae7d671db85a21853f2deb2afb98a1d844fed4d998a413377e7c2e4


Provenance

The following attestation bundles were made for zoopipe-2026.1.27-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl:

Publisher: release.yml on albertobadia/zoopipe


File details

Details for the file zoopipe-2026.1.27-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: zoopipe-2026.1.27-cp310-abi3-win_amd64.whl
  • Upload date:
  • Size: 17.8 MB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for zoopipe-2026.1.27-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 6f5e4ea2cf45078b8967e1c6598441a3ba0dc13d8c3533dbfc0d4f8de8fee4f5
MD5 ead15dcbae02f8af5e06b46f9d1e770b
BLAKE2b-256 92fdf642a49a13fc5bbff5cf163f0b805111477962de13ed39b4708e36835f09


Provenance

The following attestation bundles were made for zoopipe-2026.1.27-cp310-abi3-win_amd64.whl:

Publisher: release.yml on albertobadia/zoopipe


File details

Details for the file zoopipe-2026.1.27-cp310-abi3-manylinux_2_28_x86_64.whl.


File hashes

Hashes for zoopipe-2026.1.27-cp310-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 9d49cf3b21b44a97c5ced554853d61c2ed4a48ada60440d054a29aa6aa5f2e05
MD5 64c1bb053a146bcb640ded8df79878a1
BLAKE2b-256 5047346b71224394926ba01221fd7098c5567d98f723c2e36d509da29b1ec0f5


Provenance

The following attestation bundles were made for zoopipe-2026.1.27-cp310-abi3-manylinux_2_28_x86_64.whl:

Publisher: release.yml on albertobadia/zoopipe


File details

Details for the file zoopipe-2026.1.27-cp310-abi3-manylinux_2_28_aarch64.whl.


File hashes

Hashes for zoopipe-2026.1.27-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 cf0cba496c352a94c62500ef0c517258cbbc5abe92c751172fc3d02e5f31ef40
MD5 1afb767608c354c7ab358d71883c1885
BLAKE2b-256 4831fd8ec155ef6ba209254fe2c5349538f185ca3ff29622341ef3e19286bed0


Provenance

The following attestation bundles were made for zoopipe-2026.1.27-cp310-abi3-manylinux_2_28_aarch64.whl:

Publisher: release.yml on albertobadia/zoopipe


File details

Details for the file zoopipe-2026.1.27-cp310-abi3-macosx_11_0_arm64.whl.


File hashes

Hashes for zoopipe-2026.1.27-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4bd4fd6d22984062666b5e56b0a3d9bf76be183b006545248919bf63d8152b40
MD5 cbd3cb838d018d6113135251078564a3
BLAKE2b-256 776672b48ac8a8673466cd9a75ea2dc35fa26a1a4ee2712feae02666a3fea2d2


Provenance

The following attestation bundles were made for zoopipe-2026.1.27-cp310-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on albertobadia/zoopipe


File details

Details for the file zoopipe-2026.1.27-cp310-abi3-macosx_10_12_x86_64.whl.


File hashes

Hashes for zoopipe-2026.1.27-cp310-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 9dc72e8a15beb4140dc676beb34a3337266312adc92d7fd0c680674d325ef42e
MD5 9e5d8169796f10206b4536dc7e97b242
BLAKE2b-256 27c005a4ba7fb8fdcdbcc38e0d4a34d3ee539835e530f8833e598ed701433be3


Provenance

The following attestation bundles were made for zoopipe-2026.1.27-cp310-abi3-macosx_10_12_x86_64.whl:

Publisher: release.yml on albertobadia/zoopipe

