ZooPipe is a data processing framework that allows you to process data in a declarative way.

ZooPipe is a lean, ultra-high-performance data processing engine for Python. It leverages a 100% Rust core to handle I/O and orchestration, while keeping the flexibility of Python for schema validation (via Pydantic) and custom data enrichment (via Hooks).

Requires Python 3.10+. License: MIT.


Read the docs for more information.

✨ Key Features

  • 🚀 100% Native Rust Engine: The core execution loop, including CSV and JSON parsing/writing, is implemented in Rust for maximum throughput.
  • 🔍 Declarative Validation: Use Pydantic models to define and validate your data structures naturally.
  • 🪝 Python Hooks: Transform and enrich data at any stage using standard Python functions or classes.
  • 🚨 Automated Error Routing: Native support for routing failed records to a dedicated error output.
  • 📊 Multiple Format Support: Optimized readers/writers for CSV, JSONL, and SQL databases.
  • 🔧 Two-Tier Parallelism: Orchestrate across processes or clusters with Engines (Local, Ray), and scale throughput at the node level with Rust Executors.
  • ☁️ Cloud Native: Native S3 support and zero-config distributed execution on Ray clusters.

⚡ Performance & Benchmarks

Why ZooPipe? Because vectorization isn't always the answer.

Tools like Pandas and Polars are incredible for analytical workloads (groupby, sum, joins) where operations can be vectorized in C/Rust. However, real-world Data Engineering often involves "chaotic ETL": messy custom rules, per-row API calls, hashing, conditional cleanup, and complex normalization, all of which force the work back into Python loops.

In these "Heavy ETL" scenarios, ZooPipe outperforms Vectorized DataFrames by 3x-8x.

[Figure: benchmark chart]

Key Takeaway: ZooPipe's "Python-First Architecture" with parallel streaming (PipeManager) avoids the serialization overhead that cripples Polars/Pandas when using Python UDFs (map_elements/apply), and uses 97% less RAM.

⚖️ Is this unfair to Pandas/Polars?

Yes and No.

  • Unfair: If your workload is purely analytical (e.g., GROUP BY, SUM, JOIN), Polars and Pandas will likely destroy ZooPipe because they can use vectorized C/Rust operations on whole columns at once.
  • Fair: In real-world Data Engineering, many pipelines are "chaotic". They require custom hashing, API calls per row, conditional normalization, or complex Pydantic validation. In these "Python-UDF heavy" scenarios, vectorization breaks down, and ZooPipe shines by orchestrating parallel Python execution efficiently without the DataFrame overhead.
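As an illustration of the "Python-UDF heavy" work described above, here is a minimal sketch of a per-row function that mixes hashing, conditional cleanup, and branching normalization. It uses only the standard library and no ZooPipe API; the function name `normalize_row`, the field names, and the rules themselves are hypothetical.

```python
import hashlib


def normalize_row(row: dict) -> dict:
    """Apply row-specific rules that do not map cleanly onto column-wise operations."""
    email = row.get("email", "").strip().lower()
    out = {"user_id": row["user_id"].strip()}
    # Conditional cleanup: only hash emails that look usable.
    if "@" in email:
        out["email_hash"] = hashlib.sha256(email.encode("utf-8")).hexdigest()
        out["domain"] = email.split("@", 1)[1]
    else:
        out["email_hash"] = None
        out["domain"] = "unknown"
    # Branching normalization (hypothetical rule: map "UK" to "GB", blank to "ZZ").
    country = row.get("country", "").upper()
    out["country"] = "GB" if country == "UK" else (country or "ZZ")
    return out


rows = [
    {"user_id": " 1 ", "email": " Ada@Example.COM ", "country": "uk"},
    {"user_id": "2", "email": "not-an-email", "country": ""},
]
cleaned = [normalize_row(r) for r in rows]
```

Logic like this usually ends up inside `apply` or `map_elements` in a DataFrame library, which is the case the comparison above is about.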

❓ When to use what?

| Use ZooPipe When... | Use Pandas / Polars When... |
|---|---|
| 🏗️ You have complex, custom Python logic per row (hash, clean, validate). | 🧮 You are doing aggregations (SUM, AVG) or Relational Algebra (JOIN, GROUP BY). |
| 🔄 You are processing streaming data or files larger than RAM. | 💾 Your dataset fits comfortably in RAM (or use LazyFrames). |
| 🛡️ You need strict schema validation (Pydantic) and error handling. | 🔬 You are doing data exploration or statistical analysis. |
| 🚀 You want to mix Rust I/O performance with Python flexibility. | ⚡ Your entire pipeline can be expressed in vectorized expressions. |

🚀 Quick Start

Installation

pip install zoopipe

Or using uv:

uv add zoopipe

Or from source (uv recommended):

uv build
uv run maturin develop --release

Simple Example

from pydantic import BaseModel, ConfigDict
from zoopipe import CSVInputAdapter, CSVOutputAdapter, Pipe


class UserSchema(BaseModel):
    model_config = ConfigDict(extra="ignore")
    user_id: str
    username: str
    email: str


pipe = Pipe(
    input_adapter=CSVInputAdapter("users.csv"),
    output_adapter=CSVOutputAdapter("processed_users.csv"),
    error_output_adapter=CSVOutputAdapter("errors.csv"),
    schema_model=UserSchema,
)

pipe.start()
pipe.wait()


print(f"Finished! Processed {pipe.report.total_processed} items.")
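To make the data flow of the example concrete, the following plain-Python sketch mimics what happens conceptually: rows are read, checked, and routed either to the main output or to the error output. It is not how ZooPipe is implemented (the real loop runs in Rust and validation goes through Pydantic). The names `validate` and `run` are hypothetical, and the rule that empty fields are rejected is an assumption made for the demo; the `UserSchema` above would accept an empty string.

```python
import csv
import io

REQUIRED = ("user_id", "username", "email")


def validate(row: dict) -> dict:
    """Return only the required fields, or raise ValueError if any is missing or empty."""
    # Assumption for this demo: an empty value counts as invalid.
    missing = [name for name in REQUIRED if not (row.get(name) or "").strip()]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    # Extra columns are dropped, similar in spirit to extra="ignore".
    return {name: row[name].strip() for name in REQUIRED}


def run(source: str) -> tuple[list[dict], list[dict]]:
    """Split CSV rows into validated records and rejected records."""
    good, bad = [], []
    for row in csv.DictReader(io.StringIO(source)):
        try:
            good.append(validate(row))
        except ValueError as exc:
            # Failed rows keep their original content plus the reason.
            bad.append({**row, "error": str(exc)})
    return good, bad


sample = "user_id,username,email,age\n1,ada,ada@example.com,36\n2,,bob@example.com,41\n"
processed, errors = run(sample)
```

In this sketch `processed` plays the role of processed_users.csv and `errors` the role of errors.csv.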

Automatically split large files or manage multiple independent workflows:

from zoopipe import MultiProcessEngine, Pipe, PipeManager

# Create your pipe as usual (Pipe is purely declarative)
pipe = Pipe(...)

# Automatically parallelize across 4 workers
# MultiProcessEngine() for local, RayEngine() for clusters
manager = PipeManager.parallelize_pipe(
    pipe,
    workers=4,
    engine=MultiProcessEngine(),
)
manager.start()
manager.wait()
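The README does not describe how the input is divided among workers, so the sketch below only illustrates the general idea of sharding: splitting a known number of rows into contiguous ranges of near-equal size. The helper `shard_ranges` is hypothetical and not part of ZooPipe, which may well split by bytes or by file instead.

```python
def shard_ranges(total_rows: int, workers: int) -> list[tuple[int, int]]:
    """Split range(total_rows) into `workers` contiguous half-open (start, stop) ranges."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    base, extra = divmod(total_rows, workers)
    ranges = []
    start = 0
    for index in range(workers):
        # The first `extra` shards take one additional row each.
        stop = start + base + (1 if index < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


shards = shard_ranges(10, 4)  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Each worker would then process only the rows in its own range, and the manager would combine the per-worker reports.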

📚 Documentation

Core Concepts

Hooks

Hooks are Python classes that allow you to intercept, transform, and enrich data at different stages of the pipeline.

📘 Read the full Hooks Guide to learn about lifecycle methods (setup, execute, teardown), state management, and advanced patterns like cursor pagination.

Quick Example

from zoopipe import BaseHook

class MyHook(BaseHook):
    def execute(self, entries, store):
        for entry in entries:
            entry["raw_data"]["checked"] = True
        return entries

Important: If you are using a schema_model, the pipeline will output the contents of validated_data for successful records.

  • To modify data before validation, use pre_validation_hooks and modify entry["raw_data"].
  • To modify data after validation (and ensure it reaches the output), use post_validation_hooks and modify entry["validated_data"].
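The two cases above can be sketched as one hook per stage. To keep the snippet runnable without ZooPipe installed, `BaseHook` here is a local stand-in; in a real pipeline it would be imported from zoopipe. The hook names, the dict shape of `validated_data`, the use of a plain dict for `store`, and the hand-rolled "validation" step are all assumptions made for illustration.

```python
class BaseHook:
    """Local stand-in so this runs without ZooPipe installed."""

    def execute(self, entries, store):
        return entries


class StripWhitespace(BaseHook):
    """Intended as a pre-validation hook: cleans the raw input."""

    def execute(self, entries, store):
        for entry in entries:
            raw = entry["raw_data"]
            entry["raw_data"] = {
                key: value.strip() if isinstance(value, str) else value
                for key, value in raw.items()
            }
        return entries


class AddEmailDomain(BaseHook):
    """Intended as a post-validation hook: enriches the validated record."""

    def execute(self, entries, store):
        for entry in entries:
            validated = entry["validated_data"]
            validated["email_domain"] = validated["email"].split("@")[-1]
        return entries


# Simulate the two stages by hand; the entry layout is an assumption.
entries = [{"raw_data": {"email": " ada@example.com "}}]
entries = StripWhitespace().execute(entries, store={})
for entry in entries:
    entry["validated_data"] = dict(entry["raw_data"])  # stand-in for schema validation
entries = AddEmailDomain().execute(entries, store={})
```

Whether a field added after validation is written to the output may depend on the schema and adapter configuration, so check the Hooks Guide before relying on it.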

Executors

Executors control how ZooPipe scales up within a single node using Rust-managed threads. They are the engine under the hood that drives high throughput.

📘 Read the full Executors Guide to understand the difference between SingleThreadExecutor (debug/ordered) and MultiThreadExecutor (high-throughput).

Input/Output Adapters

File Formats

Databases

  • SQL Adapters - Read from and write to SQL databases with batch optimization
  • SQL Pagination - High-performance cursor-style pagination for large tables
  • DuckDB Adapters - Analytical database for OLAP workloads

Messaging Systems

Advanced


🛠 Architecture

ZooPipe is designed as a thin Python wrapper around a powerful Rust core, featuring a two-tier parallel architecture:

  1. Orchestration Tier (Python Engines):
    • Manages distribution across processes or nodes (e.g., MultiProcessEngine).
    • Handles data sharding, process lifecycle, and metrics aggregation.
  2. Execution Tier (Rust BatchExecutors):
    • Internal Throughput: High-speed processing within a single process.
    • Adapters: Native CSV/JSON/SQL Readers and Writers.
    • NativePipe: Orchestrates the loop, fetching chunks and routing result batches.
    • Executors: Multi-threaded Rust strategies to bypass the GIL within a node.
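As a rough analogy for the execution tier, the sketch below fetches records in chunks and hands each chunk to a thread pool, collecting results in input order. It is written in pure Python with the standard library, so unlike the Rust executors described above it remains subject to the GIL. The names `chunks`, `process_batch`, and `run` are hypothetical and do not correspond to ZooPipe internals.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(records, size):
    """Yield successive lists of at most `size` records."""
    iterator = iter(records)
    while batch := list(islice(iterator, size)):
        yield batch


def process_batch(batch):
    """Stand-in for the per-batch work (here: double every value)."""
    return [value * 2 for value in batch]


def run(records, size, threads):
    """Process chunks on a thread pool and collect the results in input order."""
    results = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns batch results in the order the batches were submitted.
        for processed in pool.map(process_batch, chunks(records, size)):
            results.extend(processed)
    return results


doubled = run(range(7), size=3, threads=2)
```

The orchestration tier would sit one level above this, running one such loop per process or node.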

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
