Pararun

Pararun is a lightweight, fault-tolerant Python library for concurrent and parallel task execution. It simplifies running tasks using asyncio, multiprocessing, or threading, with built-in support for persistent caching (idempotency), progress bars, and streaming large datasets.

Features

  • 🚀 Unified API: Simple pr.map for parallel processing and pr.aio_map for async tasks.
  • 💾 Idempotent Caching: Automatically skips processed items by checking a JSONL cache file. Perfect for resumable long-running jobs.
  • 🌊 Streaming Support: Handles large datasets (generators) with controlled memory usage using backpressure.
  • 📊 Progress Monitoring: Integrated tqdm progress bars.
  • 🛡️ Fault Tolerance: Safely handles crashes by flushing results to disk periodically.

Installation

pip install pararun

Quick Start

1. Parallel Processing (CPU/IO Bound)

Use pr.map for blocking functions. Under the hood it is built on concurrent.futures.

import pararun as pr
import time

def process_file(filename):
    time.sleep(0.1)  # Simulate blocking work
    return {"id": filename, "status": "done"}

# Works with Lists or Generators
files = (f"data_{i}.txt" for i in range(100))

# Result is saved to 'results.jsonl' automatically
pr.map(
    func=process_file,
    iterable=files,
    n_workers=4,
    cache_path="results.jsonl"
)
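Since results land in a JSONL file (one JSON object per line), they can be read back with plain file handling. A minimal sketch, assuming the standard JSONL layout described above — the helper name is illustrative, not part of pararun's API:

```python
import json

def read_results(cache_path):
    """Read results back from a JSONL cache file, one dict per line."""
    with open(cache_path) as f:
        return [json.loads(line) for line in f if line.strip()]

# results = read_results("results.jsonl")
# done_ids = {r["id"] for r in results}
```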

2. Async Processing (AsyncIO)

Use pr.aio_map for native async functions.

import pararun as pr
import asyncio

async def fetch_url(item):
    await asyncio.sleep(0.1) # Simulate network request
    return {"id": item["url"], "status": 200}

async def main():
    urls = [{"url": f"https://example.com/{i}"} for i in range(100)]
    
    await pr.aio_map(
        func=fetch_url,
        iterable=urls,
        n_workers=10,
        cache_path="async_results.jsonl"
    )

if __name__ == "__main__":
    asyncio.run(main())

Advanced Usage

Idempotency & Resuming

When cache_path is provided, pararun reads the file (if it exists) to determine which items have already been processed.

By default, it assumes the output items contain an "id" field. You can customize this field using the key_field parameter:

pr.map(..., cache_path="cache.jsonl", key_field="filename")

  • Run 1: Process 50% of items -> crash.
  • Run 2: Point to the same cache_path. pararun skips the already-completed items and resumes where it left off.
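The skip logic can be sketched as follows. This is an illustration of the pattern, not pararun's actual implementation; the JSONL cache format and key_field parameter are taken from the docs above, and the function names are hypothetical:

```python
import json

def load_done_keys(cache_path, key_field="id"):
    """Collect the keys of items already present in the JSONL cache."""
    done = set()
    try:
        with open(cache_path) as f:
            for line in f:
                if line.strip():
                    done.add(json.loads(line)[key_field])
    except FileNotFoundError:
        pass  # first run: nothing cached yet
    return done

def pending(items, done, key_field="id"):
    """Yield only the items whose key is not already in the cache."""
    for item in items:
        if item[key_field] not in done:
            yield item
```

On a second run, only the items returned by pending need to be dispatched to workers.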

Streaming Large Datasets

pararun is designed to be memory efficient. It uses bounded queues (semaphores) to ensure that even if you pass a generator with 100M items, only n_workers * 2 items are held in memory at any time.
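The bounded-concurrency idea can be sketched with an asyncio.Semaphore. This is a minimal illustration of the backpressure pattern, not pararun's internals; the 2 * n_workers bound comes from the paragraph above, and completion order is not preserved:

```python
import asyncio

async def bounded_map(func, iterable, n_workers=4):
    """Run func over iterable with at most 2 * n_workers items in flight."""
    sem = asyncio.Semaphore(2 * n_workers)
    results = []

    async def run_one(item):
        try:
            results.append(await func(item))
        finally:
            sem.release()  # free a slot so the producer can pull the next item

    tasks = []
    for item in iterable:       # the generator is consumed lazily
        await sem.acquire()     # blocks when too many items are in flight
        tasks.append(asyncio.create_task(run_one(item)))
    await asyncio.gather(*tasks)
    return results
```

Because acquire happens before the next item is pulled from the generator, memory usage stays bounded no matter how large the input is.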

Development

Install dependencies and run tests:

# Install package in editable mode
pip install -e .

# Run tests
python -m pytest

License

MIT