A Python multithreading library for data processing pipelines, data streaming, etc.

Project description

multiflow

About

multiflow is a Python multithreading library for data processing pipelines and workflows, streaming, and similar tasks. It extends concurrent.futures by allowing both the input and the output to be generator objects, and it makes it easy to string multiple thread pools together into a multithreaded pipeline.

Additionally, multiflow comes with periodic logging, automatic retries, error handling, and argument expansion.
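
To make the idea of automatic retries concrete, here is a minimal standard-library sketch. It is not multiflow's implementation or API: `with_retries`, `flaky`, and the `retries` parameter are hypothetical names chosen for illustration, and the only assumption is that a failed call is simply repeated until the retry budget runs out.

```python
import functools


def with_retries(fn, retries=2):
    """Wrap fn so a failing call is repeated up to `retries` extra times."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        last_error = None
        for _ in range(retries + 1):  # one initial attempt plus the retries
            try:
                return fn(*args, **kwargs)
            except Exception as exc:  # remember the error and try again
                last_error = exc
        raise last_error  # every attempt failed

    return wrapper


attempts = {"count": 0}


def flaky(x):
    # Hypothetical job that fails on its first two calls, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ValueError("transient failure")
    return x * 2


result = with_retries(flaky, retries=2)(21)
```

A wrapper like this composes with any callable, so it could be applied to a job function before handing it to a thread pool.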

Why?

Accepting an input generator while yielding an output generator makes multiflow well suited to running multiple jobs concurrently when the output of the first job is the input of the second. Work on the second job can start before the first job completes, so the total work finishes sooner.
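
The overlap between stages can be illustrated with the standard library alone. The sketch below is not how multiflow is implemented; `threaded_map`, `download`, `resize`, and the `window` parameter are hypothetical names. It assumes each stage is a generator that lazily submits jobs to its own thread pool, so the second stage can begin as soon as the first stage yields its first result.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def threaded_map(fn, items, max_workers=4, window=8):
    """Lazily apply fn to items on a thread pool, yielding results in input order.

    At most `window` submitted jobs are held at a time, so `items` may be a
    generator that is consumed gradually.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = deque()
        for item in items:
            pending.append(pool.submit(fn, item))
            if len(pending) >= window:
                yield pending.popleft().result()
        while pending:  # drain the jobs that are still outstanding
            yield pending.popleft().result()


def numbers():
    # Input generator: items are produced on demand.
    for i in range(5):
        yield i


def download(n):
    return n * 10  # stand-in for the first job


def resize(n):
    return n + 1  # stand-in for the second job


# Chain two pools: the output generator of stage one feeds stage two.
results = list(threaded_map(resize, threaded_map(download, numbers())))
```

Because each stage only pulls what it needs from the stage before it, neither the input nor the intermediate results have to be fully materialized.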

A great use case for this is streaming data. For example, with multiflow and smart_open, you could stream images from S3 and process them in a multithreaded environment before exporting them elsewhere.
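
As a rough, standard-library illustration of that streaming pattern, the sketch below reads objects inside worker threads. `FAKE_BUCKET`, `open_object`, `object_keys`, and `fetch_and_measure` are hypothetical stand-ins; the in-memory dictionary replaces S3 so the example runs anywhere. With smart_open installed, `smart_open.open(uri, "rb")` returns a comparable binary file-like object.

```python
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "bucket"; real code might open s3:// URIs instead.
FAKE_BUCKET = {"a.png": b"aaaa", "b.png": b"bb"}


def open_object(key):
    # Stand-in opener that returns a binary file-like object for one key.
    return io.BytesIO(FAKE_BUCKET[key])


def object_keys():
    # Generator of keys, so the listing itself can be produced lazily.
    yield from sorted(FAKE_BUCKET)


def fetch_and_measure(key):
    # Runs in a worker thread: read the object, then do some per-item work.
    with open_object(key) as fh:
        data = fh.read()
    return key, len(data)  # placeholder for real image processing


with ThreadPoolExecutor(max_workers=2) as pool:
    # By default, Executor.map submits every key up front before yielding results.
    sizes = dict(pool.map(fetch_and_measure, object_keys()))
```

Doing the read inside the worker keeps the I/O-bound part concurrent, which is where threads help most in Python.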

Install

pip install multiflow

Quickstart

from multiflow import MultithreadedFlow


image_paths = []  # list of image paths to process


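def transform(image_path):
    # do some work on the image, then return the path of the result
    new_path = image_path  # placeholder: replace with the real output path
    return new_path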


with MultithreadedFlow() as flow:
    flow.consume(image_paths)  # accepts any iterable, including a generator (see Examples below)
    flow.add_function(transform)

    for output in flow:
        if output:  # truthy when the job succeeded
            print(output)  # new_path
        else:
            e = output.get_exception()  # the exception raised by the job
            print(f"job failed: {e!r}")

    success = flow.get_successful_job_count()
    failed = flow.get_failed_job_count()

Examples

For a working program using multiflow, see this example, which resizes an S3 bucket of images to 50% and saves the resized images locally.

Documentation

The documentation is still a work in progress. For the most up-to-date documentation, please see this page.

Download files

Source Distribution

multiflow-1.0.4.tar.gz (13.5 kB view details)

Uploaded Source

Built Distribution

multiflow-1.0.4-py3-none-any.whl (12.9 kB view details)

Uploaded Python 3

File details

Details for the file multiflow-1.0.4.tar.gz.

File metadata

  • Download URL: multiflow-1.0.4.tar.gz
  • Upload date:
  • Size: 13.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for multiflow-1.0.4.tar.gz
Algorithm Hash digest
SHA256 35d9ec0b900c8730dac40895a6da4b50846cd50feee19f222da6f83f1346bf3c
MD5 00226f3c48f71c171811039a84ee7a5f
BLAKE2b-256 d4ee165792df81adcf6f3d765d52b7c7a00972e621835bc21ab6b79d238a36b4

See more details on using hashes here.

File details

Details for the file multiflow-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: multiflow-1.0.4-py3-none-any.whl
  • Upload date:
  • Size: 12.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for multiflow-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 b173539e22407fa9b561c37cbc4b33f2d048d0d180affb47812bb2006312d4b9
MD5 039422419e7743129bd8ae9049e56fc1
BLAKE2b-256 488e45717b574ab4e09881213fe9fd9eed3544889829532b98fc4d3679e804e6
