A Python multithreading library for data processing pipelines, data streaming, etc.
multiflow
About
`multiflow` is a Python multithreading library for data processing pipelines/workflows, streaming, etc. It extends `concurrent.futures` by allowing the input and output to be generator objects, and it makes it easy to string multiple thread pools together into a multithreaded pipeline.
Additionally, `multiflow` comes with periodic logging, automatic retries, error handling, and argument expansion.
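To illustrate what "automatic retries" generally means, here is a minimal standard-library sketch. It is not multiflow's implementation or API: the `retry` decorator, its `attempts`, `delay`, and `exceptions` parameters, and the `fetch` demo function are hypothetical names chosen for this example.

```python
import functools
import time
from typing import Callable, Tuple, Type, TypeVar

R = TypeVar("R")


def retry(attempts: int = 3, delay: float = 0.0,
          exceptions: Tuple[Type[BaseException], ...] = (Exception,)):
    """Hypothetical decorator: re-run the wrapped function up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def decorator(fn: Callable[..., R]) -> Callable[..., R]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> R:
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: propagate the last error
                    time.sleep(delay)  # optional pause between attempts
        return wrapper
    return decorator


# Usage: a hypothetical job that fails twice with a transient error, then succeeds.
attempts_made = []


@retry(attempts=3, exceptions=(ConnectionError,))
def fetch(x: int) -> int:
    attempts_made.append(x)
    if len(attempts_made) < 3:
        raise ConnectionError("transient failure")
    return x * 2


value = fetch(5)
```

Only the exception types listed in `exceptions` trigger a retry; anything else propagates on the first failure, which keeps genuine bugs from being silently retried.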
Why?
Because `multiflow` accepts a generator object as input and yields its output as a generator object, it is well suited to running several jobs concurrently when the output of one job is the input of the next. The second job can start working on items as soon as the first job produces them instead of waiting for the first job to finish, so the total work completes faster.
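The overlap described above can be sketched with the standard library alone. This is not multiflow's implementation: `lazy_map`, `pipeline`, `load`, `transform`, and the `max_in_flight` window are hypothetical names. By default, `concurrent.futures.Executor.map` collects its input iterables immediately rather than lazily, so this sketch instead submits a bounded window of jobs and pulls new input only as results are consumed. Each stage gets its own `ThreadPoolExecutor` and reads from the previous stage's generator.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def lazy_map(fn: Callable, items: Iterable, max_workers: int = 4,
             max_in_flight: int = 4) -> Iterator:
    """Run fn over items on its own thread pool, pulling input lazily and yielding in order."""
    source = iter(items)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit only a bounded window of jobs instead of consuming all input up front.
        pending = deque(pool.submit(fn, x) for x in islice(source, max_in_flight))
        while pending:
            result = pending.popleft().result()  # oldest job first keeps input order
            for x in islice(source, 1):  # top the window back up by one item
                pending.append(pool.submit(fn, x))
            yield result


def pipeline(items: Iterable, stages: List[Callable], **kwargs) -> Iterator:
    """Chain stages so each stage's output generator is the next stage's input."""
    stream: Iterable = items
    for fn in stages:
        stream = lazy_map(fn, stream, **kwargs)
    return iter(stream)


# Demo: record when each stage touches each item.
events: List[tuple] = []
_lock = threading.Lock()


def record(stage: str, value: int) -> None:
    with _lock:
        events.append((stage, value))


def load(x: int) -> int:  # hypothetical stage 1, e.g. reading an image
    record("load", x)
    return x * 10


def transform(x: int) -> int:  # hypothetical stage 2, e.g. resizing it
    record("transform", x)
    return x + 1


results = list(pipeline(range(20), [load, transform]))
```

In this demo, `transform(0)` is always recorded before `load(19)`: the second stage finishes its first item while the first stage has submitted at most eight of its twenty inputs.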
A great use case for this is streaming data. For example, with `multiflow` and `smart_open`, you could stream images from S3 and process them in a multithreaded environment before exporting them elsewhere.
Install
```shell
pip install multiflow
```
Quickstart
```python
from multiflow import MultithreadedFlow

image_paths = []  # list of images


def transform(image_path):
    # do some work, then return the path of the new image
    new_path = image_path  # placeholder so the example runs as written
    return new_path


with MultithreadedFlow() as flow:
    flow.consume(image_paths)  # can accept a generator object or an iterable (see examples below for a generator)
    flow.add_function(transform)

    for output in flow:
        if output:  # if successful
            print(output)  # new_path
        else:
            e = output.get_exception()

    success = flow.get_successful_job_count()
    failed = flow.get_failed_job_count()
```
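The quickstart relies on each output being truthy when its job succeeded and exposing its exception otherwise. The standard-library sketch below illustrates only that bookkeeping pattern; `JobResult`, `SimpleFlow`, and `parse` are hypothetical names, and this is not how multiflow implements `MultithreadedFlow`. For brevity it uses `Executor.map`, which reads its whole input up front.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, Optional


@dataclass
class JobResult:
    """Hypothetical result wrapper: truthy on success, carries the exception on failure."""
    value: Any = None
    exception: Optional[BaseException] = None

    def __bool__(self) -> bool:
        return self.exception is None

    def get_exception(self) -> Optional[BaseException]:
        return self.exception


class SimpleFlow:
    """Hypothetical stand-in that counts successful and failed jobs."""

    def __init__(self, fn: Callable, max_workers: int = 4):
        self.fn = fn
        self.max_workers = max_workers
        self.succeeded = 0
        self.failed = 0

    def _call(self, item: Any) -> JobResult:
        try:
            return JobResult(value=self.fn(item))
        except Exception as exc:  # capture the failure instead of aborting the stream
            return JobResult(exception=exc)

    def run(self, items: Iterable) -> Iterator[JobResult]:
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            for result in pool.map(self._call, items):
                if result:
                    self.succeeded += 1
                else:
                    self.failed += 1
                yield result


def parse(text: str) -> int:
    return int(text)  # raises ValueError for non-numeric input


flow = SimpleFlow(parse)
outputs = list(flow.run(["1", "2", "oops", "4"]))
```

Wrapping exceptions in the result lets one bad item fail without stopping the rest of the stream, and basing `__bool__` on the exception rather than the value means a successful job that returns a falsy value such as `0` still counts as a success.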
Examples
For a working program using `multiflow`, see this example, which resizes the images in an S3 bucket to 50% and saves the resized images locally.
Documentation
The documentation is still a work in progress; for the most up-to-date documentation, please see this page.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file multiflow-1.0.4.tar.gz.
File metadata
- Download URL: multiflow-1.0.4.tar.gz
- Size: 13.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 35d9ec0b900c8730dac40895a6da4b50846cd50feee19f222da6f83f1346bf3c |
| MD5 | 00226f3c48f71c171811039a84ee7a5f |
| BLAKE2b-256 | d4ee165792df81adcf6f3d765d52b7c7a00972e621835bc21ab6b79d238a36b4 |
File details
Details for the file multiflow-1.0.4-py3-none-any.whl.
File metadata
- Download URL: multiflow-1.0.4-py3-none-any.whl
- Size: 12.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b173539e22407fa9b561c37cbc4b33f2d048d0d180affb47812bb2006312d4b9 |
| MD5 | 039422419e7743129bd8ae9049e56fc1 |
| BLAKE2b-256 | 488e45717b574ab4e09881213fe9fd9eed3544889829532b98fc4d3679e804e6 |