asyncio-buffered-pipeline

Parallelise pipelines of Python async iterables/generators.

Installation

pip install asyncio-buffered-pipeline

Usage / What problem does this solve?

In a chain of async generators, only one generator runs at any given time, even though each is async. For example, the code below takes (just over) 30 seconds: each of the 10 values passes through three 1-second sleeps, one after another.

import asyncio

async def gen_1():
    for value in range(0, 10):
        await asyncio.sleep(1)  # Could be a slow HTTP request
        yield value

async def gen_2(it):
    async for value in it:
        await asyncio.sleep(1)  # Could be a slow HTTP request
        yield value * 2

async def gen_3(it):
    async for value in it:
        await asyncio.sleep(1)  # Could be a slow HTTP request
        yield value + 3

async def main():
    it_1 = gen_1()
    it_2 = gen_2(it_1)
    it_3 = gen_3(it_2)

    async for val in it_3:
        print(val)

asyncio.run(main())
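
To check the 30 second figure without waiting for it, the same chain can be timed with shorter sleeps. This is only a sketch: DELAY and timed_chain are names introduced here for illustration, and the sleeps are scaled down from 1 second to 0.05 seconds, so the expected total is roughly 30 × 0.05 = 1.5 seconds.

```python
import asyncio
import time

DELAY = 0.05  # assumption: scaled-down stand-in for the 1 second sleeps


async def gen_1():
    for value in range(0, 10):
        await asyncio.sleep(DELAY)
        yield value


async def gen_2(it):
    async for value in it:
        await asyncio.sleep(DELAY)
        yield value * 2


async def gen_3(it):
    async for value in it:
        await asyncio.sleep(DELAY)
        yield value + 3


async def timed_chain():
    # Hypothetical helper: collect every value and measure the wall-clock time.
    start = time.perf_counter()
    values = [val async for val in gen_3(gen_2(gen_1()))]
    return values, time.perf_counter() - start


values, elapsed = asyncio.run(timed_chain())
print(values)
print(f"{elapsed:.2f}s elapsed, about {elapsed / DELAY:.0f} sleeps back to back")
```

The elapsed time comes out close to 30 sleeps because a value is only requested from gen_1 once gen_3 and gen_2 have finished with the previous one.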

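The buffered_pipeline function lets you make a small change, passing each generator through the function it returns, so that the generators run in parallel. This reduces the running time to (just over) 12 seconds: about 10 seconds for the first generator to produce all of its values, plus 1 second in each of the two later generators for the final value.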

import asyncio
from asyncio_buffered_pipeline import buffered_pipeline

async def gen_1():
    for value in range(0, 10):
        await asyncio.sleep(1)  # Could be a slow HTTP request
        yield value

async def gen_2(it):
    async for value in it:
        await asyncio.sleep(1)  # Could be a slow HTTP request
        yield value * 2

async def gen_3(it):
    async for value in it:
        await asyncio.sleep(1)  # Could be a slow HTTP request
        yield value + 3

async def main():
    buffer_iterable = buffered_pipeline()
    it_1 = buffer_iterable(gen_1())
    it_2 = buffer_iterable(gen_2(it_1))
    it_3 = buffer_iterable(gen_3(it_2))

    async for val in it_3:
        print(val)

asyncio.run(main())

The buffered_pipeline ensures internal tasks are cancelled on any exception.

Buffer size

The default buffer size is 1. This is suitable if each iteration takes approximately the same amount of time. If this is not the case, you may wish to change it using the buffer_size parameter of buffer_iterable.

it = buffer_iterable(gen(), buffer_size=2)

Features

  • Only one task is created for each buffer_iterable, in which the iterable is iterated over, with its values stored in an internal buffer.

  • All the tasks of the pipeline are cancelled if any of the generators raises an exception.

  • If a generator raises an exception, the exception is propagated to calling code.

  • The buffer size of each step in the pipeline is configurable.

  • The "chaining" is not abstracted away. You still have full control over the arguments passed to each step, and you don't need to buffer each iterable in the pipeline if you don't want to: just don't pass those through buffer_iterable.
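
The idea of one task per iterable feeding an internal buffer can be pictured with a minimal, standard-library-only sketch. This is not the library's implementation: buffered is a hypothetical stand-in for buffer_iterable, DELAY is a scaled-down sleep, and using an asyncio.Queue as the buffer is an assumption. It also simplifies cancellation: finishing or closing a step cancels only that step's own task, rather than every task of the pipeline.

```python
import asyncio
import time

DELAY = 0.05  # assumption: scaled-down stand-in for a slow HTTP request
_DONE = object()  # sentinel marking normal exhaustion of the source


async def buffered(aiterable, buffer_size=1):
    # Hypothetical stand-in for buffer_iterable: iterate `aiterable` in one
    # background task and hand its values over through a bounded queue.
    queue = asyncio.Queue(maxsize=buffer_size)

    async def produce():
        try:
            async for value in aiterable:
                await queue.put((value, None))  # waits while the buffer is full
        except Exception as exc:
            await queue.put((None, exc))  # forward the error to the consumer
        else:
            await queue.put((_DONE, None))

    task = asyncio.create_task(produce())
    try:
        while True:
            value, exc = await queue.get()
            if exc is not None:
                raise exc  # propagate the generator's exception to the caller
            if value is _DONE:
                return
            yield value
    finally:
        task.cancel()  # no-op if the task has already finished


async def gen_1():
    for value in range(0, 10):
        await asyncio.sleep(DELAY)
        yield value


async def gen_2(it):
    async for value in it:
        await asyncio.sleep(DELAY)
        yield value * 2


async def gen_3(it):
    async for value in it:
        await asyncio.sleep(DELAY)
        yield value + 3


async def main():
    start = time.perf_counter()
    it_1 = buffered(gen_1())
    it_2 = buffered(gen_2(it_1))
    it_3 = buffered(gen_3(it_2))
    values = [val async for val in it_3]
    return values, time.perf_counter() - start


values, elapsed = asyncio.run(main())
print(values)
print(f"{elapsed:.2f}s elapsed; the unbuffered chain needs about {30 * DELAY:.2f}s")
```

Because each step keeps pulling from its source while the next step is busy, the sleeps overlap and the total is close to 12 sleeps rather than 30. The bounded queue is what keeps a fast step from running arbitrarily far ahead of a slow one.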
