Skip to main content

Async distributed process pool using asyncio

Project description

Build PyPi Documentation Status

Introduction

The distex package provides a distributed process pool that uses asyncio to efficiently manage the local and remote worker processes.

Features:

  • Scales from 1 to 1000’s of processors;

  • Can handle in the order of 50.000 small tasks per second;

  • Easy to use with ssh (secure shell);

  • Full asynchronous support;

  • Maps over unbounded iterables;

  • Choice of pickle, dill or cloudpickle serialization for functions and data;

  • Backward compatible with concurrent.futures.ProcessPool (PEP3148).

Installation

pip3 install -U distex

Dependencies:

  • Python version 3.6 or higher;

  • On Unix the uvloop package is recommended: pip3 install uvloop

  • SSH client and server (optional).

Examples

A process pool can have local and remote workers in any combination. Here is a pool that uses 4 local workers:

from distex import Pool

def f(x):
    return x*x

pool = Pool(4)
for y in pool.map(f, range(100)):
    print(y)

To create a pool that also uses 8 workers on host maxi, using ssh:

pool = Pool(4, ['ssh://maxi/8'])

There is full support for every asynchronous construct imaginable:

import asyncio
from distex import Pool

def init():
    # pool initializer: set the start time for every worker
    import time
    __builtins__.t0 = time.time()

async def timer(i=0):
    # async code running in the pool
    import time
    await asyncio.sleep(1)
    return time.time() - t0

async def ait():
    # async iterator running on the user side
    for i in range(20):
        await asyncio.sleep(0.1)
        yield i

async def main():
    async with Pool(4, initializer=init, qsize=1) as pool:
        async for t in pool.map_async(timer, ait()):
            print(t)
        print(await pool.run_on_all_async(timer))


loop = asyncio.get_event_loop()
loop.run_until_complete(main())

High level architecture

Distex does not use remote ‘task servers’. Instead it is done the other way around: A local server is started first; Then the local and remote workers are started and each of them will connect on its own back to the server. When all workers have connected then the pool is ready for duty.

Each worker consists of a single-threaded process that is running an asyncio event loop. This loop is used both for communication and for running asynchronous tasks. Synchronous tasks are run in a blocking fashion.

When using ssh, a remote (or ‘reverse’) tunnel is created from a remote Unix socket to the local Unix socket that the local server is listening on. Multiple workers on a remote machine will use the same Unix socket and share the same ssh tunnel.

Documentation

Distex documentation

Changelog

Version 0.5.9

  • Make pool.shutdown() possible when event loop is already running

Version 0.5.8

  • PR #9 merged to fix server script

Version 0.5.7

  • distex_proc entry point is now used to allow various Python setups

Version 0.5.6

  • Fixed issue #5

Version 0.5.5

  • Optimizations; some logging issues fixed.

Version 0.5.4

  • Fixed issue #4

Version 0.5.3

  • Small scheduling improvements

Version 0.5.2

  • Optimizations for large data

  • Better error handling when result can’t be pickled

Version 0.5.1

  • Fixes for Windows

Version 0.5.0

  • Initial release

author:

Ewald de Wit <ewald.de.wit@gmail.com>

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

distex-0.5.9.linux-x86_64.tar.gz (30.6 kB view details)

Uploaded Source

Built Distribution

distex-0.5.9-py3-none-any.whl (16.9 kB view details)

Uploaded Python 3

File details

Details for the file distex-0.5.9.linux-x86_64.tar.gz.

File metadata

  • Download URL: distex-0.5.9.linux-x86_64.tar.gz
  • Upload date:
  • Size: 30.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.23.3 CPython/3.6.6

File hashes

Hashes for distex-0.5.9.linux-x86_64.tar.gz
Algorithm Hash digest
SHA256 ed082c0276a311b62eb324a467bcd54b63fc51507859a166fd1757482ba94d85
MD5 799005442b1b292f55e187c3b8e82286
BLAKE2b-256 93ea18b9ce48a0ef82af619c445aede405e9df5231ac491768581306e8c2c56f

See more details on using hashes here.

File details

Details for the file distex-0.5.9-py3-none-any.whl.

File metadata

  • Download URL: distex-0.5.9-py3-none-any.whl
  • Upload date:
  • Size: 16.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.23.3 CPython/3.6.6

File hashes

Hashes for distex-0.5.9-py3-none-any.whl
Algorithm Hash digest
SHA256 1fe3e414a76d05aab4b7d9055c0c0d16a5b9899838ff7daa267d588826b91412
MD5 fae474cb9ecec00a4c3eb41403d88b37
BLAKE2b-256 eb3f5fd2f742af1f86db6e5641f1a84073ca34b4f0d6cd71e41d4907bce932bf

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page