Async distributed process pool using asyncio
Introduction
The distex package provides a distributed process pool that uses asyncio to efficiently manage the local and remote worker processes.
Features:
Scales to thousands of processors;
Can handle on the order of 50,000 small tasks per second;
Easy to use with ssh (secure shell);
Full asynchronous support;
Maps over unbounded iterables;
Choice of pickle, dill or cloudpickle serialization for functions and data;
Backward compatible with concurrent.futures.ProcessPoolExecutor (PEP 3148).
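The choice of serializer matters mostly for functions. Standard pickle stores a function by its importable name, so an anonymous function cannot be pickled, whereas dill and cloudpickle can serialize such functions by value. The following minimal sketch uses only the standard library to show the limitation; can_pickle is a hypothetical helper and not part of distex:

```python
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# An anonymous function has no importable name for pickle to refer to.
square = lambda x: x * x

print(can_pickle(len))     # a named builtin is pickled by reference
print(can_pickle(square))  # the lambda is rejected by standard pickle
```

If the task functions are lambdas or closures, dill or cloudpickle is therefore the likelier fit.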
Installation
pip3 install -U distex
Dependencies:
Python version 3.6 or higher;
On Unix the uvloop package is recommended: pip3 install uvloop
SSH client and server (optional).
Examples
A process pool can have local and remote workers in any combination. Here is a pool that uses 4 local workers:
from distex import Pool

def f(x):
    return x*x

pool = Pool(4)
for y in pool.map(f, range(100)):
    print(y)
To create a pool that also uses 8 workers on host maxi, using ssh:
pool = Pool(4, ['ssh://maxi/8'])
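Going by the example above, a remote worker spec names the host and, after the slash, the number of workers to start there. The sketch below splits such a spec with the standard library, purely to illustrate the format. parse_worker_spec is a hypothetical helper, not a distex function, and what distex does when the worker count is omitted is not covered here:

```python
from urllib.parse import urlsplit


def parse_worker_spec(spec):
    """Split a spec such as 'ssh://maxi/8' into (host, num_workers).

    num_workers is None when the spec carries no count.
    """
    parts = urlsplit(spec)
    if parts.scheme != "ssh" or not parts.hostname:
        raise ValueError(f"not an ssh worker spec: {spec!r}")
    count = parts.path.strip("/")
    return parts.hostname, int(count) if count else None


print(parse_worker_spec("ssh://maxi/8"))
```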
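Asynchronous constructs are supported throughout. The example below uses the pool as an async context manager, maps an async function over an async iterator, and runs a coroutine on all workers: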
import asyncio
from distex import Pool

def init():
    # pool initializer: set the start time for every worker
    import builtins
    import time
    builtins.t0 = time.time()

async def timer(i=0):
    # async code running in the pool
    import time
    await asyncio.sleep(1)
    return time.time() - t0

async def ait():
    # async iterator running on the user side
    for i in range(20):
        await asyncio.sleep(0.1)
        yield i

async def main():
    async with Pool(4, initializer=init, qsize=1) as pool:
        async for t in pool.map_async(timer, ait()):
            print(t)
        print(await pool.run_on_all_async(timer))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
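Mapping over an endless source only works if the mapper does not read ahead without limit. One common way to achieve that is to cap the number of tasks in flight. The sketch below shows this idea with plain asyncio and runs everything in one process. It is not how distex is implemented; bounded_map and its window parameter are hypothetical and should not be confused with the pool's qsize argument. It uses asyncio.run, which needs Python 3.7 or higher.

```python
import asyncio
from collections import deque


async def bounded_map(func, source, window=2):
    """Yield `await func(x)` for each x of the async iterator `source`,
    in input order, keeping at most `window` tasks in flight."""
    pending = deque()
    try:
        async for x in source:
            pending.append(asyncio.ensure_future(func(x)))
            if len(pending) >= window:
                yield await pending.popleft()
        while pending:
            yield await pending.popleft()
    finally:
        # Consumer stopped early or source ended: drop leftover tasks.
        for task in pending:
            task.cancel()


async def counter():
    """An endless async source: 0, 1, 2, ..."""
    i = 0
    while True:
        yield i
        i += 1


async def square(x):
    await asyncio.sleep(0)
    return x * x


async def first_results(n):
    """Take the first n results from the endless mapping, then stop."""
    results = []
    stream = bounded_map(square, counter())
    async for y in stream:
        results.append(y)
        if len(results) == n:
            break
    await stream.aclose()
    return results


print(asyncio.run(first_results(5)))
```

Because the source is only advanced when a slot frees up, memory use stays bounded no matter how long the input is.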
High level architecture
Distex does not use remote ‘task servers’. Instead it works the other way around: a local server is started first, then the local and remote workers are started, and each of them connects back to the server on its own. Once all workers have connected, the pool is ready for duty.
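The startup order described above can be sketched with plain asyncio. This is an illustration under simplifying assumptions, not distex's code: the workers are coroutines in the same process rather than separate processes, the server listens on a local TCP port instead of a Unix socket, and run_pool and the one-line handshake are made up for the example.

```python
import asyncio


async def run_pool(num_workers):
    connected = []                      # worker names, in order of arrival
    all_connected = asyncio.Event()

    async def on_connect(reader, writer):
        # Server side: each worker announces itself with one line.
        line = await reader.readline()
        connected.append(line.decode().strip())
        if len(connected) == num_workers:
            all_connected.set()
        writer.close()

    async def worker(worker_id, port):
        # Worker side: connect back to the already running server.
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(f"worker-{worker_id}\n".encode())
        await writer.drain()
        writer.close()

    # 1. Start the local server first (port 0 picks any free port).
    server = await asyncio.start_server(on_connect, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # 2. Start the workers; each one connects back on its own.
    await asyncio.gather(*(worker(i, port) for i in range(num_workers)))

    # 3. The pool is ready once every worker has connected.
    await asyncio.wait_for(all_connected.wait(), timeout=5)
    server.close()
    await server.wait_closed()
    return sorted(connected)


print(asyncio.run(run_pool(3)))
```

A practical consequence of this direction is that only the local machine needs a listening endpoint; the workers initiate every connection.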
Each worker consists of a single-threaded process that is running an asyncio event loop. This loop is used both for communication and for running asynchronous tasks. Synchronous tasks are run in a blocking fashion.
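The difference between the two kinds of task can be made concrete with a small asyncio sketch. It assumes a dispatcher along the lines described above; run_task is a hypothetical name and the real worker loop in distex may look different. Async tasks yield to the loop while they wait, so several can overlap in one worker; a synchronous task holds the loop until it returns.

```python
import asyncio
import inspect
import time


async def run_task(func, *args):
    """Run one task inside a single-threaded event loop."""
    if inspect.iscoroutinefunction(func):
        return await func(*args)   # yields to the loop while waiting
    return func(*args)             # blocks the loop until it returns


async def async_task(x):
    await asyncio.sleep(0.05)
    return x + 1


def sync_task(x):
    time.sleep(0.05)
    return x * 2


async def demo():
    start = time.monotonic()
    async_results = await asyncio.gather(
        *(run_task(async_task, i) for i in range(3)))
    async_elapsed = time.monotonic() - start

    start = time.monotonic()
    sync_results = await asyncio.gather(
        *(run_task(sync_task, i) for i in range(3)))
    sync_elapsed = time.monotonic() - start
    return async_results, async_elapsed, sync_results, sync_elapsed


async_results, async_elapsed, sync_results, sync_elapsed = asyncio.run(demo())
print(async_results, sync_results)
# The three async tasks overlap; the three sync tasks run one after another.
print(f"async: {async_elapsed:.2f}s, sync: {sync_elapsed:.2f}s")
```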
When using ssh, a remote (or ‘reverse’) tunnel is created from a remote Unix socket to the local Unix socket that the local server is listening on. Multiple workers on a remote machine will use the same Unix socket and share the same ssh tunnel.
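OpenSSH (6.7 and later) can forward Unix sockets with the -R option in the form remote_socket:local_socket. The sketch below builds such a command line to show what a reverse tunnel of this kind looks like. The exact options distex passes to ssh are not stated in this document, so the -N flag, the socket paths and the helper name reverse_tunnel_command are all assumptions made for illustration.

```python
import shlex


def reverse_tunnel_command(host, remote_socket, local_socket):
    """Build an ssh argv that forwards a Unix socket on `host`
    back to a Unix socket on the local machine."""
    return [
        "ssh",
        "-N",                                    # tunnel only, no remote command
        "-R", f"{remote_socket}:{local_socket}",  # remote end : local end
        host,
    ]


# Hypothetical socket paths, chosen for the example only.
argv = reverse_tunnel_command(
    "maxi", "/tmp/pool_remote.sock", "/tmp/pool_local.sock")
print(" ".join(shlex.quote(arg) for arg in argv))
```

With one such tunnel in place, every worker on the remote host can connect to the same remote socket, which matches the sharing described above.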
Documentation
Changelog
Version 0.5.5
Optimizations; some logging issues fixed.
Version 0.5.4
Fixed issue #4
Version 0.5.3
Small scheduling improvements
Version 0.5.2
Optimizations for large data
Better error handling when result can’t be pickled
Version 0.5.1
Fixes for Windows
Version 0.5.0
Initial release
Author: Ewald de Wit <ewald.de.wit@gmail.com>