easy_multiprocess
Use all your cores with no extra code.
easy_multiprocess is a package that makes it extremely simple to multiprocess. Parallelize your code with just one line!
# Before:
def func1(x):
    # some heavy computation
    ...

a = [func1(i) for i in range(16)]

# After:
@parallelize
def func1(x):
    # some heavy computation
    ...

a = [func1(i) for i in range(16)]  # all 16 calls run in parallel on 16 cores
Other multiprocessing libraries force a specific coding style/syntax. Below is the same code as above, except using concurrent.futures:
from concurrent.futures import ProcessPoolExecutor

def func1(x):
    # some heavy computation
    ...

# concurrent.futures
with ProcessPoolExecutor() as pool:
    a = list(pool.map(func1, range(16)))
Other multiprocessing libraries don't use all cores when multiple operations occur. On a 16-core machine, let's see how long the following two versions take:
# Our machine has 16 cores
# func1, func2, ... each take 10 seconds

# 1: concurrent.futures library:
with ProcessPoolExecutor() as pool:
    a = list(pool.map(func1, range(4)))  # each map blocks, using only 4 of 16 cores
    b = list(pool.map(func2, range(4)))
    c = list(pool.map(func3, range(4)))
    d = list(pool.map(func4, range(4)))
# elapsed time = 40s
# 2: easy_multiprocess:
a = [func1(i) for i in range(4)]  # calls return immediately, so nothing blocks here
b = [func2(i) for i in range(4)]
c = [func3(i) for i in range(4)]
d = [func4(i) for i in range(4)]
# elapsed time = 10s (all 16 calls run at once across 16 cores)
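For reference, here is a minimal runnable sketch of the easy_multiprocess version. The top-level import path and the time.sleep stand-in are assumptions for illustration:

import time
from easy_multiprocess import parallelize  # assumed import path

@parallelize
def func1(x):
    time.sleep(10)  # stand-in for 10 seconds of heavy computation
    return x

# func2, func3, and func4 are defined the same way

a = [func1(i) for i in range(4)]  # schedules the calls immediately, does not block
print(a)  # reading the results waits for the calls to finish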
Parallelize simple code
You can even use easy_multiprocess for simple code (that needs parallelizing):
# func1, func2... each take 10 seconds
a = func1(0)
b = func2(1)
c = func3(2)
d = func4(3)
print(a, b, c, d)
# elapsed time = 10s
Non-Embarrassingly Parallel Code
It even works for the non-embarrassingly parallel case (but might be suboptimal):
# func1, func2, ... each take 10 seconds
a = func1(0)
b = func2(a)  # depends on a
c = func3(2)  # c and d don't depend on b, but are scheduled after it
d = func4(3)
print(a, b, c, d)
# elapsed time = 30s
easy_multiprocess implicitly uses a DAG computation graph for this (other libraries have similar mechanisms, such as Ray's DAG). See Limitations for where this doesn't work.
User Installation
On Mac/Linux/Unix-like:
pip install easy_multiprocess
(Windows not currently supported)
Developer Installation
git clone <this_repo>
cd easymultiprocess
pip install -e .
Then, run tests:
python -m unittest tests.test
Author Notes
I built easy_multiprocess simply to learn how to build a Python package.
It's built on top of concurrent.futures rather than from the ground up on OS-level primitives, since that would have taken over 10x as much time and code. This means it has MANY limitations.
Limitations:
- is comparisons aren't supported for FutureResult objects due to Python object identity. For is operations involving the output of any @parallelize-d function, use == instead, or call .result() before using is (similar to future objects from other multiprocessing libraries); see the sketch after this list. Supporting this would require the user to install an inefficient custom Python interpreter fork, which I would also have to spend time to build, and which would anyway defeat the purpose of user-friendliness.
- Standard IO streams are not guaranteed to work correctly.
- The non-embarrassingly parallel case is suboptimally implemented (see the example above, which should take 20s in the ideal case), but can be improved in the future.
- Requires copy-on-write, so it only works on Mac/Linux/Unix-like systems (those with the fork start method).
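A minimal sketch of the identity pitfall, assuming func1 is @parallelize-d; the classic is None check is where this bites:

@parallelize
def func1(x):
    return None if x < 0 else x

out = func1(-1)
print(out is None)           # False: out is a FutureResult wrapper, not None itself
print(out == None)           # True: == resolves the underlying result
print(out.result() is None)  # True: .result() returns the real value first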
General limitations of all common Python multiprocessing libraries:
- Closure variables cannot be created/updated once processes are set up (for the standard library's concurrent.futures, this happens upon first submission to the executor). You can get around this by calling ProcessPoolManager.cleanup and get_executor again; see the sketch after this list.
- Args must be pickle-able (some other cases also work, such as when the library uses dill or another serialization method).
- If you have lots of state, creating new processes can be expensive (copy-on-write is not guaranteed).
- Program correctness is not guaranteed when external-state race conditions exist (e.g. parallel processes writing to/reading from the same file).
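A hypothetical sketch of that workaround. ProcessPoolManager.cleanup and get_executor are the names given above, but their exact import location and signatures are assumptions:

from easy_multiprocess import parallelize, ProcessPoolManager  # assumed imports

counter = 0  # module-level state captured when the worker processes fork

@parallelize
def work(x):
    return x + counter

work(1)  # first call forks the worker processes, which snapshot counter == 0

counter = 5  # already-forked workers will never see this update...

ProcessPoolManager.cleanup()       # ...so tear down the old pool (assumed signature)
ProcessPoolManager.get_executor()  # and fork fresh workers that see counter == 5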
Other Notes:
- The @parallelize decorator will send off the code it wraps to another process.
- parallelize sounds more intuitive (and cooler), but concurrent is technically "correct". If you want, you can use @concurrent instead (see the sketch below).
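For illustration, assuming concurrent is exported from the package alongside parallelize:

from easy_multiprocess import concurrent  # assumed: same decorator, alternate name

@concurrent
def func1(x):
    ...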