
Use all your cores with no extra code

Project description

easy_multiprocess

easy_multiprocess is a package that makes it extremely simple to multiprocess.


Multiprocess your code with just 1 line!

# Before:
def func1(x):
    # some heavy computing
    ...

a = [func1(i) for i in range(16)]

# After:
@parallelize
def func1(x):
    # some heavy computing
    ...

a = [func1(i) for i in range(16)]  # all 16 calls run in parallel, one per core

Other multiprocess libraries force a specific coding style/syntax.

Below is the same code from above, except using concurrent.futures:

def func1(x):
	# some heavy computation
    ...

# concurrent.futures
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as pool:
    a = list(pool.map(func1, range(16)))

Other multiprocessing libraries don't use all cores when multiple independent operations occur.

On a 16-core machine, let's see how long the following two take:

# Our machine has 16 cores
# func1, func2... each take 10 seconds

# 1: concurrent.futures library:
with ProcessPoolExecutor() as pool:
    a = list(pool.map(func1, range(4)))
    b = list(pool.map(func2, range(4)))
    c = list(pool.map(func3, range(4)))
    d = list(pool.map(func4, range(4)))

# elapsed time = 40s (each map blocks until done, using only 4 of the 16 cores at a time)


# 2: easy_multiprocess:
a = [func1(i) for i in range(4)]
b = [func2(i) for i in range(4)]
c = [func3(i) for i in range(4)]
d = [func4(i) for i in range(4)]

# elapsed time = 10s

Parallelize simple code

You can even use easy_multiprocess for simple code (that needs parallelizing):

# func1, func2... each take 10 seconds
a = func1(0)
b = func2(1)
c = func3(2)
d = func4(3)
print(a, b, c, d)
# elapsed time = 10s
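Under the hood this is the classic submit-then-collect pattern: each call returns immediately with a future-like object, and the program only blocks when a value is actually needed. A sketch of the same pattern written out with the standard library, using threads for brevity and the builtin pow as a stand-in for func1/func2:

```python
from concurrent.futures import ThreadPoolExecutor

# What easy_multiprocess does implicitly: each call becomes a submit(),
# and results are collected only when the values are needed.
with ThreadPoolExecutor() as pool:
    fa = pool.submit(pow, 2, 10)  # both calls run concurrently
    fb = pool.submit(pow, 3, 3)
    print(fa.result(), fb.result())  # 1024 27
```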

Non-Embarrassingly Parallel Code

It even works for the non-embarrassingly parallel case (but might be suboptimal):

# func1, func2... each take 10 seconds
a = func1(0)
b = func2(a)
c = func3(2) # c and d don't depend on b, but currently still wait for it
d = func4(3)
print(a, b, c, d)

# elapsed time = 30s

easy_multiprocess implicitly uses a DAG computation graph for this (other libraries have similar mechanisms, such as Ray's DAG). See Limitations for where this doesn't work.
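Conceptually, each call site becomes a node in the graph, with an edge wherever one call consumes another's result; a call is only submitted once its inputs have resolved. A minimal sketch of that scheduling idea with the standard library (threads used for brevity; slow_double is a made-up stand-in for the heavy functions):

```python
from concurrent.futures import ThreadPoolExecutor

def slow_double(x):
    return x * 2

with ThreadPoolExecutor() as pool:
    fa = pool.submit(slow_double, 1)            # no inputs: submitted immediately
    fc = pool.submit(slow_double, 2)            # independent of a: runs in parallel
    fb = pool.submit(slow_double, fa.result())  # depends on a: submitted after a resolves
    print(fa.result(), fb.result(), fc.result())  # 2 4 4
```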


User Installation

On Mac/Linux/Unix-like:

pip install easy_multiprocess

(Windows not currently supported)


Developer Installation

git clone <this_repo>
cd easymultiprocess
pip install -e .

Then, run tests:

python -m unittest tests.test

Author Notes

I built easy_multiprocess simply to learn how to build a Python package.

It's built on top of concurrent.futures rather than from the ground up on OS-level primitives, since that would've taken over 10x as much time and code to build. This means it has MANY limitations.

Limitations:

  • is comparisons aren't supported for FutureResult objects, because of Python object identity. For any output of a @parallelize-d function, use == instead, or call .result() before using is (as with future objects from other multiprocessing libraries). Supporting is directly would require an inefficient custom fork of the Python interpreter, which would defeat the purpose of user-friendliness anyway.
  • Standard IO streams are not guaranteed to work correctly
  • The non-embarrassingly parallel case is suboptimally implemented (see the example, which should take 20s in the ideal case), but can be improved in the future
  • Requires copy-on-write process creation, so it only works on Mac/Linux/Unix-like systems (those with the fork start method)
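The first limitation in action, using the standard library's Future as a stand-in for FutureResult (the exact FutureResult API is the package's own; this only illustrates the identity problem):

```python
from concurrent.futures import Future

# A resolved future whose underlying value is None
f = Future()
f.set_result(None)

print(f is None)           # False: the proxy object has its own identity
print(f.result() is None)  # True: unwrap first, then identity checks work
```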

General limitations of all common python multiprocessing libraries:

  • Closure variables cannot be created or updated once worker processes are set up (for the standard library's concurrent.futures, this happens on the first submission to the executor). You can work around this by calling ProcessPoolManager.cleanup and get_executor again. (TODO: add code sample)
  • Args must be pickle-able (some other cases also work, such as if the library is using dill or other serialization methods)
  • If you have lots of state, creating new processes can be expensive (copy-on-write is not guaranteed)
  • Program correctness is not guaranteed when external state race-conditions exist (ex. parallel processes try to write/read from same file)

Other Notes:

  • The @parallelize decorator will send off the code it wraps to another process
  • parallelize sounds more intuitive (and cooler), but concurrent is technically "correct". If you want, you can use @concurrent instead

Future improvements:

  • Avoid pickling arguments. Instead, wrap the function and turn its args into closure variables, so copy-on-write applies. Even a large arg (such as a large ML model) would then not need to be copied to, or delay, the subprocess.
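A sketch of why that would work: after a POSIX fork, the child process sees the parent's objects without any serialization, via copy-on-write memory pages (POSIX-only; os.fork is unavailable on Windows):

```python
import os

big = list(range(1_000_000))  # built before the fork

pid = os.fork()
if pid == 0:
    # Child process: big is inherited through copy-on-write;
    # nothing was pickled or sent over a pipe.
    os._exit(0 if big[123] == 123 else 1)

_, status = os.waitpid(pid, 0)
print(os.waitstatus_to_exitcode(status))  # 0
```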


Source Distribution

easy_multiprocess-1.0.1.tar.gz (13.3 kB)


Built Distribution

easy_multiprocess-1.0.1-py3-none-any.whl (5.8 kB)


File details

Details for the file easy_multiprocess-1.0.1.tar.gz.

File metadata

  • Download URL: easy_multiprocess-1.0.1.tar.gz
  • Upload date:
  • Size: 13.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for easy_multiprocess-1.0.1.tar.gz:

  • SHA256: 91b8e8799cb740be87ad6c765a12a96cf62c0ea5ba374f3c582ab5f1793c96f8
  • MD5: 98f26f82e026a60dc356bea6cf1bda37
  • BLAKE2b-256: 3527b5961061103a6ff4f7d6b18fc3a72f6fb82d61d14a6deb1dd5779f142c28


File details

Details for the file easy_multiprocess-1.0.1-py3-none-any.whl.


File hashes

Hashes for easy_multiprocess-1.0.1-py3-none-any.whl:

  • SHA256: e425ee8240e327f2c37a7f6011070b84e00deb71c619dedb1207926d935ec4bc
  • MD5: 8fdd658d7695e2b93cd45bff4d76ac2c
  • BLAKE2b-256: db7155c882302a60ddc77ca50327f8f05103aa3ee1fb3472280a9aa884754c6c

