Simple Python for-loop parallelization

Automatically parallelizing simple Python for-loops

This is currently still in a pretty experimental stage, and I can guarantee that it is full of horrendous bugs. There are only two AggregationStrategy types with support for a limited set of objects for now, but the library can definitely be used (especially for loops that don't "return" any values at all). Have a look at the aggregation strategies to see what is currently supported!

How it works

Given a for-loop, e.g.

from paraloop import ParaLoop, Variable
from paraloop import aggregation_strategies

counter = Variable(3, aggregation_strategy=aggregation_strategies.Sum)
dictionary = Variable({}, aggregation_strategy=aggregation_strategies.Concatenate)

for i in ParaLoop(range(0, 100), num_processes=8):
    counter += i
    dictionary[f"key_{i}"] = "Hi!"

print(counter)
print(dictionary)

paraloop will turn it into a function, e.g.

def loop_iterator(i):
    counter.assign(counter + i)
    dictionary[f"key_{i}"] = "Hi!"

paraloop then calls this function once for every iteration of the loop, spread across multiple processes, instead of running the original loop body. Once the processes have finished, paraloop aggregates the results according to the chosen AggregationStrategy, so that you can access your variables as if no multiprocessing ever happened.
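To make the idea concrete, here is a minimal sketch of the same pattern written by hand with the standard library. It is not paraloop's actual implementation, only an illustration of "run the body in worker processes, then aggregate". The helper names (loop_body, run_chunk, split, aggregate, parallel_loop) are hypothetical, and it assumes that Sum means adding the partial counters to the initial value and that Concatenate means merging the partial dictionaries.

```python
from multiprocessing import Pool


def loop_body(i, counter, dictionary):
    """One iteration of the original loop, acting on worker-local state."""
    counter += i
    dictionary[f"key_{i}"] = "Hi!"
    return counter, dictionary


def run_chunk(chunk):
    """Run the loop body over one chunk, starting from neutral local state."""
    counter, dictionary = 0, {}
    for i in chunk:
        counter, dictionary = loop_body(i, counter, dictionary)
    return counter, dictionary


def split(items, num_chunks):
    """Split the items into num_chunks interleaved chunks (hypothetical helper)."""
    items = list(items)
    return [items[k::num_chunks] for k in range(num_chunks)]


def aggregate(initial_counter, initial_dictionary, partials):
    """Assumed 'Sum' and 'Concatenate' behaviour: add counters, merge dicts."""
    counter = initial_counter + sum(c for c, _ in partials)
    dictionary = dict(initial_dictionary)
    for _, partial in partials:
        dictionary.update(partial)
    return counter, dictionary


def parallel_loop(iterable, num_processes, initial_counter=3):
    """Run the chunks in separate processes, then aggregate in the parent."""
    chunks = split(iterable, num_processes)
    with Pool(num_processes) as pool:
        partials = pool.map(run_chunk, chunks)
    return aggregate(initial_counter, {}, partials)
```

Calling parallel_loop(range(0, 100), num_processes=8) from under an if __name__ == "__main__": guard mirrors the example above. Each worker only ever sees its own local counter and dictionary, which is why an aggregation step is needed at the end.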

When would I use this?

paraloop is intended to be used for parallelizing for-loops that take an annoying amount of time, but are not worth spending the time and effort of proper multiprocessing on. These are usually fairly simple loops in research-style code that involve many web or file operations, but the goal of paraloop is to support parallelizing any Python for-loop by simply wrapping the variables and calling ParaLoop, without other modifications to the source code.

paraloop is not intended to be optimally efficient or to provide a robust multiprocessing framework, and you probably shouldn't use it in a production environment. If you're looking for a robust multiprocessing framework that does require a bit of setup (i.e. rewriting your loop as a function with some specific return value and then aggregating those values yourself), have a look at joblib.

Practical example

Have a look at example.py. It queries some Wikipedia pages and counts the frequency of each word. The output of the script is as follows:

13882 57495
The original loop took 18.553406238555908 seconds.
13882 paraloop.Variable(57495)
The ParaLoop took 1.689873456954956 seconds.

The speedup is this large because most of the time is spent waiting for the Wikipedia server to respond, so the requests can overlap almost completely.
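The effect of overlapping waits can be reproduced without any network access. The sketch below is an illustration only, not part of example.py: fake_fetch is a hypothetical stand-in that sleeps instead of contacting a server, and it uses threads from the standard library rather than paraloop's processes, since the point here is only that waiting time overlaps.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(page, delay=0.05):
    """Stand-in for a network request: sleeps instead of contacting a server."""
    time.sleep(delay)
    return len(page)


def run_serial(pages):
    """Fetch the pages one after another and report the elapsed time."""
    start = time.perf_counter()
    results = [fake_fetch(page) for page in pages]
    return results, time.perf_counter() - start


def run_concurrent(pages, num_workers=8):
    """Fetch the pages with several workers so that the waits overlap."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        results = list(executor.map(fake_fetch, pages))
    return results, time.perf_counter() - start
```

Both functions return the same results in the same order; only the elapsed time differs. For loops dominated by computation rather than waiting, the gain would be limited by the number of available CPU cores.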

Roadmap

  • Write unit tests for the ParaLoop class and the loop transformer
  • Automatically determine the optimal number of processes if none was specified
  • Add an optional progress bar
  • Add a timeout in case a worker silently fails
  • Add SharedVariables that are stored in shared memory and hence don't need to be aggregated at all
