
Easy parallel processing in Python


License: MIT

Overview

Paraproc is a simple library that helps you parallelize computation over independent chunks of data across multiple processes in Python. It is especially handy when your data processing pipeline mixes calls to external command-line programs with hand-written Python functions.

Under the hood, it combines subprocess and multiprocessing, using a process pool to schedule the jobs. It also provides a numpy.ndarray interface for accessing shared memory across multiple processes.

Paraproc supports both Python 2 and 3, and numpy is its only external dependency. The whole library is a single Python file, so it can easily be copied into your project (the copyright and license notice must be retained).

Code snippets demonstrating the basic usage of the library are given below and in the demo_*.py files.

The source code is hosted at https://github.com/herrlich10/paraproc, where bugs can also be reported.

Quick start

Execute commands in parallel

You can run both Python functions and command-line programs in parallel:

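import os
import paraproc

def my_job():
    # Runs in a worker process and prints that worker's PID
    print(os.getpid())

if __name__ == '__main__':  # Guards against re-execution when workers are spawned (e.g. on Windows)
    pc = paraproc.PooledCaller()
    for k in range(5):
        pc.check_call(my_job)  # Queue a Python function
    for k in range(5):
        pc.check_call('echo $$', shell=True)  # Queue a shell command (Linux/macOS); $$ is the shell's PID
    pc.wait()  # Run all queued jobs and block until they finish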

The pc.check_call() method returns immediately. The queued commands are not actually executed until you call pc.wait().
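The queue-then-wait behaviour described above can be sketched as a small job list that is only drained when wait() is called. DeferredCaller below is a hypothetical stand-in, not paraproc.PooledCaller: it accepts only Python callables (no shell strings), and having wait() return the results in submission order is an assumption made for this sketch. The executor_cls parameter is likewise an illustration-only choice that lets you swap in a ThreadPoolExecutor for I/O-bound jobs.

```python
import concurrent.futures


class DeferredCaller:
    """Hypothetical sketch of deferred execution: check_call() only records
    a job, and nothing runs until wait() is called."""

    def __init__(self, max_workers=None,
                 executor_cls=concurrent.futures.ProcessPoolExecutor):
        self.max_workers = max_workers
        self.executor_cls = executor_cls
        self._queue = []

    @property
    def pending(self):
        """Number of jobs queued but not yet executed."""
        return len(self._queue)

    def check_call(self, func, *args, **kwargs):
        # Record the job and return at once; no worker is started here.
        self._queue.append((func, args, kwargs))

    def wait(self):
        # Hand every queued job to a pool, then block until all finish.
        jobs, self._queue = self._queue, []
        with self.executor_cls(max_workers=self.max_workers) as pool:
            futures = [pool.submit(f, *a, **kw) for f, a, kw in jobs]
            # result() blocks and re-raises any exception raised in a worker
            return [fut.result() for fut in futures]


if __name__ == '__main__':
    dc = DeferredCaller()
    for k in range(4):
        dc.check_call(pow, k, 2)  # returns immediately
    print(dc.pending)  # 4
    print(dc.wait())   # [0, 1, 4, 9]
```

Deferring submission this way means the pool only exists while wait() runs, so no worker processes linger between batches.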

Use shared memory

You can load large data into shared memory, and read or write it as a normal numpy array from multiple processes:

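import numpy as np
import paraproc

def slow_operation(k, x):
    x.acquire()  # Lock the shared array before touching it
    try:
        x[:100000,:] += 1  # Write access
        res = np.mean(x)  # Read access
    finally:
        x.release()  # Always release the lock, even if an exception occurs
    print('#{0}: mean = {1}'.format(k, res))

if __name__ == '__main__':
    # 1,000,000 x 500 float64 values = 4e9 bytes, about 4 GB
    a = paraproc.SharedMemoryArray.from_array(np.random.rand(1000000,500))
    pc = paraproc.PooledCaller()
    for k in range(pc.pool_size):  # One job per worker in the pool
        pc.check_call(slow_operation, k, a)
    pc.wait()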

The data in array a is shared in memory across all child processes and is never copied, even on write access.
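The sketch below shows one way a NumPy array can live in shared memory so that several processes read and write the same bytes without copying. It uses multiprocessing.shared_memory, which requires Python 3.8+, so it is necessarily not how paraproc (which also supports Python 2) does it; create_shared_array, attach_shared_array and add_one are hypothetical helpers. As in the example above, a lock serializes the writes.

```python
import multiprocessing
from multiprocessing import shared_memory

import numpy as np


def create_shared_array(shape, dtype=np.float64):
    """Allocate a named shared-memory block and view it as an ndarray."""
    dtype = np.dtype(dtype)
    nbytes = max(int(np.prod(shape)) * dtype.itemsize, 1)  # size must be > 0
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)


def attach_shared_array(name, shape, dtype=np.float64):
    """Map an existing block by name (e.g. in a worker); no data is copied."""
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype=np.dtype(dtype), buffer=shm.buf)


def add_one(name, shape, lock):
    """Worker: increment every element in place while holding the lock."""
    shm, x = attach_shared_array(name, shape)
    with lock:
        x += 1  # in-place write goes straight to the shared block
    del x       # release the buffer view before closing the mapping
    shm.close()


if __name__ == '__main__':
    shm, a = create_shared_array((4, 3))
    a[:] = 0
    lock = multiprocessing.Lock()
    workers = [multiprocessing.Process(target=add_one, args=(shm.name, a.shape, lock))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(a.sum())  # 48.0: 4 workers x 12 elements x +1
    del a           # drop the view so the block can be closed
    shm.close()
    shm.unlink()    # free the block once no process needs it
```

Each process must drop its ndarray views before calling close(), and exactly one process should call unlink() once everyone is done with the data.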

Download files

Source distribution: paraproc-0.1.3.tar.gz (6.2 kB)
Built distribution: paraproc-0.1.3-py3-none-any.whl (6.4 kB, Python 3)

File hashes

paraproc-0.1.3.tar.gz
Algorithm     Hash digest
SHA256        0687cc1cf9a14dc58229c51941b51c0bb494193b493b451e59dcd5e6e33d8650
MD5           33b961d2be1bf1314d9f54543d0028a0
BLAKE2b-256   a495d95b1cf999c7493fc241df974864ef5c11f3f67de4f5e20274d4a2d19b86

paraproc-0.1.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        0537b804047a72eb6ac30f254a67bd78151d6c93b8428bc427aca9cb3833f49c
MD5           dc66035a482d86aacf8fd9b666b3d1f5
BLAKE2b-256   5e31c6b9e7f406ed04457f6591e0f5ace88de2d16c52b536633a465fbac13582
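To check a downloaded file against the SHA256 digests listed above, you can hash it locally with the standard library. This is a minimal sketch; file_digest_hex and verify_download are hypothetical helper names.

```python
import hashlib


def file_digest_hex(path, algorithm='sha256', chunk_size=1 << 16):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps reading until f.read() returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path, expected_sha256):
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return file_digest_hex(path) == expected_sha256.strip().lower()
```

For example, verify_download('paraproc-0.1.3.tar.gz', '0687cc1cf9a14dc58229c51941b51c0bb494193b493b451e59dcd5e6e33d8650') should return True for an intact source archive. pip can also enforce digests itself through its hash-checking mode (--require-hashes).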
