
Tools to parallelize operations on large data sets using shared memory with zero copies.


pasha

pasha (parallelized shared memory) provides tools to process data in parallel, with an emphasis on shared memory and zero copies. It uses the map pattern, similar to Python's built-in map() function, where a callable is applied to each element of a collection. To avoid the high cost of inter-process communication (IPC) or other communication schemes, results are meant to be written directly to memory shared between all workers and the calling site. The current implementations distribute work across threads or processes on a single node.
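To make the calling convention concrete, here is a minimal serial sketch in plain Python and NumPy. `serial_map` is a hypothetical helper, not part of pasha. It calls the kernel once per element with a worker ID, the index and the value. Unlike the built-in map(), the kernel returns nothing and instead stores its result in an output array allocated up front.

```python
import numpy as np

def serial_map(kernel, sequence):
    """Hypothetical stand-in for a serial map: a single worker with ID 0."""
    for index, value in enumerate(sequence):
        kernel(0, index, value)

inp = np.arange(5, dtype=np.float64)

# Built-in map(): results are returned and must be collected into a new container.
collected = np.array(list(map(lambda v: 3 * v, inp)))

# Map pattern with a preallocated output: the kernel writes in place.
outp = np.empty_like(inp)

def triple_it(worker_id, index, value):
    outp[index] = 3 * value

serial_map(triple_it, inp)
```

Because each kernel call writes only to its own index, the calls are independent and could be distributed across workers in any order.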

Quick guide

To use it, simply import it, define your kernel function of choice and map away!

import numpy as np
import pasha as psh

# Get some random input data
inp = np.random.rand(100)

# Allocate output array via pasha. The returned array is
# guaranteed to be accessible from any worker, and may
# reside in shared memory.
outp = psh.alloc(like=inp)

# Define a kernel function multiplying each value with 3.
def triple_it(worker_id, index, value):
    outp[index] = 3 * value

# Map the kernel function.
psh.map(triple_it, inp)

# Check the result
np.testing.assert_allclose(outp, inp*3)

The runtime environment is controlled by a map context. The default context is ProcessContext, which uses multiprocessing.Pool to distribute the work across several processes. This context only works on Unix-like systems supporting the fork() system call, as it expects any input data to be shared with the workers. When the process context is selected, psh.alloc() creates arrays in shared memory, so workers can write output data there and the caller can retrieve it without any memory copies.
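The same fork-plus-shared-memory idea can be sketched with the standard library alone. This is not pasha's ProcessContext. It is an illustration, assuming a Unix-like OS, of how output allocated in shared memory before the pool forks becomes writable by every worker without copying results back. The helper names `triple_chunk` and `run` and the chunk size are hypothetical.

```python
import multiprocessing as mp
import numpy as np

# Assumption: a Unix-like OS; the 'fork' start method does not exist on Windows.
ctx = mp.get_context('fork')

inp = np.random.rand(100)

# RawArray is allocated from a shared memory heap, so processes forked
# afterwards write into the same pages the parent reads.
shared = ctx.RawArray('d', inp.size)
outp = np.frombuffer(shared, dtype=np.float64)  # NumPy view, no copy

def triple_chunk(bounds):
    # Runs in a worker; inp and outp were inherited via fork(), not pickled.
    start, stop = bounds
    outp[start:stop] = 3 * inp[start:stop]

def run(num_workers=4, chunk_size=25):
    chunks = [(i, min(i + chunk_size, inp.size))
              for i in range(0, inp.size, chunk_size)]
    # The pool forks its workers here, after the shared buffer exists.
    with ctx.Pool(num_workers) as pool:
        pool.map(triple_chunk, chunks)
    return outp

result = run()
np.testing.assert_allclose(result, inp * 3)
```

Only the small `(start, stop)` tuples travel through IPC; the bulk input and output data never do.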

You may either create an explicit context object and use it directly or change the default context, e.g.

psh.set_default_context('threads', num_workers=4)

Three context types are built in: serial, threads and processes.
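As a rough picture of what a thread-based context does, the following sketch splits the index range into one contiguous share per worker and runs each share on a thread. `threaded_map` is a hypothetical helper built on `concurrent.futures`, not pasha's implementation. Threads share the process's address space, so an ordinary NumPy array can serve as the output.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def threaded_map(kernel, sequence, num_workers=4):
    """Hypothetical thread-based map; not pasha's ThreadContext."""
    n = len(sequence)
    # Boundaries of num_workers contiguous shares of the index range.
    bounds = np.linspace(0, n, num_workers + 1, dtype=int)

    def run_share(worker_id):
        for index in range(bounds[worker_id], bounds[worker_id + 1]):
            kernel(worker_id, index, sequence[index])

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # list() consumes the results so exceptions raised in workers propagate.
        list(pool.map(run_share, range(num_workers)))

inp = np.random.rand(100)
outp = np.empty_like(inp)  # plain array: threads already share memory
seen_workers = set()

def triple_it(worker_id, index, value):
    seen_workers.add(worker_id)
    outp[index] = 3 * value

threaded_map(triple_it, inp, num_workers=4)
np.testing.assert_allclose(outp, inp * 3)
```

In CPython, pure-Python kernel code is serialized by the GIL, so threads pay off mainly when the kernel spends its time in code that releases the GIL, such as many NumPy operations or I/O.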

The input object passed to map() is called a functor. Here it is automatically wrapped in a suitable Functor object, namely SequenceFunctor. This works for a number of common array and sequence types, but you may also implement your own Functor object to wrap anything else that can be iterated over.
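The sketch below illustrates the general idea of a custom functor: an object that knows how to split its elements into per-worker shares and what extra arguments to pass to the kernel for each element. The class `DictFunctor`, its `split`/`iterate` methods and the driver `map_functor` are all hypothetical and do not reproduce pasha's actual Functor interface. Note how the kernel receives `(worker_id, index, key, value)`, analogous to the extra arguments in the EXtra-data kernel shown next.

```python
from concurrent.futures import ThreadPoolExecutor

class DictFunctor:
    """Hypothetical functor wrapping a dict; not pasha's Functor base class."""

    def __init__(self, mapping):
        self.items = list(mapping.items())  # fixed order, so indices are stable

    def split(self, num_workers):
        # Round-robin shares of indices, one per worker.
        return [range(w, len(self.items), num_workers) for w in range(num_workers)]

    def iterate(self, share):
        # Yield the arguments the kernel receives after worker_id.
        for index in share:
            key, value = self.items[index]
            yield index, key, value

def map_functor(kernel, functor, num_workers=2):
    shares = functor.split(num_workers)

    def run_share(worker_id):
        for args in functor.iterate(shares[worker_id]):
            kernel(worker_id, *args)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(run_share, range(num_workers)))

data = {'a': [1, 2], 'b': [3, 4, 5], 'c': []}
functor = DictFunctor(data)
outp = [None] * len(functor.items)  # preallocated, one slot per element

def summarize(worker_id, index, key, value):
    outp[index] = (key, sum(value))

map_functor(summarize, functor, num_workers=2)
```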

For example, this is used to provide tight integration with EXtra-data, a toolkit used to access scientific data recorded at European XFEL. With this, you can map over DataCollection and KeyData objects to parallelize your data analysis.

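import extra_data
import pasha as psh

def analysis_kernel(worker_id, index, train_id, data):
    # Do something with the data and save it to shared memory.
    pass

run = extra_data.open_run(proposal=700000, run=1)
# source and key are placeholders for a device name and one of its keys.
psh.map(analysis_kernel, run[source, key])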



Download files


Source Distribution

pasha-0.1.1.tar.gz (10.5 kB)

Uploaded Source

Built Distribution

pasha-0.1.1-py3-none-any.whl (11.7 kB)

Uploaded Python 3

File details

Details for the file pasha-0.1.1.tar.gz.

File metadata

  • Download URL: pasha-0.1.1.tar.gz
  • Size: 10.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/29.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.36.1 importlib-metadata/3.7.0 keyring/18.0.0 rfc3986/1.5.0 colorama/0.4.4 CPython/3.7.4

File hashes

Hashes for pasha-0.1.1.tar.gz
Algorithm Hash digest
SHA256 a032687bb11a9609057602daa93ff59878153cc67473ed45923634cd35332592
MD5 20cb21b6075022228712af1ac9646fdd
BLAKE2b-256 f35c1123132a1da2e6bfb53a1831752c652b9b3f1749b5a7a9ae5b9b2c04bedf


File details

Details for the file pasha-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pasha-0.1.1-py3-none-any.whl
  • Size: 11.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/29.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.36.1 importlib-metadata/3.7.0 keyring/18.0.0 rfc3986/1.5.0 colorama/0.4.4 CPython/3.7.4

File hashes

Hashes for pasha-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 aac264ea1902b0dbf135fa9a8625bbc4707b032c79b5373e69452b0bf82b1410
MD5 446aa8d9f2ff0a2c401f9c940bc9939b
BLAKE2b-256 c013cd0a2efac62d92f892f91f0e3fd8a5a11be5430c20de51c210d536108ee5
