Manage a pool of arrays using shared memory

Project description

Pyarraypool


Transfer numpy arrays between processes using shared memory.

Why create this project?

This library aims to speed up parallel data processing with CPython and numpy ndarrays.

What is the issue with regular Python multitasking?

The Python GIL does not allow multithreading to be used for parallel data processing. It is released while C code / Cython (nogil) / IO tasks run, but it stays locked during pure-Python computation. This is why subprocesses are often used to run several processing tasks in parallel.
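
For instance, a CPU-bound function written in pure Python only scales across cores when it is dispatched to worker processes rather than threads (a minimal illustration, not tied to this library):

import multiprocessing

def cpu_bound(n):
    # Pure-Python loop: it holds the GIL, so threads cannot run it in parallel.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        # Each input is handled in a separate subprocess, bypassing the GIL.
        results = pool.map(cpu_bound, [1_000_000] * 8)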

Alternatives to subprocess workers exist, but they are not always an option. To list a few of them:

  • numba with prange (see the sketch after this list)
  • switching from CPython to PyPy
  • rewriting code in C / Cython / Rust
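
As an illustration of the numba route, a minimal sketch (it assumes numba is installed and is unrelated to this library):

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def scale(x, value):
    # prange iterations are compiled and run across threads, outside the GIL.
    for i in prange(x.shape[0]):
        x[i] *= value

arr = np.random.random(1_000_000)
scale(arr, 2.0)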

Why not use numpy's builtin mmap?

Numpy's builtin memory mapping is designed to manage a single numpy array. It is not designed for many "small" arrays that are frequently created and destroyed.
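
For reference, numpy's builtin memory mapping ties one file-backed buffer to exactly one array (sketch; the file name is arbitrary):

import numpy as np

# One mapped file backs exactly one array. Creating many short-lived arrays
# this way means one file and one mapping per array.
mm = np.memmap("data.bin", dtype=np.float64, mode="w+", shape=(100, 200))
mm[:] = 1.0
mm.flush()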

A few design choices

The Python standard library already contains a module (multiprocessing.shared_memory) to create and manage shared memory.

However, it does not make it safe and easy to manage that memory as a raw block, so performance drops: several system calls must be made on every block creation / deletion.
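
For comparison, sharing a single array with the standard module looks roughly like the sketch below; every array pays this create / destroy cycle, which is the per-array overhead a pool avoids:

from multiprocessing import shared_memory
import numpy as np

arr = np.random.random((100, 200))

# One segment per array: system calls to create it...
shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
view[:] = arr

# ...and more system calls to tear it down once every process is done.
del view
shm.close()
shm.unlink()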

In this library:

  • shared memory is managed as a single "pool" using Rust and the low-level CPython API (see the sketch below).
  • arrays can be attached to the pool and are released when their refcount reaches 0 in every process.
  • a spinlock is used to synchronize processes when blocks are added / removed (this can be improved).
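
Purely as an illustration of the "pool" idea (the real allocator is written in Rust; the class and names below are hypothetical):

from multiprocessing import shared_memory

class ToyArrayPool:
    """Hypothetical bump allocator over a single shared segment."""

    def __init__(self, size: int) -> None:
        # One segment for the whole pool instead of one segment per array.
        self.shm = shared_memory.SharedMemory(create=True, size=size)
        self.offset = 0

    def allocate(self, nbytes: int) -> memoryview:
        # Hand out a slice of the existing segment: no extra system call.
        start = self.offset
        self.offset += nbytes
        return self.shm.buf[start:start + nbytes]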

API usage

Here is a simple example of how to use the library.

import pyarraypool
import multiprocessing
import numpy as np

def task(x, i, value):
    # A dummy task that writes `value` into one slice of the shared array
    x[i, :, :] = value

def main():
    arr = np.random.random((100, 200, 500))
    I, J, K = arr.shape

    with multiprocessing.Pool(processes=8) as pool:
        # Transfer the array to shared memory.
        #
        # Segment will be created automatically on first `make_transferable` call.
        shmarr = pyarraypool.make_transferable(arr)

        # Apply task to array
        pool.starmap(task, [
            (shmarr, i, i) for i in range(I)
        ])

if __name__ == "__main__":
    main()
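
Continuing the example above (inside main, after pool.starmap returns), the workers' writes land in the shared buffer, so the parent process can read them back through shmarr:

        # Each worker wrote its index into its slice of the shared array.
        for i in range(I):
            assert (shmarr[i] == i).all()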

You can have a look at the notebook / example folders for more details.

Developer guide

To build:

pip install maturin
maturin develop --extras test

To test:

# Run rust tests
cargo test
cargo clippy

# Run python tests
pytest -vv
flake8
autopep8 --diff -r python/
mypy .

To format code:

autopep8 -ir python/
isort .

Project status

The project is currently a "POC" and is not fully ready for production.

A few benchmarks are still missing. The API can be improved and may change in the near future.

See TODO.md for more details.

Any help / feedback is welcome 😊 !

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

pyarraypool-0.1.3-cp310-cp310-manylinux_2_28_x86_64.whl (1.0 MB)

Uploaded CPython 3.10 manylinux: glibc 2.28+ x86-64

pyarraypool-0.1.3-cp39-cp39-manylinux_2_28_x86_64.whl (1.0 MB)

Uploaded CPython 3.9 manylinux: glibc 2.28+ x86-64

pyarraypool-0.1.3-cp38-cp38-manylinux_2_28_x86_64.whl (1.0 MB)

Uploaded CPython 3.8 manylinux: glibc 2.28+ x86-64

File details

Details for the file pyarraypool-0.1.3-cp310-cp310-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for pyarraypool-0.1.3-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 c31023c335f5aecccd2a7fa8eedcb00e42d1d0b50383b2734622284651bda7c5
MD5 1e9bdd0681f79b64ba34907622121275
BLAKE2b-256 ed4c49b5428dd0288fe441f0c708ee4a925c4c0cddab55948c28f1bc087fb34b

See more details on using hashes here.

File details

Details for the file pyarraypool-0.1.3-cp39-cp39-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for pyarraypool-0.1.3-cp39-cp39-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 e1699be3788802c22280e652c0b0542d58495a8e368d8240004140c45b67a4f2
MD5 ff780b9cbceb85a863d362792b2b082f
BLAKE2b-256 a755816f1732783156346dac1e5e286f0ab8c762dd180bff65b4a08555fbc909

See more details on using hashes here.

File details

Details for the file pyarraypool-0.1.3-cp38-cp38-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for pyarraypool-0.1.3-cp38-cp38-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 77742bb531faa3654b7c128c574227f814211f31ad6c8c932aa373418e6b03ef
MD5 33f006dc86e2cb4a02456fa691c19ea2
BLAKE2b-256 5fabbf261c1021a95a3633fbd62fbdf46edf73c49088d4a2e4f179bab16ddcb8

See more details on using hashes here.
