
Manage a pool of arrays using shared memory


Pyarraypool


Transfer NumPy arrays between processes using shared memory.

Why create this project?

This library aims to speed up parallel data processing with Python and NumPy ndarrays.

The Python GIL prevents multithreading from running computation in parallel. It is released while C code, Cython code, or I/O tasks run, but it stays locked during Python-level computation.

Alternatives to subprocess workers exist, but they are not always practical. To name a few:

  • using numba
  • switching from CPython to PyPy
  • rewriting code in C / Cython / Rust

A few design choices

The Python standard library already contains a module to create and manage shared memory (multiprocessing.shared_memory).

However, it does not let you manage shared memory as a single raw block. Performance drops because several system calls are needed each time a block is created or deleted.
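For comparison, here is a minimal sketch of the standard-library approach, where every array gets its own shared memory block. The helper names `share_array` and `attach_array` are hypothetical and only serve the illustration. Each call creates or opens a separate named block, which is the per-block overhead described above.

```python
from multiprocessing import shared_memory

import numpy as np


def share_array(arr):
    """Copy arr into a newly created shared memory block (one block per array)."""
    shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
    view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
    view[...] = arr
    return shm, view


def attach_array(name, shape, dtype):
    """Attach to an existing block by name, as a worker process would."""
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)


src = np.arange(12, dtype=np.float64).reshape(3, 4)
owner, owner_view = share_array(src)
peer, peer_view = attach_array(owner.name, src.shape, src.dtype)

peer_view[0, 0] = 42.0                   # write through one handle...
seen_by_owner = float(owner_view[0, 0])  # ...and read it through the other

# Drop the NumPy views before closing, then unlink the block exactly once
del owner_view, peer_view
peer.close()
owner.close()
owner.unlink()
```

Both handles are opened in the same process here to keep the sketch short. In practice the name, shape, and dtype would be sent to a worker, which would then attach on its side.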

In this library:

  • shared memory is managed as a "pool".
  • arrays can be attached, and are released when their refcount reaches 0 in every process.
  • a spinlock synchronizes processes when blocks are added / removed (this can be improved).
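The pool idea can be illustrated with a toy sketch. This is not pyarraypool's implementation: `ToyPool` and all its methods are hypothetical, and the block table is a plain process-local dict, whereas a real cross-process pool would need its metadata in shared memory and guarded by a lock. The sketch only shows one segment being allocated once, with blocks carved out of it and freed when their refcount reaches 0.

```python
from multiprocessing import shared_memory


class ToyPool:
    """Hypothetical single-segment pool, for illustration only."""

    def __init__(self, size):
        # One system-level allocation for the whole pool
        self.shm = shared_memory.SharedMemory(create=True, size=size)
        self.size = size
        self.blocks = {}  # offset -> [length, refcount]

    def alloc(self, length):
        """First-fit search for a gap between live blocks; returns an offset."""
        cursor = 0
        for offset in sorted(self.blocks):
            if offset - cursor >= length:
                break  # the gap before this block is large enough
            cursor = offset + self.blocks[offset][0]
        if cursor + length > self.size:
            raise MemoryError("pool exhausted")
        self.blocks[cursor] = [length, 1]
        return cursor

    def attach(self, offset):
        """Register one more user of the block."""
        self.blocks[offset][1] += 1

    def release(self, offset):
        """Drop one user; the block is freed when nobody uses it anymore."""
        self.blocks[offset][1] -= 1
        if self.blocks[offset][1] == 0:
            del self.blocks[offset]

    def close(self):
        self.shm.close()
        self.shm.unlink()


pool = ToyPool(1024)
a = pool.alloc(100)
b = pool.alloc(200)
pool.attach(a)       # a second user attaches to block a
pool.release(a)      # one user left, so the block stays alive
c = pool.alloc(50)   # placed after b, since a is still in use
pool.release(a)      # refcount reaches 0, the space is reusable
d = pool.alloc(80)   # fits in the gap left by a

pool.shm.buf[d:d + 4] = b"data"
payload = bytes(pool.shm.buf[d:d + 4])
pool.close()
```

Allocating and freeing blocks here only touches the table, with no system call per block, which is the motivation for pooling.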

API usage

Here is a simple example of how to use the library.

import pyarraypool
import multiprocessing
import numpy as np

def task(x, i, value):
    # Define a dummy task
    x[i, :, :] = value

def main():
    arr = np.random.random((100, 200, 500))
    I, J, K = arr.shape

    with multiprocessing.get_context("spawn").Pool(processes=8, initializer=pyarraypool.start_pool) as pool, \
            pyarraypool.object_pool():
        # Transfer the array to shared memory
        shmarr = pyarraypool.make_transferable(arr)

        # Apply task to array
        pool.starmap(task, [
            (shmarr, i, i) for i in range(I)
        ])

if __name__ == "__main__":
    main()

You can have a look at the notebook / example folders for more details.

Developer guide

To build:

pip install maturin
maturin develop --extras test

To test:

# Run rust tests
cargo test
cargo clippy

# Run python tests
pytest -vv
flake8
autopep8 --diff -r python/
mypy .

To format code:

autopep8 -ir python/
isort .

Project status

The project is currently a proof of concept (POC) and is not fully ready for production.

A few benchmarks are still missing, and the API can be improved.

See TODO.md for more details.

Any help / feedback is welcome 😊!
