Manage a pool of arrays using shared memory
Pyarraypool
Transfer numpy arrays between processes using shared memory.
Why create this project?
This library aims to speed up parallel data processing with CPython and numpy NDArray.
What is the issue with regular Python multitasking?
The Python GIL does not allow multithreading to be used for parallel data processing. It is released while C code, Cython (nogil) code or IO tasks run, but it stays locked for computation tasks. This is why subprocesses are often used to run several processing tasks at the same time.
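The cost of the subprocess approach can be illustrated with a minimal sketch. `multiprocessing` sends task arguments to workers by pickling them, so the snippet below simulates that hand-off with `pickle` directly (no worker is started). The helper name `simulate_transfer` is hypothetical and only serves the illustration.

```python
import pickle

import numpy as np


def simulate_transfer(arr):
    """Mimic what happens to an argument sent to a worker process.

    The array is serialized to bytes and rebuilt on the other side,
    so the receiver works on a full copy of the data.
    """
    payload = pickle.dumps(arr)
    return pickle.loads(payload), len(payload)


arr = np.zeros((100, 200))
received, payload_size = simulate_transfer(arr)

# The receiver's writes land in its own copy, not in the original array.
received[0, 0] = 42.0
print(np.shares_memory(arr, received))  # False
print(arr[0, 0])                        # 0.0
print(payload_size >= arr.nbytes)       # True: every byte was serialized
```

Each transfer therefore pays for a serialization and a copy, and results written by a worker have to be sent back the same way. Sharing memory avoids both.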
Alternatives to subprocess workers exist, but they are not always possible to use. To list a few of them:
Why not use numpy's builtin mmap?
Numpy's builtin memory mapping is made to manage a single numpy array. It is not made to manage multiple "small" arrays that are frequently created and destroyed.
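For reference, here is a minimal sketch of numpy's builtin memory mapping with `np.memmap`. It shows that each mapped array is tied to its own backing file; the temporary file name and the helper `memmap_roundtrip` are assumptions made for the example.

```python
import os
import tempfile

import numpy as np


def memmap_roundtrip(shape=(4, 3)):
    """Write an array through one memory map and read it through another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "array.dat")

        # One backing file holds exactly one array.
        writer = np.memmap(path, dtype=np.float64, mode="w+", shape=shape)
        writer[:] = 1.5
        writer.flush()

        # A second mapping (possibly in another process) sees the same data.
        reader = np.memmap(path, dtype=np.float64, mode="r", shape=shape)
        total = float(reader.sum())

        # Drop both mappings before the temporary directory is removed.
        del writer, reader
    return total


print(memmap_roundtrip())  # 18.0
```

With many short-lived arrays, this pattern means one file to create, map and remove per array.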
A few design choices
The Python standard library already contains a module to create and manage shared memory.
However, it does not make it safe or easy to manage shared memory as a raw block. Performance drops because several system calls must be made on each block creation / deletion.
In this library:
- shared memory is managed as a "pool" using Rust and the low-level CPython API.
- arrays can be attached and are released when the refcount reaches 0 in every process.
- a spinlock is used to synchronize processes when blocks are added / removed (this can be improved).
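As a point of comparison, the standard library approach looks roughly like the sketch below, which uses `multiprocessing.shared_memory` with one segment per array. Both handles live in the same process here to keep the example short; in practice the second handle would be opened in a worker using the segment name. The helper name `share_once` is hypothetical.

```python
from multiprocessing import shared_memory

import numpy as np


def share_once(values):
    """Copy `values` into a new shared segment and modify it via a second handle."""
    src = np.asarray(values, dtype=np.float64)

    # One segment is created, mapped and later unlinked for this single array.
    shm = shared_memory.SharedMemory(create=True, size=src.nbytes)
    peer = shared_memory.SharedMemory(name=shm.name)  # attach by name
    try:
        owner_view = np.ndarray(src.shape, dtype=src.dtype, buffer=shm.buf)
        peer_view = np.ndarray(src.shape, dtype=src.dtype, buffer=peer.buf)
        owner_view[:] = src
        peer_view[0] = -1.0          # the write is visible through owner_view
        result = owner_view.copy()   # private copy that outlives the segment
        # Views must be dropped before the handles can be closed.
        del owner_view, peer_view
    finally:
        peer.close()
        shm.close()
        shm.unlink()
    return result


print(share_once([1.0, 2.0, 3.0]))  # [-1.  2.  3.]
```

Every array goes through its own create / attach / close / unlink cycle, and the caller has to keep track of segment names and lifetimes manually.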
API usage
Here is a simple example of how to use the library.
import pyarraypool
import multiprocessing
import numpy as np


def task(x, i, value):
    # Define a dummy task that writes to the shared numpy array
    x[i, :, :] = value


def main():
    arr = np.random.random((100, 200, 500))
    I, J, K = arr.shape

    with multiprocessing.Pool(processes=8) as pool:
        # Transfer the array to shared memory.
        #
        # Segment will be created automatically on first `make_transferable` call.
        shmarr = pyarraypool.make_transferable(arr)

        # Apply task to array
        pool.starmap(task, [
            (shmarr, i, i) for i in range(I)
        ])


if __name__ == "__main__":
    main()
You can have a look at the notebook / example folders for more details.
Developer guide
To build:
pip install maturin
maturin develop --extras test
To test:
# Run rust tests
cargo test
cargo clippy
# Run python tests
pytest -vv
flake8
autopep8 --diff -r python/
mypy .
To format code:
autopep8 -ir python/
isort .
Project status
The project is currently a "POC" and is not fully ready for production.
A few benchmarks are still missing. The API can be improved and may change in the near future.
See TODO.md for more details.
Any help / feedback is welcome 😊!