managing np arrays stored in the shared memory
cute-shm
cute-shm is a convenience wrapper over Python's multiprocessing shared memory. It provides an easy-to-use API for managing shared memory numpy arrays and HDF5 files. Shared memory makes it possible to share numpy arrays across multiple processes running on the same node without copying them.
Requirements
Python 3.10 or later.
Installation
You can install cute-shm using pip:
pip install cute-shm
Usage
API
sharing numpy arrays
import numpy as np
import cute_shm as cute
# Create some numpy arrays
a = np.array([[12, 0, 0], [0, 10, 0], [0, 0, 0]], dtype=np.int64)
b1 = np.zeros(100, dtype=np.float32)
b2 = np.zeros(300, dtype=np.float32)
# Create a nested dictionary of arrays
arrays = {"a": a, "b": {"b1": b1, "b2": b2}}
# An arbitrary name for this project
project_name = "myproject"
# set to True if the shared memory should not be cleaned upon exit
# i.e. another process may need to access it later
persistent = False
# set to True if the shared memory should be overwritten if it already exists
# (if False and a project of the same name already exists, a FileExistsError will be raised)
overwrite = False
# transfer arrays to shared memory
cute.arrays_to_shm(
project_name,
arrays,
persistent=persistent,
overwrite=overwrite,
)
# reading the arrays from the shared memory
# This could be done in a different process.
# (including processes spawned after this process exits if persistent is True)
shm_arrays = cute.shm_to_arrays(project_name, persistent=persistent)
# shm_arrays has the same structure as arrays.
# Each item has two keys:
# - "data": the numpy array
# - "meta": related metadata
a: np.ndarray = shm_arrays["a"]["data"]
# note: 'a' is not a copy of the array located in the shared memory.
# It is a numpy array which data buffer points to the shared memory.
# meta data consists mostly of things you will certainly not need.
a_meta: cute.SharedArrayMeta = shm_arrays["a"]["meta"]
a_meta["shape"] # the shape of the array, same as a.shape
a_meta["dtype"] # the data type of the array, same as str(a.dtype)
a_meta["shm_name"] # the name of the shared memory segment
a_meta["shm_private_name"] # the private name of the shared memory segment
a_meta["shm"] # the shared memory segment (instance of shared_memory.SharedMemory)
# clean up the shared memory and related metadata
# (do not call this if persistent is True and you want the shared memory to be available for other processes)
cute.unlink(project_name)
You can also use the unlinked_arrays_to_shm context manager to ensure that the shared memory and the related metadata are cleaned up on exit. (If persistent is False, the Python multiprocessing resource tracker cleans up the shared memory automatically, but not the metadata.)
# Transfer arrays to shared memory
with cute.unlinked_arrays_to_shm(project_name, arrays):
    # Read arrays from shared memory
    # (this could also be done in a different process)
    shm_arrays = cute.shm_to_arrays(project_name)
# Shared memory and meta data is automatically cleaned up when the context manager exits
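The note above that the returned array is not a copy can be illustrated without cute-shm at all. The following is a minimal sketch using only the standard library's shared_memory module and numpy; the function name demo_zero_copy is hypothetical and this is not necessarily how cute-shm is implemented internally. It attaches to the same segment twice and shows that a write through one view is visible through the other, while the original array is left untouched.

```python
from multiprocessing import shared_memory

import numpy as np


def demo_zero_copy() -> tuple[int, bool]:
    """Hypothetical demo: two numpy views backed by one shared memory segment."""
    source = np.array([[12, 0, 0], [0, 10, 0], [0, 0, 0]], dtype=np.int64)

    # "Writer" side: allocate a segment and copy the array into it once.
    writer_shm = shared_memory.SharedMemory(create=True, size=source.nbytes)
    writer = np.ndarray(source.shape, dtype=source.dtype, buffer=writer_shm.buf)
    writer[:] = source

    # "Reader" side: attach to the segment by name (could be another process).
    reader_shm = shared_memory.SharedMemory(name=writer_shm.name)
    reader = np.ndarray(source.shape, dtype=source.dtype, buffer=reader_shm.buf)

    # A write through one view is seen through the other: no copy is involved.
    writer[2, 2] = 7
    seen_by_reader = int(reader[2, 2])
    # The original array was copied into the segment, so it is unchanged.
    source_unchanged = bool(source[2, 2] == 0)

    # Drop the views before closing, then release the segment.
    del writer, reader
    reader_shm.close()
    writer_shm.close()
    writer_shm.unlink()
    return seen_by_reader, source_unchanged


print(demo_zero_copy())
```

The views are deleted before close() is called because a segment cannot be closed while numpy still holds a reference to its buffer.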
sharing content of hdf5 files
Content of hdf5 files can also be transferred to shared memory as a dictionary of numpy arrays.
from pathlib import Path
import numpy as np
import cute_shm as cute
hdf5_path = Path("path/to/your/file.hdf5")
project_name = "myproject"
# if True, a progress bar showing the progress of the transfer
# to the shared memory will be shown
progress = True
# for persistent and overwrite, same usage as when sharing numpy arrays
persistent = False
overwrite = False
# transfer to the shared memory
cute.hdf5_to_shm(
hdf5_path, project_name, progress=progress, persistent=persistent, overwrite=overwrite
)
# the content of the hdf5 file is shared as a nested dictionary of arrays
shm_arrays = cute.shm_to_arrays(project_name, persistent=persistent)
# dataset attributes are stored in the "meta" dictionary
a: np.ndarray = shm_arrays["a"]["data"]
a_meta: cute.SharedArrayMeta = shm_arrays["a"]["meta"]
a_meta["attrs"] # the attributes of the dataset
A context manager is also provided:
with cute.unlinked_hdf5_to_shm(
    hdf5_path, project_name, progress, overwrite
):
    shm_arrays = cute.shm_to_arrays(project_name, persistent=persistent)
Logging
If your own software uses the cute-shm API and sets the logging level to DEBUG, information related to the creation and deletion of shared memory segments will be logged.
Typing hints
cute-shm provides and uses these type aliases:
# a nested dictionary of numpy arrays. This is the data structure
# that can be transferred to the shared memory.
ArrayDict: TypeAlias = dict[str, Union["ArrayDict", np.ndarray]]
# usage
arrays: cute_shm.ArrayDict = {"a": np.zeros(10), "b": {"b1": np.zeros(10), "b2": np.zeros(10)}}
cute_shm.arrays_to_shm("myproject", arrays)
# a shared memory array: data and related metadata
class SharedArray(TypedDict):
    meta: SharedArrayMeta
    data: np.ndarray
# the metadata of a shared memory array
class SharedArrayMeta(TypedDict, total=False):
    shm_name: str
    shm_private_name: str
    shm: shared_memory.SharedMemory
    shape: tuple[int, ...]
    dtype: str
    attrs: dict[str, Any]
# a nested dictionary of shared memory arrays.
# This is the data structure that is returned by the API
# when reading the shared memory.
SharedArrayDict: TypeAlias = dict[str, Union[SharedArray, "SharedArrayDict"]]
# usage
shm_arrays: cute_shm.SharedArrayDict = cute_shm.shm_to_arrays("myproject")
a: cute_shm.SharedArray = shm_arrays["a"]
a_data: np.ndarray = a["data"]
a_meta: cute_shm.SharedArrayMeta = a["meta"]
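Because the returned structure is an ordinary nested dictionary, it can be traversed with plain Python. The sketch below is an illustration only: iter_arrays is a hypothetical helper, not part of cute-shm, and it assumes the layout described above (each leaf is a dictionary with a "data" array and a "meta" entry). It is exercised here on a hand-built stand-in rather than on real shared memory.

```python
from typing import Any, Iterator

import numpy as np


def iter_arrays(
    tree: dict[str, Any], prefix: str = ""
) -> Iterator[tuple[str, np.ndarray]]:
    """Yield (path, array) for every leaf of a SharedArrayDict-like structure."""
    for key, value in tree.items():
        path = f"{prefix}/{key}" if prefix else key
        # A leaf holds a numpy array under "data" and its metadata under "meta".
        if isinstance(value.get("data"), np.ndarray) and "meta" in value:
            yield path, value["data"]
        else:
            # Anything else is treated as a nested group of arrays.
            yield from iter_arrays(value, path)


# Stand-in with the same shape as the dictionary returned by shm_to_arrays.
stand_in = {
    "a": {"data": np.zeros(10), "meta": {}},
    "b": {
        "b1": {"data": np.zeros(10), "meta": {}},
        "b2": {"data": np.zeros(10), "meta": {}},
    },
}
for path, array in iter_arrays(stand_in):
    print(path, array.shape, array.dtype)
```

Slash-separated paths are an arbitrary choice made for this sketch; they happen to mirror how HDF5 addresses datasets.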
Concurrent access
Once numpy arrays have been transferred to the shared memory and are no longer updated, they can be accessed by multiple processes concurrently without lock protection.
If a process updates the values of the arrays, locking should be implemented using either a multiprocessing.Lock or a filelock.
See, for example, the demo_server.py and demo_client.py demos.
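As a rough illustration of lock-protected updates, the sketch below has several processes increment a counter stored in shared memory while holding a multiprocessing lock. It relies only on the standard library and numpy, not on cute-shm; the names increment and run_demo are hypothetical, and the "fork" start method is an assumption that restricts the sketch to POSIX systems.

```python
import multiprocessing as mp
from multiprocessing import shared_memory

import numpy as np


def increment(shm_name: str, lock, n_updates: int) -> None:
    """Worker: attach to the segment and increment its counter under the lock."""
    shm = shared_memory.SharedMemory(name=shm_name)
    counter = np.ndarray((1,), dtype=np.int64, buffer=shm.buf)
    for _ in range(n_updates):
        # The read-modify-write below is not atomic, hence the lock.
        with lock:
            counter[0] += 1
    del counter  # drop the view before closing the segment
    shm.close()


def run_demo(n_workers: int = 4, n_updates: int = 100) -> int:
    """Run the workers and return the final value of the shared counter."""
    ctx = mp.get_context("fork")  # assumption: POSIX system
    lock = ctx.Lock()
    shm = shared_memory.SharedMemory(create=True, size=np.dtype(np.int64).itemsize)
    try:
        counter = np.ndarray((1,), dtype=np.int64, buffer=shm.buf)
        counter[0] = 0
        workers = [
            ctx.Process(target=increment, args=(shm.name, lock, n_updates))
            for _ in range(n_workers)
        ]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()
        result = int(counter[0])
        del counter
    finally:
        shm.close()
        shm.unlink()
    return result


print(run_demo())
```

A multiprocessing lock has to be handed to the processes when they are started; for unrelated processes that only share a project name, a file-based lock is the more practical of the two options mentioned above.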
Under the hood
- When the arrays are transferred to shared memory, a toml file is created in the /tmp/cute-shm directory. Its name is based on the project name.
- This toml file contains all the metadata required for other processes to "cast" the shared memory to the proper dictionary structure.
If you prefer to store the metadata in a different location, change the root attribute of the class Project2Toml:
from pathlib import Path
import cute_shm as cute
cute.Project2Toml.root = Path("/path/to/your/directory")
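The idea of a metadata file that lets another process "cast" a segment back into an array can be sketched as follows. This is not cute-shm's actual code: JSON stands in for toml so that the sketch needs only the standard library and numpy, and the functions publish, attach and demo, as well as the file layout, are hypothetical.

```python
import json
import tempfile
from multiprocessing import shared_memory
from pathlib import Path

import numpy as np


def publish(array: np.ndarray, meta_path: Path) -> shared_memory.SharedMemory:
    """Copy an array into a new segment and write the metadata needed to find it."""
    shm = shared_memory.SharedMemory(create=True, size=array.nbytes)
    view = np.ndarray(array.shape, dtype=array.dtype, buffer=shm.buf)
    view[:] = array
    del view
    meta = {
        "shm_name": shm.name,
        "shape": list(array.shape),
        "dtype": str(array.dtype),
    }
    meta_path.write_text(json.dumps(meta))
    return shm


def attach(meta_path: Path) -> tuple[shared_memory.SharedMemory, np.ndarray]:
    """Rebuild the array from the metadata file alone (e.g. in another process)."""
    meta = json.loads(meta_path.read_text())
    shm = shared_memory.SharedMemory(name=meta["shm_name"])
    array = np.ndarray(
        tuple(meta["shape"]), dtype=np.dtype(meta["dtype"]), buffer=shm.buf
    )
    return shm, array


def demo() -> list[float]:
    with tempfile.TemporaryDirectory() as tmp:
        meta_path = Path(tmp) / "myproject.json"
        owner = publish(np.arange(4, dtype=np.float32), meta_path)
        shm, array = attach(meta_path)
        values = array.tolist()
        del array  # drop the view before closing the segment
        shm.close()
        owner.close()
        owner.unlink()
    return values


print(demo())
```

The segment name, shape and dtype are enough to rebuild the view; they correspond to the "shm_name", "shape" and "dtype" entries of the metadata described in the usage section.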
Command line executables
To load the content of a hdf5 file to the shared memory via command line:
cute-shm-hdf5 <project_name> <hdf5_path>
for example:
# transfer to the shared memory.
# file.hdf5 expected in the current directory
cute-shm-hdf5 myproject file.hdf5
# overwrite if data corresponding to a project named myproject already exists
# 'o' for overwrite
cute-shm-hdf5 myproject file.hdf5 -o
# do not display a progress bar
cute-shm-hdf5 myproject file.hdf5 -no-progress
# display debug information instead of a progress bar
# 'v' for verbose
cute-shm-hdf5 myproject file.hdf5 -v
# any python process can now access the shared memory
# "myproject" via cute.shm_to_arrays.
You can display information about the data hosted in the shared memory:
# full information
cute-shm-list
# just an overview ('s' for short)
cute-shm-list -s
Note that cute-shm-list displays not only the content of the shared memory created via cute-shm-hdf5, but also the content of the shared memory created via the Python API (shared memory that is currently being transferred is not listed).
Shared memory can be cleaned up via the command cute-shm-unlink <project_name>:
cute-shm-unlink myproject
"Manual" cleaning of the shared memory
As an alternative to using the API or the command line to free the shared memory, you may either:
- reboot the computer
- delete the files prefixed by cute-shm in the /dev/shm folder and the related toml files in the /tmp/cute-shm folder.
Demos
For examples, see the demos.
Warnings
Bus error (hitting RAM limits)
If the RAM of the computer gets full, transfer to the shared memory will not only fail, the process will also crash with a bus error. This is a system error that cannot be managed by the python exception handling.
It has also been observed that the process becomes stuck when the RAM limit is exceeded.
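Since the failure cannot be caught, one option is a rough check before transferring. The sketch below sums the sizes of a nested dictionary of arrays and compares the total with the free space reported for /dev/shm. Both total_nbytes and fits_in_shm are hypothetical helpers, not part of cute-shm, and the check is only a heuristic: free space on /dev/shm does not guarantee that enough RAM will remain available during the transfer.

```python
import shutil

import numpy as np


def total_nbytes(arrays: dict) -> int:
    """Sum the sizes in bytes of all numpy arrays in a nested dictionary."""
    total = 0
    for value in arrays.values():
        if isinstance(value, np.ndarray):
            total += value.nbytes
        else:
            total += total_nbytes(value)
    return total


def fits_in_shm(arrays: dict, shm_root: str = "/dev/shm", margin: float = 0.1) -> bool:
    """Heuristic: True if the arrays fit in the free space, keeping a safety margin."""
    free = shutil.disk_usage(shm_root).free
    return total_nbytes(arrays) <= free * (1.0 - margin)


arrays = {
    "a": np.zeros((3, 3), dtype=np.int64),
    "b": {
        "b1": np.zeros(100, dtype=np.float32),
        "b2": np.zeros(300, dtype=np.float32),
    },
}
print(total_nbytes(arrays))  # 72 + 400 + 1200 bytes
```

The 10% margin is an arbitrary value chosen for the sketch.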
Garbage collection of the shared memory
The data buffer of a shared memory numpy array points to the buffer of a related instance of shared_memory.SharedMemory.
This instance must stay alive on the heap, i.e. it must not be garbage collected. If it is garbage collected, a segmentation fault occurs and the process crashes (this cannot be handled by Python exception handling).
The shared_memory.SharedMemory instance is stored under the "shm" key of the "meta" dictionary (a SharedArrayMeta) of each shared array.
For example, one should not:
# read the shared memory to a dictionary of numpy arrays and meta data
shm_arrays: cute_shm.SharedArrayDict = cute_shm.shm_to_arrays(project_name)
# access the data and meta data of 'a'
shm_array = shm_arrays["a"]
# the numpy array
data: np.ndarray = shm_array["data"]
# the meta data
meta: cute_shm.SharedArrayMeta = shm_array["meta"]
# deleting the pointer to the shared memory segment
# related to the data
del meta["shm"]
# this will crash: the shared memory segment has been garbage collected
print(data[0])
or:
def get_np(project_name: str)->np.ndarray:
    # read the shared memory to a dictionary of numpy arrays and meta data
    shm_arrays: cute_shm.SharedArrayDict = cute_shm.shm_to_arrays(project_name)
    # access the data and meta data of 'a'
    shm_array = shm_arrays["a"]
    data: np.ndarray = shm_array["data"]
    meta: cute_shm.SharedArrayMeta = shm_array["meta"]
    # meta["shm"] is a reference to the shared memory segment.
    # It will be garbage collected, along with the meta dictionary,
    # when the function exits.
    return data
a: np.ndarray = get_np("myproject")
# this will crash: the shared memory segment has been garbage collected
print(a[0])
Note: When the shared memory instance is garbage collected:
- The data is not removed from the shared memory.
- Only the pointer to the data buffer is lost.
- This loss of pointer affects only the current process.
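One way to avoid this pitfall is to keep the segment and the array together in a single object, so that the array cannot outlive the segment by accident. The sketch below shows the pattern with the standard library and numpy only; the AttachedArray class and demo_keep_alive function are hypothetical and not part of cute-shm. With cute-shm itself, the equivalent is to return or store the whole SharedArray dictionary (data and meta) rather than only its "data" entry.

```python
from multiprocessing import shared_memory

import numpy as np


class AttachedArray:
    """Hypothetical holder that ties a numpy view to its shared memory segment."""

    def __init__(self, name: str, shape: tuple[int, ...], dtype) -> None:
        # Keeping self.shm as an attribute prevents it from being garbage
        # collected for as long as this object (and thus self.data) is in use.
        self.shm = shared_memory.SharedMemory(name=name)
        self.data = np.ndarray(shape, dtype=dtype, buffer=self.shm.buf)

    def close(self) -> None:
        # Drop the view first: a segment cannot be closed while it is exported.
        self.data = None
        self.shm.close()


def demo_keep_alive() -> int:
    # Create and fill a small segment, standing in for an existing project.
    owner = shared_memory.SharedMemory(create=True, size=3 * 8)
    init = np.ndarray((3,), dtype=np.int64, buffer=owner.buf)
    init[:] = [1, 2, 3]
    del init

    attached = AttachedArray(owner.name, (3,), np.int64)
    first = int(attached.data[0])  # safe: attached.shm is still referenced

    attached.close()
    owner.close()
    owner.unlink()
    return first


print(demo_keep_alive())
```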
Authorship, Copyright, and License
Author: Vincent Berenz
Institution: Max Planck Institute for Intelligent Systems, Tübingen, Germany
Copyright: © 2024 Max Planck Gesellschaft
License: MIT License