Skip to main content

managing np arrays stored in the shared memory

Project description

Python package PyPI version

Cute-Shm

Cute-Shm is a convenience wrapper over Python's multiprocessing shared memory. It provides an easy-to-use API for managing shared memory numpy arrays and HDF5 files. Using the shared memory allows to share numpy arrays across multiple processes running on the same node.

Requirements

Python 3.10 or later.

Installation

You can install Cute-Shm using pip:

pip install cute-shm

Usage

API

sharing numpy arrays

import numpy as np
import cute_shm as cute

# Create some numpy arrays
a = np.array([[12, 0, 0], [0, 10, 0], [0, 0, 0]], dtype=np.int64)
b1 = np.zeros(100, dtype=np.float32)
b2 = np.zeros(300, dtype=np.float32)

# Create a nested dictionary of arrays
arrays = {"a": a, "b": {"b1": b1, "b2": b2}}

# An arbitrary name for this projet
project_name = "myproject"

# transfer arrays to shared memory

# set to True if the shared memory should not be cleaned upon exit
# i.e. another process may need to access it later
persistent = False 

# set to True if the shared memory should be overwritten if it already exists
# (if False and a project of the same name already exists, a FileExistsError will be raised)
overwrite = False # set to True if the shared memory should be overwritten if it already exists

# transfer arrays to shared memory
cute.arrays_to_shm(
    project_name,
    arrays,
    persistent=persistent,
    overwrite=overwrite,
)

# reading the arrays from the shared memory
# This could be done in a different process.
# (including processes spawned after this process exits if persistent is True)
shm_arrays = cute.shm_to_arrays(project_name, persistent=persistent)

# shm_arrays has the same structure as arrays.
# Each item has two keys: 
# - "data": the numpy array
# - "meta": related metadata
a: np.ndarray = shm_arrays["a"]["data"]

# meta data consists mostly of things you will certainly not need.
a_meta: cute.SharedArrayMeta = shm_arrays["a"]["meta"]
a_meta["shape"] # the shape of the array, same as a.shape
a_meta["dtype"] # the data type of the array, same as str(a.dtype)
a_meta["shm_name"] # the name of the shared memory segment
a_meta["shm_private_name"] # the private name of the shared memory segment
a_meta["shm"] # the shared memory segment (instance of shared_memory.SharedMemory)

# clean up the shared memory and related metadata
# (do not call this if persistent is True and you want the shared memory to be available for other processes)
cute.unlink(project_name)

You can also use the unlinked_arrays_to_shm context manager to ensure the shared memory and related metadata are cleaned up on exit.

(if persistent is False, the python multiprocessing resource tracker will cleanup the shared memory automatically, but not the meta data).

# Transfer arrays to shared memory
with cute.unlinked_arrays_to_shm(project_name, arrays):

    # Read arrays from shared memory
    # (this could also be done in a different process)
    shm_arrays = cute.shm_to_arrays(project_name)

# Shared memory and meta data is automatically cleaned up when the context manager exits

sharing content of hdf5 files

Content of hdf5 files can also be transferred to shared memory as a dictionary of numpy arrays.

from pathlib import Path
import cute_shm as cute

hdf5_path = Path("path/to/your/file.hdf5")

project_name = "myproject"

# if True, a progress bar showing the progress of the transfer
# to the shared memory will be shown
progress = True
# for persistent and overwrite, same usage as when sharing numpy arrays
persistent = False
overwrite = False

# transfer to the shared memory
hdf5_to_shm(
    hdf5_path, project_name, progress=progress, persistent=persistent, overwrite=overwrite
)

# content of hdf5 is shared as a nested directories of nested arrays 
shm_arrays = cute.shm_to_arrays(project_name, persistent=persistent)

# dataset attributes are stored in the "meta" dictionary
a: np.ndarray = shm_arrays["a"]["data"]
a_meta: cute.SharedArrayMeta = shm_arrays["a"]["meta"]
a_meta["attrs"] # the attributes of the dataset

A context manager is also provided:

with unlinked_hdf5_to_shm(
    hdf5_path, project_name, progress, overwrite
):
    shm_arrays = cute.shm_to_arrays(project_name, persistent=persistent)

Logging

If in your own software using the cute-sh API you set the logging to level DEBUG, information related to the creation/deletion of shared memory segments will be provided.

Typing hints

cute-shm provides and uses these type aliases:

# a nested dictionary of numpy arrays. This is the data structure
# that can be transferred to the shared memory.
ArrayDict: TypeAlias = dict[str, Union["ArrayDict", np.ndarray]]

# usage
arrays: cute_shm.ArrayDict = {"a": np.zeros(10), "b": {"b1": np.zeros(10), "b2": np.zeros(10)}}
cute_shm.arrays_to_shm("myproject", arrays)

# a shared memory array: data and related metadata
class SharedArray(TypedDict):
    meta: SharedArrayMeta
    data: np.ndarray

# the metadata of a shared memory array
class SharedArrayMeta(TypedDict, total=False):
    shm_name: str
    shm_private_name: str
    shm: shared_memory.SharedMemory
    shape: tuple[int, ...]
    dtype: str
    attrs: dict[str, Any]

# a nested dictionary of shared memory arrays.
# This is the data structure that is returned by the API
# when reading the shared memory.
SharedArrayDict: TypeAlias = dict[str, Union[SharedArray, "SharedArrayDict"]]

# usage
shm_arrays: cute_shm.SharedArrayDict = cute_shm.shm_to_arrays("myproject")
a: cute_shm.SharedArray = shm_arrays["a"]
a_data: np.ndarray = a["data"]
a_meta: cute_shm.SharedArrayMeta = a["meta"]

Under the hood

  • when the arrays are transferred to shared memory, a toml file is created in the /tmp/cute-shm directory. Its name is based on the project name.
  • this toml file contains all the metadata required for other processes to "cast" the shared memory to the proper dictionary structure.

If you prefer to store the metadata in a different location, change the root attribute of the class Project2Toml:

import cute_shm as cute

cute.Project2Toml.root = Path("/path/to/your/directory")

Command line executables

To load the content of a hdf5 file to the shared memory via command line:

cute-shm-hdf5 <project_name> <hdf5_path> 

for example:

# transfer to the shared memory.
# file.hdf5 expected in the current directory
cute-shm-hdf5 myproject file.hdf5 

# overwrite if data corresponding to a project named myproject already exists
# 'o' for overwrite
cute-shm-hdf5 myproject file.hdf5 -o

# do not display a progress bar
cute-shm-hdf5 myproject file.hdf5 -no-progress

# display debug information instead of a progress bar
# 'v' for verbose
cute-shm-hdf5 myproject file.hdf5 -v

# any python process can now access the shared memory
# "myproject" via cute.shm_to_arrays.

You can display about data hosted in the shared memory:

# full information
cute-shm-list

# just an overview ('s' for short)
cute-shm-list -s

Note that cute-shm-list will not only display the content of the shared memory created via cute-shm-hdf5, but also the content of the shared memory created via the python API (shared memory currently being transferred will not be listed).

Shared memory can be cleaned up via the command cute-shm-unlink <project_name>:

cute-shm-unlink myproject

"Manual" cleaning of the shared memory

Alternatively to use the API or the command line to free the shared memory, you may either:

  • reboot the computer
  • delete files prefixed by cute-shm in the /dev/shm folder and related toml files in the /tmp/cute-shm folder.

Warnings

Bus error

If the RAM of the computer gets full, transfer to the shared memory will not only fail, the process will also crash with a bus error. This is a system error that cannot be managed by the python exception handling.

Garbage collection of the shared memory

Shared memory numpy arrays buffers is a pointer to the buffer of a related instance of shared_memory.SharedMemory. This related instance needs to be loaded in the heap, i.e. it should not be garbage collected. If it is garbage collected, then a SegmentationFault will occur and the process will crash (not managed by python exception handling).

The instance of the shared_memory.SharedMemory is located in the meta dictionary of the SharedArrayMeta instance.

For example, one should not:

# read the shared memory to a dictionary of numpy arrays and meta data
shm_arrays: cute_shm.SharedArrayDict = cute_shm.shm_to_arrays(project_name)

# access the data and meta data of 'a'
shm_array = shm_arrays["a"]

# the numpy array
data: np.ndarray = shm_array["data"]

# the meta data
meta: cute_shm.SharedArrayMeta = shm_array["meta"]

# deleting the pointer to the shared memory segment
# related to the data
del meta["shm"]

# this will crash: the shared memory segment has been garbage collected
print(data[0]) 

or:

def get_np(project_name: str)->np.ndarray:

    # read the shared memory to a dictionary of numpy arrays and meta data
    shm_arrays: cute_shm.SharedArrayDict = cute_shm.shm_to_arrays(project_name)

    # access the data and meta data of 'a'
    shm_array = shm_arrays["a"]
    data: np.ndarray = shm_array["data"]
    meta: cute_shm.SharedArrayMeta = shm_array["meta"]

    # meta["shm"] is a reference to the shared memory segment.
    # It will be garbage collected, along with the meta dictionary,
    #  when the function exits.
    return data

a: np.ndarray = get_np("myproject")
# this will crash: the shared memory segment has been garbage collected
print(a[0])  

Note: When the shared memory instance is garbage collected:

  • The data is not removed from the shared memory.
  • Only the pointer to the data buffer is lost.
  • This loss of pointer affects only the current process.

Authorship, Copyright, and License

Author: Vincent Berenz
Institution: Max Planck Institute for Intelligent Systems, Tübingen, Germany
Copyright: © 2024 Max Planck Gesellschaft
License: MIT License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cute_shm-0.1.1.tar.gz (15.6 kB view details)

Uploaded Source

Built Distribution

cute_shm-0.1.1-py3-none-any.whl (16.2 kB view details)

Uploaded Python 3

File details

Details for the file cute_shm-0.1.1.tar.gz.

File metadata

  • Download URL: cute_shm-0.1.1.tar.gz
  • Upload date:
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.8.10 Linux/5.15.0-117-generic

File hashes

Hashes for cute_shm-0.1.1.tar.gz
Algorithm Hash digest
SHA256 c4e34891ffa9de9b15c2192ada9a1f5ffb597c0d43d01a473f53a5a28beb7f01
MD5 a55a5568b5983d80715211a942e9901c
BLAKE2b-256 478563973d1afc4859b683856a1ed600aa3fa16f9e7e4ebd2d7de9b1aacf0b07

See more details on using hashes here.

File details

Details for the file cute_shm-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: cute_shm-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 16.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.8.10 Linux/5.15.0-117-generic

File hashes

Hashes for cute_shm-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9e78a163fc47ff234b47627abb20c5425399d13669c614ff5f6baf944208448e
MD5 f01a85be16da415e17dee455b78fb43c
BLAKE2b-256 9bf53783146c130ee5ccb9b417d45a29cc825f817fc7d4169967532787a8401c

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page