Shared-memory streams for NumPy and CUDA-backed PyTorch pipelines.
pyshmem
pyshmem provides named shared-memory streams for NumPy arrays and optional CUDA-backed PyTorch pipelines.
It is designed for applications that need a small, predictable API for moving numeric payloads between processes without rebuilding the same locking, metadata, and lifecycle rules around raw shared memory.
Why pyshmem
- one API for CPU NumPy buffers and CUDA-backed tensors
- cross-process write locking with explicit lock ownership
- safe snapshot reads for CPU streams
- explicit GPU performance mode or CPU-mirrored compatibility mode
- tested lifecycle and recovery behavior across supported platforms
Installation
Install from PyPI:
pip install pyshmem
Optional extras:
pip install pyshmem[test]
pip install pyshmem[gpu]
pip install pyshmem[docs]
For local development from a checkout:
pip install -e .[test]
Quick Start
CPU stream
import numpy as np
import pyshmem
writer = pyshmem.create("demo_frame", shape=(4, 4), dtype=np.float32)
reader = pyshmem.open("demo_frame")
writer.write(np.ones((4, 4), dtype=np.float32))
frame = reader.read()
next_frame = reader.read_new(timeout=1.0)
GPU stream
import numpy as np
import pyshmem
writer = pyshmem.create(
    "demo_cuda",
    shape=(4, 4),
    dtype=np.float32,
    gpu_device="cuda:0",
)
writer.write(np.ones((4, 4), dtype=np.float32))
reader = pyshmem.open("demo_cuda", gpu_device="cuda:0")
frame = reader.read()
Public API
- `pyshmem.create(name, *, shape, dtype=np.float32, size=None, gpu_device=None, cpu_mirror=None)`
- `pyshmem.open(name, *, gpu_device=None)`
- `pyshmem.unlink(name)`
- `pyshmem.gpu_available()`
- `pyshmem.SharedMemory`
SharedMemory instances expose metadata, locking, lifecycle, and IO methods:
Attributes: `name`, `shape`, `dtype`, `size`, `gpu_device`, `cpu_mirror`, `ownercount`, `write_time`, `write_sequence`

Methods:

- `acquire(timeout=None, poll_interval=1e-3)`
- `release()`
- `locked(timeout=None, poll_interval=1e-3)`
- `write(value)`
- `read(safe=True, poll_interval=1e-6)`
- `read_new(timeout=None, safe=True, poll_interval=1e-5)`
- `clear()`
- `close()`
- `unlink()`
- `delete()`
Behavior Notes
Writes are serialized with a cross-platform file lock backend.
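The serialization idea can be illustrated with a POSIX advisory file lock. This is a sketch of the general pattern only; pyshmem's actual lock backend and file layout may differ.

```python
import fcntl
import os
import tempfile

# Serialize writers with an advisory file lock. The lock file path and
# name here are arbitrary; pyshmem manages its own lock files internally.
lock_path = os.path.join(tempfile.gettempdir(), "demo_writer.lock")

with open(lock_path, "w") as lock_file:
    fcntl.flock(lock_file, fcntl.LOCK_EX)  # block until this process owns the lock
    # ... perform the shared-memory write here ...
    fcntl.flock(lock_file, fcntl.LOCK_UN)  # hand the lock to the next writer
print("write serialized")
```

This sketch is POSIX-only; a cross-platform backend also needs a Windows code path.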
- `read(safe=True)` returns a consistent snapshot of the most recent completed write
- `read(safe=False)` exposes the live backing storage and therefore requires `with shm.locked():`
- `close()` releases only the local handle
- `unlink()` destroys the underlying shared-memory stream
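One common way to provide a safe-snapshot read is a sequence counter (a seqlock). The following is a single-process illustration of the idea, not pyshmem's actual internals:

```python
import threading

class SeqBuffer:
    """Seqlock-style snapshot reads: the writer bumps a sequence number to
    odd before mutating and back to even after; readers retry until they
    observe the same even sequence on both sides of their copy."""

    def __init__(self):
        self.seq = 0
        self.data = [0, 0]
        self._lock = threading.Lock()  # serializes writers only

    def write(self, values):
        with self._lock:
            self.seq += 1          # odd: write in progress
            self.data = list(values)
            self.seq += 1          # even: write complete

    def read(self):
        while True:
            before = self.seq
            if before % 2:         # a write is in progress; retry
                continue
            snapshot = list(self.data)
            if self.seq == before:  # no write raced with our copy
                return snapshot

buf = SeqBuffer()
buf.write([1, 2])
print(buf.read())  # [1, 2]
```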
Closed handles are guarded explicitly. After `close()`, methods such as `read`, `write`, `acquire`, `clear`, and metadata access raise a `RuntimeError` that instructs the caller to reopen the stream.

Missing segments raise `FileNotFoundError` with a pyshmem-specific message that points the caller toward `pyshmem.create(...)`.
GPU Modes
GPU-backed streams have two deliberately different operating modes.
Performance mode:
- `pyshmem.create(..., gpu_device="cuda:N")` defaults to `cpu_mirror=False`
- avoids CPU mirror maintenance on every write
- optimized for GPU-heavy pipelines where throughput matters most
Compatibility mode:
- `pyshmem.create(..., gpu_device="cuda:N", cpu_mirror=True)` keeps the CPU mirror updated
- allows CPU-side payload reads and stronger safe-snapshot semantics under concurrent writes
Important attachment rule:
- pass `gpu_device="cuda:N"` to `pyshmem.open(...)` whenever the caller needs a CUDA `torch.Tensor` view
- opening a GPU stream without `gpu_device` still allows metadata inspection and lock management, but payload reads require either a GPU attachment or `cpu_mirror=True`
Platform Notes
Windows limitation
Windows inherits a hard limitation from multiprocessing.shared_memory: the
operating system deletes a shared-memory block as soon as the last handle to it
is closed.
That means the following behaviors are unsupported on Windows:
- a segment outliving its creator when no other process still has it open
- `close()` followed by `pyshmem.open(...)` when that `close()` dropped the final live handle
Those behaviors remain supported on POSIX platforms.
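The POSIX lifetime that pyshmem relies on can be seen directly with the standard library's `multiprocessing.shared_memory` (the segment name below is arbitrary):

```python
from multiprocessing import shared_memory

# Create a named segment, write a byte, and drop the creator's handle.
seg = shared_memory.SharedMemory(name="pyshmem_demo_seg", create=True, size=16)
seg.buf[0] = 42
seg.close()  # on POSIX the segment outlives this handle

# Re-attach by name. On Windows this step would fail, because closing the
# last handle above would already have destroyed the segment.
again = shared_memory.SharedMemory(name="pyshmem_demo_seg")
value = again.buf[0]
again.unlink()  # explicitly destroy the segment
again.close()
print(value)  # 42
```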
Testing
Install test dependencies and run the CPU suite:
pip install -e .[test]
pytest -m cpu
Run the CUDA suite on a GPU machine:
pip install -e .[test,gpu]
pytest -m gpu
The repository also includes benchmark-marked tests:
pytest -m "cpu and benchmark" -q -s
pytest tests/test_benchmark.py -m "gpu and benchmark" -q -s
GitHub-hosted runners do not provide CUDA by default, so the CUDA workflow is manual and targets either a self-hosted GPU runner or a larger GitHub runner with CUDA support.
Performance
The benchmark suite measures both raw shared-memory IO and matrix-vector multiply pipelines that keep the matrix in shared memory.
Two GPU MVM shapes are covered:
- host-upload pipeline: the vector payload is created in NumPy and uploaded each iteration
- device-resident pipeline: the vector payload is produced directly on GPU each iteration
The CPU benchmark target remains 50 kHz for a 128x128 round trip. Hard
enforcement is opt-in because hosted CI is not a reliable performance lab.
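For scale, the 50 kHz target implies a modest IO rate. This is a back-of-the-envelope check computed here, not a measured number:

```python
# Bytes moved per round trip: one write plus one read of a 128x128 float32 frame.
frame_bytes = 128 * 128 * 4          # 65,536 bytes
roundtrip_bytes = 2 * frame_bytes    # write + read
target_hz = 50_000

gb_per_s = roundtrip_bytes * target_hz / 1e9
print(f"{gb_per_s:.2f} GB/s")  # 6.55 GB/s
```

That rate is well within the measured CPU IO numbers below, which is why the target is achievable on ordinary hardware.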
Measured Results
The following numbers were measured on a machine with:
- OS: Linux 6.17.0-14-generic x86_64
- Python: 3.12.0
- NumPy: 2.2.6
- PyTorch: 2.10.0+cu128
- GPU: NVIDIA GeForce RTX 5090
Methodology:
- `float32` payloads throughout
- each benchmark case used warmup iterations before timing
- each timed case ran for at least 1.5 seconds to reduce one-off noise
- IO throughput is computed from `write` plus `read` bytes per iteration
- MVM throughput is reported both as pipeline rate and estimated GFLOP/s using $2n^2$ floating-point operations per matrix-vector multiply
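As a worked example of the $2n^2$ estimate, the 1000x1000 host-upload GPU case at roughly 22485 Hz works out as follows:

```python
# Estimated GFLOP/s from pipeline rate: 2*n^2 flops per matrix-vector multiply.
n = 1000
pipeline_hz = 22485.3  # measured rate for the 1000x1000 host-upload GPU case

gflops = 2 * n * n * pipeline_hz / 1e9
print(f"{gflops:.2f} GFLOP/s")  # 44.97 GFLOP/s
```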
Important interpretation note:
- GPU-backed segments now default to `cpu_mirror=False`
- the fast GPU path avoids CPU mirror maintenance unless the creator explicitly asks for it with `cpu_mirror=True`
- the stronger concurrent-read consistency contract is provided by the mirrored mode; the default no-mirror mode is optimized for throughput first
- the GPU numbers below therefore reflect the optimized no-mirror path, which is the intended performance configuration
IO vs Image Size
| Image size | Payload (MiB) | CPU roundtrip Hz | CPU IO (GB/s) | GPU roundtrip Hz | GPU IO (GB/s) |
|---|---|---|---|---|---|
| 100x100 | 0.038 | 180311.2 | 14.42 | 36214.1 | 2.90 |
| 1000x1000 | 3.815 | 9922.1 | 79.38 | 5027.4 | 40.22 |
| 10000x10000 | 381.470 | 20.36 | 16.29 | 49.96 | 39.97 |
Shared-Memory MVM Pipeline
Host-upload GPU pipeline:
| Matrix size | Matrix payload (MiB) | CPU pipeline Hz | CPU GFLOP/s | GPU pipeline Hz | GPU GFLOP/s |
|---|---|---|---|---|---|
| 100x100 | 0.038 | 109844.4 | 2.20 | 26465.8 | 0.53 |
| 1000x1000 | 3.815 | 11124.9 | 22.25 | 22485.3 | 44.97 |
| 10000x10000 | 381.470 | 26.21 | 5.24 | 1299.3 | 259.86 |
Fully device-resident GPU pipeline:
| Matrix size | Matrix payload (MiB) | GPU pipeline Hz | GPU GFLOP/s |
|---|---|---|---|
| 100x100 | 0.038 | 30240.6 | 0.60 |
| 1000x1000 | 3.815 | 26733.6 | 53.47 |
| 10000x10000 | 381.470 | 1321.6 | 264.33 |
The updated results show the intended behavior for real GPU workloads:
- tiny matrices like `100x100` are still dominated by launch and synchronization overhead, so CPU remains faster there
- once the workload is large enough to matter, the no-mirror GPU path pulls ahead decisively
- the `1000x1000` and `10000x10000` MVM cases now outperform the CPU equivalents by a wide margin on this machine
- keeping the vector generation on GPU improves the pipeline further, especially once the matrix is large enough for the math to dominate