Shared-memory streams for NumPy and CUDA-backed PyTorch pipelines.
pyshmem
pyshmem provides named shared-memory streams for NumPy arrays and optional CUDA-backed PyTorch pipelines.
It is designed for applications that need a small, predictable API for moving numeric payloads between processes without rebuilding the same locking, metadata, and lifecycle rules around raw shared memory.
Why pyshmem
- one API for CPU NumPy buffers and CUDA-backed tensors
- cross-process write locking with explicit lock ownership
- safe snapshot reads for CPU streams
- explicit GPU performance mode or CPU-mirrored compatibility mode
- tested lifecycle and recovery behavior across supported platforms
Installation
Install from PyPI:
pip install pyshmem
Optional extras:
pip install pyshmem[test]
pip install pyshmem[gpu]
pip install pyshmem[docs]
For local development from a checkout:
pip install -e .[test]
Quick Start
CPU stream
import numpy as np
import pyshmem
writer = pyshmem.create("demo_frame", shape=(4, 4), dtype=np.float32)
reader = pyshmem.open("demo_frame")
writer.write(np.ones((4, 4), dtype=np.float32))
frame = reader.read()
next_frame = reader.read_new(timeout=1.0)
GPU stream
import numpy as np
import pyshmem
writer = pyshmem.create(
    "demo_cuda",
    shape=(4, 4),
    dtype=np.float32,
    gpu_device="cuda:0",
)
writer.write(np.ones((4, 4), dtype=np.float32))
reader = pyshmem.open("demo_cuda", gpu_device="cuda:0")
frame = reader.read()
Public API
- `pyshmem.create(name, *, shape, dtype=np.float32, size=None, gpu_device=None, cpu_mirror=None)`
- `pyshmem.open(name, *, gpu_device=None)`
- `pyshmem.unlink(name)`
- `pyshmem.gpu_available()`
- `pyshmem.SharedMemory`
SharedMemory instances expose metadata, locking, lifecycle, and IO methods:
Attributes: `name`, `shape`, `dtype`, `size`, `gpu_device`, `cpu_mirror`, `ownercount`, `write_time`, `write_sequence`

Methods:

- `acquire(timeout=None, poll_interval=1e-3)`
- `release()`
- `locked(timeout=None, poll_interval=1e-3)`
- `write(value)`
- `read(safe=True, poll_interval=1e-6)`
- `read_new(timeout=None, safe=True, poll_interval=1e-5)`
- `clear()`
- `close()`
- `unlink()`
- `delete()`
Behavior Notes
Writes are serialized with a cross-platform file lock backend.
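The serialization idea can be illustrated with a POSIX advisory file lock. This is a sketch of the general pattern only; pyshmem's actual lock backend and file layout may differ.

```python
import fcntl
import os
import tempfile

# Serialize writers with an advisory file lock. The lock file path and
# name here are arbitrary; pyshmem manages its own lock files internally.
lock_path = os.path.join(tempfile.gettempdir(), "demo_writer.lock")

with open(lock_path, "w") as lock_file:
    fcntl.flock(lock_file, fcntl.LOCK_EX)  # block until this process owns the lock
    # ... perform the shared-memory write here ...
    fcntl.flock(lock_file, fcntl.LOCK_UN)  # hand the lock to the next writer
print("write serialized")
```

This sketch is POSIX-only; a cross-platform backend also needs a Windows code path.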
- `read(safe=True)` returns a consistent snapshot of the most recent completed write
- `read(safe=False)` exposes the live backing storage and therefore requires `with shm.locked():`
- `close()` releases only the local handle
- `unlink()` destroys the underlying shared-memory stream
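One common way to provide a safe-snapshot read is a sequence counter (a seqlock). The following is a single-process illustration of the idea, not pyshmem's actual internals:

```python
import threading

class SeqBuffer:
    """Seqlock-style snapshot reads: the writer bumps a sequence number to
    odd before mutating and back to even after; readers retry until they
    observe the same even sequence on both sides of their copy."""

    def __init__(self):
        self.seq = 0
        self.data = [0, 0]
        self._lock = threading.Lock()  # serializes writers only

    def write(self, values):
        with self._lock:
            self.seq += 1          # odd: write in progress
            self.data = list(values)
            self.seq += 1          # even: write complete

    def read(self):
        while True:
            before = self.seq
            if before % 2:         # a write is in progress; retry
                continue
            snapshot = list(self.data)
            if self.seq == before:  # no write raced with our copy
                return snapshot

buf = SeqBuffer()
buf.write([1, 2])
print(buf.read())  # [1, 2]
```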
Closed handles are guarded explicitly. After `close()`, methods such as `read`, `write`, `acquire`, `clear`, and metadata access raise a `RuntimeError` that instructs the caller to reopen the stream.

Missing segments raise `FileNotFoundError` with a pyshmem-specific message that points the caller toward `pyshmem.create(...)`.
GPU Modes
GPU-backed streams have two deliberately different operating modes.
Performance mode:
- `pyshmem.create(..., gpu_device="cuda:N")` defaults to `cpu_mirror=False`
- avoids CPU mirror maintenance on every write
- optimized for GPU-heavy pipelines where throughput matters most
Compatibility mode:
- `pyshmem.create(..., gpu_device="cuda:N", cpu_mirror=True)` keeps the CPU mirror updated
- allows CPU-side payload reads and stronger safe-snapshot semantics under concurrent writes
Important attachment rule:
- pass `gpu_device="cuda:N"` to `pyshmem.open(...)` whenever the caller needs a CUDA `torch.Tensor` view
- opening a GPU stream without `gpu_device` still allows metadata inspection and lock management, but payload reads require either a GPU attachment or `cpu_mirror=True`
Platform Notes
Windows limitation
Windows inherits a hard limitation from multiprocessing.shared_memory: the
operating system deletes a shared-memory block as soon as the last handle to it
is closed.
That means the following behaviors are unsupported on Windows:
- a segment outliving its creator when no other process still has it open
- `close()` followed by `pyshmem.open(...)` when that `close()` dropped the final live handle
Those behaviors remain supported on POSIX platforms.
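The POSIX lifetime that pyshmem relies on can be seen directly with the standard library's `multiprocessing.shared_memory` (the segment name below is arbitrary):

```python
from multiprocessing import shared_memory

# Create a named segment, write a byte, and drop the creator's handle.
seg = shared_memory.SharedMemory(name="pyshmem_demo_seg", create=True, size=16)
seg.buf[0] = 42
seg.close()  # on POSIX the segment outlives this handle

# Re-attach by name. On Windows this step would fail, because closing the
# last handle above would already have destroyed the segment.
again = shared_memory.SharedMemory(name="pyshmem_demo_seg")
value = again.buf[0]
again.unlink()  # explicitly destroy the segment
again.close()
print(value)  # 42
```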
Testing
Install test dependencies and run the CPU suite:
pip install -e .[test]
pytest -m cpu
Run the CUDA suite on a GPU machine:
pip install -e .[test,gpu]
pytest -m gpu
The repository also includes benchmark-marked tests:
pytest -m "cpu and benchmark" -q -s
pytest tests/test_benchmark.py -m "gpu and benchmark" -q -s
GitHub-hosted runners do not provide CUDA by default, so the CUDA workflow is manual and targets either a self-hosted GPU runner or a larger GitHub runner with CUDA support.
Performance
The benchmark suite measures both raw shared-memory IO and matrix-vector multiply pipelines that keep the matrix in shared memory.
Two GPU MVM shapes are covered:
- host-upload pipeline: the vector payload is created in NumPy and uploaded each iteration
- device-resident pipeline: the vector payload is produced directly on GPU each iteration
The CPU benchmark target remains 50 kHz for a 128x128 round trip. Hard
enforcement is opt-in because hosted CI is not a reliable performance lab.
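For scale, the 50 kHz target implies a modest IO rate. This is a back-of-the-envelope check computed here, not a measured number:

```python
# Bytes moved per round trip: one write plus one read of a 128x128 float32 frame.
frame_bytes = 128 * 128 * 4          # 65,536 bytes
roundtrip_bytes = 2 * frame_bytes    # write + read
target_hz = 50_000

gb_per_s = roundtrip_bytes * target_hz / 1e9
print(f"{gb_per_s:.2f} GB/s")  # 6.55 GB/s
```

That rate is well within the measured CPU IO numbers below, which is why the target is achievable on ordinary hardware.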
Measured Results
The following numbers were measured on a machine with:
- OS: Linux 6.17.0-14-generic x86_64
- Python: 3.12.0
- NumPy: 2.2.6
- PyTorch: 2.10.0+cu128
- GPU: NVIDIA GeForce RTX 5090
Methodology:
- `float32` payloads throughout
- each benchmark case used warmup iterations before timing
- each timed case ran for at least 1.5 seconds to reduce one-off noise
- IO throughput is computed from `write` plus `read` bytes per iteration
- MVM throughput is reported both as pipeline rate and estimated GFLOP/s using $2n^2$ floating-point operations per matrix-vector multiply
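As a worked example of the $2n^2$ estimate, the 1000x1000 host-upload GPU case at roughly 22485 Hz works out as follows:

```python
# Estimated GFLOP/s from pipeline rate: 2*n^2 flops per matrix-vector multiply.
n = 1000
pipeline_hz = 22485.3  # measured rate for the 1000x1000 host-upload GPU case

gflops = 2 * n * n * pipeline_hz / 1e9
print(f"{gflops:.2f} GFLOP/s")  # 44.97 GFLOP/s
```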
Important interpretation note:
- GPU-backed segments now default to `cpu_mirror=False`
- the fast GPU path avoids CPU mirror maintenance unless the creator explicitly asks for it with `cpu_mirror=True`
- the stronger concurrent-read consistency contract is provided by the mirrored mode; the default no-mirror mode is optimized for throughput first
- the GPU numbers below therefore reflect the optimized no-mirror path, which is the intended performance configuration
IO vs Image Size
| Image size | Payload (MiB) | CPU roundtrip Hz | CPU IO (GB/s) | GPU roundtrip Hz | GPU IO (GB/s) |
|---|---|---|---|---|---|
| 100x100 | 0.038 | 180311.2 | 14.42 | 36214.1 | 2.90 |
| 1000x1000 | 3.815 | 9922.1 | 79.38 | 5027.4 | 40.22 |
| 10000x10000 | 381.470 | 20.36 | 16.29 | 49.96 | 39.97 |
Shared-Memory MVM Pipeline
Host-upload GPU pipeline:
| Matrix size | Matrix payload (MiB) | CPU pipeline Hz | CPU GFLOP/s | GPU pipeline Hz | GPU GFLOP/s |
|---|---|---|---|---|---|
| 100x100 | 0.038 | 109844.4 | 2.20 | 26465.8 | 0.53 |
| 1000x1000 | 3.815 | 11124.9 | 22.25 | 22485.3 | 44.97 |
| 10000x10000 | 381.470 | 26.21 | 5.24 | 1299.3 | 259.86 |
Fully device-resident GPU pipeline:
| Matrix size | Matrix payload (MiB) | GPU pipeline Hz | GPU GFLOP/s |
|---|---|---|---|
| 100x100 | 0.038 | 30240.6 | 0.60 |
| 1000x1000 | 3.815 | 26733.6 | 53.47 |
| 10000x10000 | 381.470 | 1321.6 | 264.33 |
The updated results show the intended behavior for real GPU workloads:
- tiny matrices like `100x100` are still dominated by launch and synchronization overhead, so CPU remains faster there
- once the workload is large enough to matter, the no-mirror GPU path pulls ahead decisively
- the `1000x1000` and `10000x10000` MVM cases now outperform the CPU equivalents by a wide margin on this machine
- keeping the vector generation on GPU improves the pipeline further, especially once the matrix is large enough for the math to dominate