fastshare

Zero-copy shared memory transfer of large Python objects between processes.

Why fastshare?

Passing large objects between Python processes is slow. The standard approach -- pickle.dumps() through a multiprocessing.Queue or pipe -- copies the data at least twice: once to serialize and once to push through the pipe. For a 100 MB NumPy array, that means 200 MB+ of unnecessary copying on every transfer.
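
To make the copy cost concrete, here is a minimal stdlib-only sketch (it does not use fastshare). It shows that in-band pickling produces a serialized stream at least as large as the payload, and that unpickling allocates a second, independent object. The 5 MB size is an arbitrary choice for illustration.

```python
import pickle

# A 5 MB buffer standing in for a large array (size chosen arbitrarily).
payload = bytearray(5_000_000)

# In-band pickling embeds the payload bytes in the serialized stream (copy 1).
blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

# Unpickling allocates a fresh object holding the same bytes (copy 2).
restored = pickle.loads(blob)

print(len(blob) >= len(payload))  # the stream contains the whole payload
print(restored is not payload)    # the result is a separate object
```

Sending the blob through a pipe or queue adds further copies on top of these two.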

fastshare uses Python 3.8+'s pickle protocol 5 out-of-band buffers combined with shared memory to eliminate those copies. Large buffer-backed objects (NumPy arrays, bytearrays) are placed directly into shared memory and reconstructed on the other side without copying. Small objects fall back to standard pickle automatically.

The result: drop-in write() and read() calls that work with any picklable object and, in the single-process benchmarks below, round-trip 100 MB to 1 GB NumPy arrays about 1.5x to 1.8x faster than standard pickle.

Installation

pip install fastshare

With NumPy support (enables zero-copy array transfer):

pip install "fastshare[numpy]"

Requires Python 3.10+.

Quick Start

Write a large object in one process, read it in another:

# example_quick_start.py
import multiprocessing as mp
from fastshare import write, read


def reader(token):
    """Child process: reconstruct the object from shared memory."""
    data = read(token)
    print(f"Reader got {len(data):,} bytes, first 10: {data[:10]}")
    # Reader got 5,000,000 bytes, first 10: b'HELLOWORLD'


if __name__ == "__main__":
    # Create a 5 MB object
    payload = b"HELLOWORLD" * 500_000

    # Write to shared memory and get a token string
    token = write(payload)

    # Pass the token (a short string) to the child process
    p = mp.Process(target=reader, args=(token,))
    p.start()
    p.join()

The token is a lightweight string like FSHR:shm:FSHR_a1b2c3 -- only the token crosses the process boundary, not the data.
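
The idea of passing a short token instead of the data can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not fastshare's implementation: the DEMO:shm: prefix, the name:size token layout, and the write_block / read_block helpers are all hypothetical. It runs in a single process for brevity; in practice the token would be handed to a child process.

```python
from multiprocessing import shared_memory

TOKEN_PREFIX = "DEMO:shm:"  # hypothetical prefix, not fastshare's real format


def write_block(data: bytes):
    """Copy `data` into a new shared memory block; return (token, owner handle)."""
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    shm.buf[: len(data)] = data
    # The size travels in the token because the OS may round the block up.
    return f"{TOKEN_PREFIX}{shm.name}:{len(data)}", shm


def read_block(token: str) -> bytes:
    """Attach to the block named in `token` and copy its payload out."""
    name, size = token[len(TOKEN_PREFIX):].rsplit(":", 1)
    shm = shared_memory.SharedMemory(name=name)
    try:
        return bytes(shm.buf[: int(size)])  # copies out, for simplicity
    finally:
        shm.close()


token, owner = write_block(b"HELLOWORLD" * 500_000)
data = read_block(token)
owner.close()
owner.unlink()  # the creator is responsible for removing the block
print(len(token), len(data))
```

The writer keeps its handle open until the reader is done, because on Windows a block disappears once the last handle to it is closed.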

SharedData Broadcast

For the common pattern of sharing one large object with a pool of workers, use the SharedData context manager:

# example_broadcast.py
import multiprocessing as mp
import numpy as np
from fastshare import SharedData


def worker(args):
    """Each worker loads the shared array (cached after first access)."""
    name, idx = args
    arr = SharedData.load(name)
    total = float(arr.sum())
    print(f"Worker {idx}: shape={arr.shape}, sum={total:.0f}")
    return total


if __name__ == "__main__":
    # Create a large array (100 MB)
    data = np.ones((25_000_000,), dtype=np.float32)

    with SharedData(data) as sd:
        # sd.name is the block name to pass to workers
        with mp.Pool(4) as pool:
            results = pool.map(worker, [(sd.name, i) for i in range(4)])

    # Worker 0: shape=(25000000,), sum=25000000
    # Worker 1: shape=(25000000,), sum=25000000
    # Worker 2: shape=(25000000,), sum=25000000
    # Worker 3: shape=(25000000,), sum=25000000
    print(f"All workers returned: {results}")

Each worker gets a zero-copy read-only view of the same shared memory block. The data is serialized once by the parent and deserialized (with zero-copy for NumPy arrays) once per worker process, with subsequent calls to SharedData.load() returning the cached object.
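
The per-process caching described above can be pictured as a module-level dictionary keyed by block name. This is a hedged sketch, not the library's code: load_cached, clear_cache, and fake_loader are hypothetical names, and the loader stands in for whatever attaches to shared memory and deserializes the object.

```python
from typing import Any, Callable, Dict

# Module-level state: each worker process ends up with its own copy.
_cache: Dict[str, Any] = {}


def load_cached(name: str, loader: Callable[[str], Any]) -> Any:
    """Return the object for `name`, calling `loader` only on first access."""
    if name not in _cache:
        _cache[name] = loader(name)
    return _cache[name]


def clear_cache() -> None:
    """Drop all cached objects so their memory can be reclaimed."""
    _cache.clear()


calls = []


def fake_loader(name: str):
    """Stand-in for an expensive attach-and-deserialize step."""
    calls.append(name)
    return [0] * 4


first = load_cached("block-a", fake_loader)
second = load_cached("block-a", fake_loader)
print(first is second, calls)
```

Because the cache holds a reference to every loaded object, a long-running worker needs an explicit way to empty it between batches.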

Benchmarks

Single-process write() + read() round-trip, measured with pytest-benchmark on Windows 10 (Python 3.12, 8-core Intel).

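| Object | Size | pickle (stdlib) | fastshare | Ratio (pickle / fastshare) |
|---|---|---|---|---|
| bytes | 10 KB | 4.5 µs | 116 µs | 0.04x |
| bytes | 10 MB | 7.5 ms | 22.2 ms | 0.34x |
| NumPy float32 | 100 MB | 69 ms | 45 ms | 1.5x |
| NumPy float32 | 500 MB | 364 ms | 231 ms | 1.6x |
| NumPy float32 | 1 GB | 863 ms | 488 ms | 1.8x |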

For objects below the 1 MB threshold, fastshare delegates to standard pickle, so the 10 KB row reflects fastshare's size-estimation overhead rather than shared memory performance.
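
The threshold dispatch can be sketched as follows. The source does not say how fastshare estimates object size, so estimate_size below is an assumption (buffer length when the object exposes one, pickled length otherwise), and choose_path is a hypothetical helper that only reports which path would be taken.

```python
import pickle

DEFAULT_THRESHOLD = 1_000_000  # mirrors the documented 1 MB default


def estimate_size(obj) -> int:
    """Hypothetical estimator: buffer length if available, else pickled length."""
    try:
        with memoryview(obj) as view:
            return view.nbytes
    except TypeError:
        # Not a buffer-backed object; fall back to measuring the pickle.
        return len(pickle.dumps(obj, protocol=5))


def choose_path(obj, threshold: int = DEFAULT_THRESHOLD) -> str:
    """Objects below `threshold` bytes take the pickle path."""
    return "pickle" if estimate_size(obj) < threshold else "shared_memory"


print(choose_path(b"x" * 10_000))
print(choose_path(bytearray(10_000_000)))
print(choose_path({"a": 1}))
```

Whatever the real estimator is, it runs on every call, which is why small objects pay an overhead relative to plain pickle.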

Where fastshare shines: The advantage grows with object size and applies when the object supports pickle protocol 5 out-of-band buffers (NumPy arrays, bytearrays). At 100 MB, zero-copy deserialization avoids the full-array copy that pickle.loads() must perform. In multi-process scenarios the advantage compounds -- shared memory avoids the additional pipe-copy overhead that multiprocessing.Queue incurs, and broadcast to N workers amortizes the single write across all readers.

Raw benchmark output: benchmarks/benchmark_results.txt

API Reference

Core Functions

fastshare.write(obj, *, threshold=1_000_000) -> str

Serialize obj and return a fastshare token string. Objects below threshold bytes use pickle fallback; larger objects use shared memory for zero-copy transfer. If shared memory allocation fails, falls back to pickle with a UserWarning.

  • obj -- Any picklable Python object.
  • threshold (int) -- Size in bytes below which pickle fallback is used. Default: 1,000,000 (1 MB).
  • Returns: A "FSHR:"-prefixed token string.
  • Raises: pickle.PicklingError if obj cannot be pickled.

fastshare.read(token, *, readonly=True) -> object

Reconstruct an object from a fastshare token.

  • token (str) -- A "FSHR:"-prefixed token from write().
  • readonly (bool) -- If True (default), NumPy arrays are read-only. Set False to allow mutation.
  • Returns: The reconstructed Python object.
  • Raises: FastShareError if the token is invalid or the shared memory block is missing.
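
The effect of a read-only view can be shown with a stdlib analogy. The source does not say how fastshare enforces readonly=True for NumPy arrays, so this sketch uses memoryview.toreadonly() only to show the behavior: writes through the read-only view are rejected, while writes through a writable view change the shared backing store for every reader.

```python
backing = bytearray(b"HELLOWORLD")

writable = memoryview(backing)
readonly = writable.toreadonly()

# A write through the writable view changes the shared backing store.
writable[0] = ord("J")

# The same write through the read-only view is rejected.
try:
    readonly[0] = ord("X")
    write_blocked = False
except TypeError:
    write_blocked = True

print(bytes(readonly), write_blocked)
```

This is why read-only is a sensible default: with shared memory, a mutation in one process is visible to all others attached to the same block.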

SharedData Class

class fastshare.SharedData(obj)

Write-once broadcast context manager. Use for sharing large objects with multiple worker processes.

  • Context manager: with SharedData(obj) as sd: serializes to shared memory. On exit the block is unlinked.
  • .name (str) -- The FSHR-prefixed block name for passing to workers.
  • .size (int) -- Size of the shared memory block in bytes.

SharedData.load(name) -> object

Load a shared object by block name with per-process caching. Workers call this with the name from the parent.

  • name (str) -- The FSHR-prefixed block name.
  • Returns: The deserialized object (NumPy arrays are read-only).
  • Raises: TypeError if name is not a string, BlockNotFoundError if the block is gone.

SharedData.clear_cache() -> None

Clear the per-process object cache. Call between batches in long-running workers to free memory.

Cleanup

fastshare.cleanup(dry_run=False) -> CleanupResult

Clean up orphaned FSHR-prefixed shared memory blocks. Discovers blocks on the system, skips blocks owned by the calling process, and unlinks the rest. Linux only (other platforms return an empty result).

  • dry_run (bool) -- If True, report without unlinking.
  • Returns: CleanupResult with .cleaned, .failed, .skipped lists.

CLI equivalent:

fastshare cleanup [--dry-run] [--verbose] [--quiet]
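
A prefix scan with a dry-run mode can be sketched like this. It is not fastshare's implementation: scan_and_clean is a hypothetical helper, it takes the directory as a parameter so it can be tried on a temporary folder instead of /dev/shm, and it omits the ownership check that the real cleanup performs.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def scan_and_clean(
    shm_dir: Path, prefix: str = "FSHR", dry_run: bool = False
) -> Tuple[List[str], List[str]]:
    """Return (cleaned, failed) names for entries in `shm_dir` matching `prefix`."""
    cleaned: List[str] = []
    failed: List[str] = []
    for path in sorted(shm_dir.glob(f"{prefix}*")):
        if dry_run:
            cleaned.append(path.name)  # report only, leave the file in place
            continue
        try:
            path.unlink()
            cleaned.append(path.name)
        except OSError:
            failed.append(path.name)
    return cleaned, failed


# Try it on a throwaway directory rather than the real /dev/shm.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "FSHR_a1b2c3").write_bytes(b"x")
    (root / "other_block").write_bytes(b"y")
    preview = scan_and_clean(root, dry_run=True)
    result = scan_and_clean(root)
    remaining = sorted(p.name for p in root.iterdir())

print(preview, result, remaining)
```

Running with dry_run=True first is a cheap way to confirm that only the intended blocks would be removed.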

Exceptions

  • FastShareError -- Base exception for all fastshare errors.
  • AllocationError(FastShareError) -- Shared memory allocation failed.
  • BlockNotFoundError(FastShareError, KeyError) -- Shared memory block not found by name.
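
The hierarchy above determines which except clauses catch which errors. The sketch below defines local stand-in classes that mirror the documented base classes (they are not imported from fastshare) and checks that a BlockNotFoundError is caught by handlers for its own type, for FastShareError, and for KeyError.

```python
# Local stand-ins mirroring the documented hierarchy, not the library's classes.
class FastShareError(Exception):
    """Base exception."""


class AllocationError(FastShareError):
    """Shared memory allocation failed."""


class BlockNotFoundError(FastShareError, KeyError):
    """Shared memory block not found by name."""


def lookup(name: str):
    """Hypothetical lookup that always fails, for demonstration."""
    raise BlockNotFoundError(name)


caught_by = []
for exc_type in (BlockNotFoundError, FastShareError, KeyError):
    try:
        lookup("FSHR_missing")
    except exc_type:
        caught_by.append(exc_type.__name__)

print(caught_by)
```

Inheriting from KeyError means existing code that treats a missing block like a missing dictionary key keeps working without knowing about fastshare's own exception types.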

How It Works

fastshare uses Python's pickle protocol 5 out-of-band buffer support combined with multiprocessing.shared_memory. When write() is called on a large object, pickle separates the large data buffers (like NumPy array contents) from the metadata. The buffers are written directly into a shared memory block, without first being embedded in an intermediate pickle byte stream. The metadata (small) is pickled normally and stored as a header.

When read() is called, the metadata is unpickled and the buffers are reconstructed as zero-copy views into the shared memory block.

Process A                          Process B
   |                                  |
   write(obj)                         read(token)
   |                                  |
   pickle5 ──> shared memory ──> unpickle5
   (separate     (zero-copy       (reconstruct
    buffers)      transfer)        with views)
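
The flow in the diagram can be sketched with the standard library alone. This is a simplified illustration, not fastshare's code: oob_roundtrip is a hypothetical helper, it handles a single buffer, it assumes a non-empty payload, and it wraps a bytearray in pickle.PickleBuffer explicitly so the example does not depend on NumPy.

```python
import pickle
from multiprocessing import shared_memory


def oob_roundtrip(payload: bytearray):
    """Round-trip `payload` through shared memory using out-of-band buffers."""
    buffers = []
    # PickleBuffer marks the data as eligible for out-of-band transfer; the
    # callback collects it instead of embedding it in the pickle stream.
    meta = pickle.dumps(
        {"label": "demo", "data": pickle.PickleBuffer(payload)},
        protocol=5,
        buffer_callback=buffers.append,
    )
    raw = buffers[0].raw()  # flat byte view of the payload
    shm = shared_memory.SharedMemory(create=True, size=raw.nbytes)
    try:
        shm.buf[: raw.nbytes] = raw  # the one copy, into shared memory
        view = shm.buf[: raw.nbytes]
        # Unpickle the small metadata; the data slot is filled with `view`.
        restored = pickle.loads(meta, buffers=[view])
        is_view = isinstance(restored["data"], memoryview)
        matches = bytes(restored["data"]) == bytes(payload)
        del restored
        view.release()  # views must be released before the block is closed
        return len(meta), is_view, matches
    finally:
        raw.release()
        shm.close()
        shm.unlink()


meta_len, is_view, matches = oob_roundtrip(bytearray(b"HELLOWORLD" * 100_000))
print(meta_len, is_view, matches)
```

The metadata stream stays tiny regardless of payload size, and the reconstructed object points into the shared block instead of owning a private copy.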

Platform Support

| Platform | Python 3.10 | Python 3.11 | Python 3.12 | Python 3.13 |
|---|---|---|---|---|
| Linux | Yes | Yes | Yes | Yes |
| macOS | Yes | Yes | Yes | Yes |
| Windows | Yes | Yes | Yes | Yes |

  • All platforms support shared memory transfer.
  • The cleanup command (orphan block discovery) only works on Linux (/dev/shm scanning).
  • The fork start method is not available on Windows; spawn works everywhere.
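
Since spawn is the one start method available everywhere, code meant to run on all three platforms can request it explicitly. The sketch below uses only the standard multiprocessing API and stops short of launching processes.

```python
import multiprocessing as mp

# Start methods offered on the current platform.
available = mp.get_all_start_methods()

# An explicit spawn context behaves the same on Linux, macOS, and Windows.
ctx = mp.get_context("spawn")

print(available, ctx.get_start_method())
```

Objects such as ctx.Process and ctx.Pool created from this context use spawn regardless of the platform default.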

License

MIT
