Native PyTorch CUDA IPC over Unix Domain Socket for same-host process separation

Shared Tensor

shared_tensor is a narrow library for one job: sharing CUDA torch.Tensor and CUDA torch.nn.Module objects across processes on the same host and the same GPU with native PyTorch IPC semantics.

The control plane is a local Unix Domain Socket RPC channel. The data plane is native torch CUDA IPC serialization. CPU fallback is intentionally out of scope.
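
To make the control-plane idea concrete, here is a minimal sketch of a request/reply exchange over a Unix domain socket using only the standard library. The length-prefixed JSON framing, the message fields, and the helper names (send_msg, recv_msg, recv_exact) are assumptions for illustration; they are not the library's actual wire format.

```python
import json
import socket
import struct


def send_msg(sock: socket.socket, payload: dict) -> None:
    """Send one JSON message with a 4-byte big-endian length prefix."""
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the socket")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock: socket.socket) -> dict:
    """Read one length-prefixed JSON message."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


# A connected AF_UNIX pair stands in for the client and server ends.
client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    send_msg(client, {"endpoint": "load_model", "kwargs": {"hidden_size": 4}})
    request = recv_msg(server)
    # Only small control messages travel here; in a CUDA IPC design the
    # tensor memory itself is never copied through the socket.
    send_msg(server, {"ok": True, "endpoint": request["endpoint"]})
    reply = recv_msg(client)
finally:
    client.close()
    server.close()
```

The point of the split is that the socket carries only small control messages, while the GPU memory is shared through PyTorch's CUDA IPC handles rather than copied.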

Scope

Supported:

same-host trusted processes
same-GPU CUDA tensors and modules
explicit endpoint registration
sync call and task-backed submit
managed object handles with explicit release
server-side caching, cache_format_key, and singleflight
zero-branch auto mode gated by SHARED_TENSOR_ENABLED=1

Not supported:

CPU tensor or CPU module transport
generic Python object RPC
cross-host transport
mps
implicit device migration

Install

Use Python 3.10+ and a CUDA-enabled PyTorch build.

pip install shared-tensor

For local development:

conda create -y -n shared-tensor-dev python=3.11
conda activate shared-tensor-dev
pip install -e ".[dev,test]"

Example: Same Code, Two Processes

See examples/zero_branch_env.py.

import torch

from shared_tensor import SharedObjectHandle, SharedTensorProvider

provider = SharedTensorProvider()


@provider.share(
    execution="task",
    managed=True,
    concurrency="serialized",
    cache_format_key="model:{hidden_size}",
)
def load_model(hidden_size: int = 4) -> torch.nn.Module:
    return torch.nn.Linear(hidden_size, 2, device="cuda")


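x = torch.ones(1, 4, device="cuda")
result = load_model(hidden_size=4)
if isinstance(result, SharedObjectHandle):
    # Managed client path: leaving the with-block releases the handle.
    with result as handle:
        y = handle.value(x)
else:
    # Local path: the call returned the module itself.
    y = result(x)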

Server process:

SHARED_TENSOR_ENABLED=1 SHARED_TENSOR_ROLE=server python demo.py

Client process with the exact same file:

SHARED_TENSOR_ENABLED=1 python demo.py

What changes is only the environment:

same code

server process                      client process
------------------------------      ------------------------------
provider auto-starts UDS daemon     provider builds client wrappers
shared function runs locally        shared function becomes RPC call
CUDA object stays on same GPU       CUDA object is reopened via torch IPC

Example: Reusable Model Registry

See examples/model_service.py.

@provider.share(
    execution="task",
    managed=True,
    concurrency="serialized",
    cache_format_key="model:{input_dim}:{output_dim}",
)
def load_linear_model(input_dim: int = 16, output_dim: int = 4) -> torch.nn.Module:
    ...

Recommended settings for expensive reusable models:

execution="task"
managed=True
concurrency="serialized"
singleflight=True
explicit cache_format_key

This gives one build per cache key, shared handles for identical requests, and explicit release semantics. Task submission uses the same server-side cache as sync call: repeated submit for the same cache key reuses the cached result instead of rebuilding the CUDA object.
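
The "one build per cache key" behavior can be sketched as follows. This is an illustrative singleflight cache, not the library's implementation: SingleflightCache, cache_key, and build_model are hypothetical names, and the sketch assumes the key template is filled in with str.format from the call's keyword arguments.

```python
import threading
from concurrent.futures import Future
from typing import Any, Callable


class SingleflightCache:
    """Hypothetical cache: at most one build runs per key."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._futures: dict[str, Future] = {}

    def get_or_build(self, key: str, build: Callable[[], Any]) -> Any:
        with self._lock:
            future = self._futures.get(key)
            owner = future is None
            if owner:
                future = Future()
                self._futures[key] = future
        if owner:
            try:
                future.set_result(build())
            except BaseException as exc:
                # Drop failed builds so a later call can retry.
                with self._lock:
                    self._futures.pop(key, None)
                future.set_exception(exc)
                raise
        # Non-owners block here until the owner finishes the build.
        return future.result()


def cache_key(template: str, **kwargs: Any) -> str:
    """Assumption: the key template is formatted with the call kwargs."""
    return template.format(**kwargs)


builds = []


def build_model() -> object:
    builds.append(1)  # count how many times the expensive build runs
    return object()


cache = SingleflightCache()
key = cache_key("model:{input_dim}:{output_dim}", input_dim=16, output_dim=4)
first = cache.get_or_build(key, build_model)
second = cache.get_or_build(key, build_model)
```

Both calls resolve to the same object, and the build function runs only once for that key.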

Example: Direct Tensor Path

See examples/basic_service.py.

@provider.share(execution="direct", cache=False)
def echo_tensor(tensor: torch.Tensor) -> torch.Tensor:
    return tensor

Use this for short-lived request-scoped CUDA transforms. The main production path is still task-backed model construction.

Configuration

SharedTensorProvider() defaults to safe local mode unless shared-tensor behavior is explicitly enabled.

Environment gate:

export SHARED_TENSOR_ENABLED=1

Per-provider override:

SharedTensorProvider(enabled=True)
SharedTensorProvider(enabled=False)
SharedTensorProvider(enabled=None)

Provider runtime controls:

SharedTensorProvider(server_process_start_method="fork")
SharedTensorProvider(server_startup_timeout=30.0)
provider.get_runtime_info()

Use server_process_start_method="fork" when you explicitly want POSIX fork behavior. Leave it as None to let the library choose a safer default for the current entrypoint.

execution_mode="auto" behaves as follows:

disabled: local mode
enabled + SHARED_TENSOR_ROLE=server: auto-start local server and execute endpoints locally
enabled + role unset: build client wrappers
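
The three cases above can be read as a small decision function. The sketch below is an assumption-laden illustration, not the library's code: resolve_mode is a hypothetical helper, it assumes enabled=None defers to the environment gate, that only the literal value "1" enables it, and that any role other than "server" is treated as a client.

```python
from typing import Mapping, Optional


def resolve_mode(env: Mapping[str, str], enabled: Optional[bool] = None) -> str:
    """Return "local", "server", or "client" following the rules above."""
    if enabled is None:
        # Assumption: only the literal value "1" turns the gate on.
        enabled = env.get("SHARED_TENSOR_ENABLED") == "1"
    if not enabled:
        return "local"
    if env.get("SHARED_TENSOR_ROLE") == "server":
        return "server"
    # Assumption: an unset (or any other) role means client.
    return "client"


local_mode = resolve_mode({})
server_mode = resolve_mode(
    {"SHARED_TENSOR_ENABLED": "1", "SHARED_TENSOR_ROLE": "server"}
)
client_mode = resolve_mode({"SHARED_TENSOR_ENABLED": "1"})
```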

Socket selection is per CUDA device:

base path comes from SHARED_TENSOR_BASE_PATH or /tmp/shared-tensor
runtime socket path is <base_path>-<device_index>.sock
device_index=None means probe lazily from the current CUDA device when needed
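
As a quick illustration of the path rule above, here is a minimal sketch. The helper name socket_path is hypothetical; only the environment variable, the default base path, and the `<base_path>-<device_index>.sock` pattern come from the text.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_PATH = "/tmp/shared-tensor"


def socket_path(device_index: int, env: Optional[Mapping[str, str]] = None) -> str:
    """Build <base_path>-<device_index>.sock for one CUDA device."""
    env = os.environ if env is None else env
    base = env.get("SHARED_TENSOR_BASE_PATH", DEFAULT_BASE_PATH)
    return f"{base}-{device_index}.sock"


default_path = socket_path(0, env={})
custom_path = socket_path(1, env={"SHARED_TENSOR_BASE_PATH": "/run/st"})
```

With lazy probing, the index would presumably come from something like torch.cuda.current_device() at first use, so each GPU ends up with its own socket.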

Payload Contract

Allowed result payloads:

CUDA torch.Tensor
CUDA torch.nn.Module

Allowed call payloads:

CUDA tensors and modules
scalar control values in args and kwargs
tuple, list, and dict[str, ...] wrappers
empty args and kwargs through the control path

Rejected:

CPU tensors or modules
plain Python result payloads
mps
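
A recursive check along the lines of the call-payload rules might look like the sketch below. It is illustrative only: validate_call_payload, looks_cuda, and FakeCudaTensor are hypothetical names, and the set of types treated as "scalar control values" is an assumption. The CUDA check is passed in as a predicate so the sketch runs without a GPU; with PyTorch installed, a predicate such as `isinstance(obj, torch.Tensor) and obj.is_cuda` could be used for tensors.

```python
from typing import Any, Callable

# Assumption: these count as "scalar control values".
SCALARS = (bool, int, float, str, type(None))


def validate_call_payload(value: Any, is_cuda_object: Callable[[Any], bool]) -> None:
    """Raise TypeError if value falls outside the call-payload contract."""
    if isinstance(value, SCALARS) or is_cuda_object(value):
        return
    if isinstance(value, (tuple, list)):
        for item in value:
            validate_call_payload(item, is_cuda_object)
        return
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError(f"dict keys must be str, got {type(key).__name__}")
            validate_call_payload(item, is_cuda_object)
        return
    raise TypeError(f"unsupported payload type: {type(value).__name__}")


class FakeCudaTensor:
    """Stand-in for a CUDA tensor so the sketch runs without a GPU."""

    is_cuda = True


def looks_cuda(obj: Any) -> bool:
    """Duck-typed check used only for this demonstration."""
    return getattr(obj, "is_cuda", False) is True


# Accepted: a tuple wrapping a list of CUDA objects and a str-keyed dict.
validate_call_payload(([FakeCudaTensor()], {"hidden_size": 4}), looks_cuda)
```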

Managed Objects

When managed=True, the client receives a SharedObjectHandle.

handle = load_model(hidden_size=4096)
with handle as model_handle:
    y = model_handle.value(x)

You can also release explicitly:

handle.release()

Use managed mode for cached models or other reusable long-lived CUDA objects.
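
The release semantics can be pictured with a small stand-in class. This is a sketch of the general pattern (a context manager with an idempotent release), not the real SharedObjectHandle; ManagedHandle and its on_release callback are hypothetical.

```python
from typing import Any, Callable


class ManagedHandle:
    """Hypothetical handle: wraps a value and releases it at most once."""

    def __init__(self, value: Any, on_release: Callable[[], None]) -> None:
        self._value = value
        self._on_release = on_release
        self._released = False

    @property
    def value(self) -> Any:
        if self._released:
            raise RuntimeError("handle already released")
        return self._value

    def release(self) -> None:
        if not self._released:
            self._released = True
            self._value = None  # drop the local reference
            self._on_release()  # e.g. tell the server to decrement a refcount

    def __enter__(self) -> "ManagedHandle":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.release()


released = []
with ManagedHandle("model", lambda: released.append("model")) as handle:
    inside = handle.value
handle.release()  # a second release is a no-op in this sketch
```

Making release idempotent means the with-block and an explicit release() call can be combined safely.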

Runtime Introspection

client.get_server_info() returns readiness and process metadata in addition to endpoint and capability data. In client mode, provider.get_runtime_info() wraps that into a provider-oriented view.

info = provider.get_runtime_info()
# execution_mode, server_socket_path, server_running, server_ready, server_info...
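
One way to use this information is to poll until the server reports ready. The sketch below is an assumption-based illustration: wait_until_ready is a hypothetical helper, and it assumes the runtime info behaves like a mapping with a server_ready entry, as the comment above suggests. It takes the info getter as a callable, so in client code it could be called as wait_until_ready(provider.get_runtime_info).

```python
import time
from typing import Any, Callable, Mapping


def wait_until_ready(
    get_info: Callable[[], Mapping[str, Any]],
    timeout: float = 30.0,
    interval: float = 0.1,
) -> Mapping[str, Any]:
    """Poll get_info() until server_ready is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        info = get_info()
        if info.get("server_ready"):
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError("server did not become ready in time")
        time.sleep(interval)


# Demonstration with a fake info source that becomes ready on the second poll.
states = iter([{"server_ready": False}, {"server_ready": True}])
info = wait_until_ready(lambda: next(states), timeout=1.0, interval=0.0)
```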

Testing

Default suite:

python -m pytest -m "not gpu"

GPU suite:

python -m pytest -m gpu

Release history

0.2.16 (Mar 28, 2026)
0.2.15 (Mar 28, 2026)
0.2.13 (Mar 27, 2026)
0.2.12 (Mar 27, 2026)
0.2.11 (Mar 27, 2026)
0.2.10 (Mar 26, 2026)
0.2.9 (Mar 26, 2026)
0.2.8 (Mar 26, 2026)
0.2.7 (Mar 26, 2026)
0.2.6 (Mar 25, 2026)
0.2.5 (Mar 25, 2026), this version
0.2.4 (Mar 25, 2026)
0.2.2 (Mar 25, 2026)
0.2.1 (Mar 25, 2026)
0.1.2 (Sep 4, 2025)
0.1.1 (Sep 4, 2025)
0.1.0 (Sep 4, 2025)

Download files

Source Distribution

shared_tensor-0.2.5.tar.gz (25.5 kB)

Uploaded Mar 25, 2026 Source

Built Distribution

shared_tensor-0.2.5-py3-none-any.whl (29.8 kB)

Uploaded Mar 25, 2026 Python 3

File details

Details for the file shared_tensor-0.2.5.tar.gz.

File metadata

Download URL: shared_tensor-0.2.5.tar.gz
Upload date: Mar 25, 2026
Size: 25.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for shared_tensor-0.2.5.tar.gz
Algorithm	Hash digest
SHA256	`6330e38b7eb428b2c4f69afea44ce56fc0b986118b90645990fc203df3a018a4`
MD5	`656c0d0bdd7e2f615373a027ae477986`
BLAKE2b-256	`4a2c14f3600d519d8f32fcf51809cfd16320c258d11330ac92383f991fb25227`

File details

Details for the file shared_tensor-0.2.5-py3-none-any.whl.

File metadata

Download URL: shared_tensor-0.2.5-py3-none-any.whl
Upload date: Mar 25, 2026
Size: 29.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for shared_tensor-0.2.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fac1373fb38c6889ab3f109e26baed0218f6001ee42e04a59e377d8ba1d1b8f0`
MD5	`233dc1a5b353b79d300d7a7fcbecfc73`
BLAKE2b-256	`c044a02a7826f47d7536e47428bcd21d7b729b4ff075f45c1db667a03cb46bfa`
