
A library for sharing GPU memory objects across processes using IPC mechanisms


Shared Tensor


A high-performance library for sharing GPU memory objects across processes. It combines IPC mechanisms with the JSON-RPC 2.0 protocol, enabling an architecture in which the model and the inference engine run as separate processes.

Project Overview

Shared Tensor is a cross-process communication library designed for deep learning and AI applications. It uses IPC mechanisms and the JSON-RPC protocol to provide:

  • Efficient GPU Memory Sharing: Cross-process sharing of PyTorch tensors and models
  • Remote Function Execution: Easy remote function calls through decorators
  • Async/Sync Support: Flexible execution modes for different scenarios
  • Model Serving: Deploy machine learning models as independent services
  • Distributed Inference: Support for distributed computing in multi-GPU environments

Core Features

Cross-Process Communication

  • JSON-RPC 2.0 Protocol: Standardized remote procedure calls
  • HTTP Transport: Reliable HTTP-based communication mechanism
  • Serialization Optimization: Efficient PyTorch object serialization/deserialization
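To make the protocol bullets concrete, here is a minimal sketch of a JSON-RPC 2.0 request and response built with the standard library only. The `params` layout and the base64-wrapped pickle payload are illustrative assumptions, not the wire format Shared Tensor actually uses; the envelope fields (`jsonrpc`, `method`, `params`, `id`, `result`) come from the JSON-RPC 2.0 specification.

```python
import base64
import json
import pickle


def encode_payload(obj):
    """Pickle an object and wrap it in base64 so it fits in a JSON string."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def decode_payload(text):
    """Reverse encode_payload. Only unpickle data from trusted peers."""
    return pickle.loads(base64.b64decode(text.encode("ascii")))


def make_request(method, args, request_id):
    """Build a JSON-RPC 2.0 request; the params layout is hypothetical."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": {"args": encode_payload(args)},
        "id": request_id,
    })


def make_response(result, request_id):
    """Build a JSON-RPC 2.0 success response carrying an encoded result."""
    return json.dumps({
        "jsonrpc": "2.0",
        "result": encode_payload(result),
        "id": request_id,
    })


# Round trip: the "server" side decodes the arguments and encodes the result.
request = json.loads(make_request("add_numbers", (2, 3), request_id=1))
a, b = decode_payload(request["params"]["args"])
response = json.loads(make_response(a + b, request["id"]))
print(decode_payload(response["result"]))
```

JSON cannot carry raw bytes, which is why binary payloads are text-encoded in this sketch. A real tensor transport would likely avoid copying the data at all.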

Function Sharing

  • Decorator Pattern: Easy function sharing using @provider.share
  • Auto Discovery: Smart function path resolution and import
  • Parameter Passing: Support for complex data type parameters
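The decorator and path-resolution ideas above can be sketched in a few lines. `ToyProvider`, its `registry` attribute, and the `module:qualname` path format are hypothetical names chosen for this illustration; they are not taken from the Shared Tensor source.

```python
import importlib


class ToyProvider:
    """Hypothetical stand-in for a provider; not the shared_tensor API."""

    def __init__(self):
        self.registry = {}

    def share(self, name=None):
        """Return a decorator that records the function's import path."""
        def decorator(func):
            key = name or func.__name__
            self.registry[key] = f"{func.__module__}:{func.__qualname__}"
            return func  # the function stays callable locally
        return decorator


def resolve(path):
    """Import 'module:qualname' and return the object it names."""
    module_name, _, qualname = path.partition(":")
    obj = importlib.import_module(module_name)
    for part in qualname.split("."):
        obj = getattr(obj, part)
    return obj


provider = ToyProvider()


@provider.share()
def add_numbers(a, b):
    return a + b


# Resolution is demonstrated on a standard-library function so the example
# does not depend on how this snippet itself is loaded.
dumps = resolve("json:dumps")
print(provider.registry["add_numbers"])
print(dumps({"sum": add_numbers(2, 3)}))
```

Recording an import path rather than the function object is what lets a separate server process import and run the same function on its own.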

Async Support

  • Async Execution: AsyncSharedTensorProvider supports non-blocking calls
  • Task Management: Complete async task status tracking
  • Concurrent Processing: Efficient concurrent request handling
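As a rough illustration of task status tracking, the sketch below submits work to a thread pool and reports each task's state by ID. `TaskTracker` and the status strings are assumptions made for this example; the actual `AsyncSharedTensorProvider` task model may differ.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class TaskTracker:
    """Hypothetical task table: submit work, then poll by task ID."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}

    def submit(self, func, *args, **kwargs):
        """Start func in the background and return an opaque task ID."""
        task_id = uuid.uuid4().hex
        self._futures[task_id] = self._pool.submit(func, *args, **kwargs)
        return task_id

    def status(self, task_id):
        """Report 'pending', 'running', 'completed', or 'failed'."""
        future = self._futures[task_id]
        if future.running():
            return "running"
        if not future.done():
            return "pending"
        return "failed" if future.exception() else "completed"

    def result(self, task_id, timeout=None):
        """Block until the task finishes and return (or raise) its outcome."""
        return self._futures[task_id].result(timeout=timeout)

    def shutdown(self):
        """Wait for all submitted tasks and release the worker threads."""
        self._pool.shutdown(wait=True)


tracker = TaskTracker()
ok_id = tracker.submit(pow, 2, 10)
bad_id = tracker.submit(int, "not a number")
value = tracker.result(ok_id)
tracker.shutdown()  # wait for every task so the statuses below are final
print(value, tracker.status(ok_id), tracker.status(bad_id))
```

Returning an ID immediately and letting the caller poll is what keeps the submitting side non-blocking.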

GPU Compatibility

  • CUDA Support: Native CUDA tensor sharing support
  • Device Management: Smart data migration between devices
  • Memory Optimization: Efficient GPU memory usage

Installation Guide

Requirements

  • Python: 3.8+
  • Operating System: Linux (recommended)
  • PyTorch: 1.12.0+
  • CUDA: Optional, for GPU support

Installation Methods

Install from Source

# Clone the repository
git clone https://github.com/world-sim-dev/shared-tensor.git
cd shared-tensor

# Install dependencies
pip install -r requirements.txt

# Install the package
pip install -e .

Development Installation

# Install with development dependencies
pip install -e ".[dev]"

# Install with test dependencies
pip install -e ".[test]"

Verify Installation

# Check core functionality
python -c "import shared_tensor; print('✓ Shared Tensor installed successfully')"

Quick Start

1. Basic Function Sharing

from shared_tensor.async_provider import AsyncSharedTensorProvider

# Create provider
provider = AsyncSharedTensorProvider()

# Share simple function
@provider.share()
def add_numbers(a, b):
    return a + b

# Share PyTorch function
@provider.share()
def create_tensor(shape):
    import torch
    return torch.zeros(shape)

# Load PyTorch model
@provider.share()
def load_model():
    ...

2. Start Server

# Method 1: Use command line tool, single server
shared-tensor-server

# Method 2: Use torchrun
torchrun --nproc_per_node=4 --no-python shared-tensor-server

# Method 3: Custom configuration
python shared_tensor/server.py

Detailed Usage

Model Sharing Example

import torch
import torch.nn as nn

from shared_tensor.async_provider import AsyncSharedTensorProvider

# Create provider
provider = AsyncSharedTensorProvider()

# Define model
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Share model creation function
@provider.share(name="create_model")
def create_model(input_size=784, hidden_size=128, output_size=10):
    model = SimpleNet(input_size, hidden_size, output_size)
    return model

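# Share inference function
@provider.share(name="model_inference")
def model_inference(input_data):
    model = create_model()
    model.eval()  # switch to inference mode before the forward pass
    with torch.no_grad():
        return model(input_data)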

Configuration Options

Server Configuration

from shared_tensor.server import SharedTensorServer

server = SharedTensorServer(
    host="0.0.0.0",           # Listen address
    port=2537,                # Port number
    timeout=30,               # Request timeout
    max_workers=4,            # Maximum worker threads
    enable_cache=True,        # Enable result caching
    debug=False               # Debug mode
)
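One way to keep these options out of application code is to collect them in a small settings object and override them from the environment. This is only a sketch: `ServerSettings`, `settings_from_env`, and the `SHARED_TENSOR_*` variable names are hypothetical, and the defaults simply reuse the values shown above.

```python
import os
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ServerSettings:
    """Holds the keyword arguments shown above, reusing those values as defaults."""
    host: str = "0.0.0.0"
    port: int = 2537
    timeout: int = 30
    max_workers: int = 4
    enable_cache: bool = True
    debug: bool = False

    def __post_init__(self):
        # Fail early on values that could never produce a working server.
        if not 0 < self.port < 65536:
            raise ValueError(f"port out of range: {self.port}")
        if self.max_workers < 1:
            raise ValueError("max_workers must be at least 1")


def _to_bool(text):
    """Interpret common truthy spellings; anything else is False."""
    return text.strip().lower() in {"1", "true", "yes", "on"}


def settings_from_env(env=None, prefix="SHARED_TENSOR_"):
    """Override defaults from variables such as SHARED_TENSOR_PORT (hypothetical)."""
    env = os.environ if env is None else env
    casts = {"host": str, "port": int, "timeout": int,
             "max_workers": int, "enable_cache": _to_bool, "debug": _to_bool}
    overrides = {}
    for field, cast in casts.items():
        raw = env.get(prefix + field.upper())
        if raw is not None:
            overrides[field] = cast(raw)
    return ServerSettings(**overrides)


settings = settings_from_env({"SHARED_TENSOR_PORT": "8080",
                              "SHARED_TENSOR_DEBUG": "true"})
print(asdict(settings))
```

Assuming the constructor accepts the keywords listed above, the result could then be passed along as `SharedTensorServer(**asdict(settings))`.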

Testing

Run Test Suite

# Run all tests
python tests/run_tests.py

# Run specific category tests
python tests/run_tests.py --category unit
python tests/run_tests.py --category integration
python tests/run_tests.py --category pytorch

# Run only PyTorch related tests
python tests/run_tests.py --torch-only

# Verbose output
python tests/run_tests.py --verbose

Test Environment Info

# Check test environment
python tests/run_tests.py --env-info

Individual Test Files

# Test tensor serialization
python tests/pytorch_tests/test_tensor_serialization.py

# Test async system
python tests/integration/test_async_system.py

# Test client
python tests/integration/test_client.py

Architecture Design

Core Components

shared-tensor/
├── shared_tensor/              # Core modules
│   ├── server.py              # JSON-RPC server
│   ├── client.py              # Sync client
│   ├── provider.py            # Sync provider
│   ├── async_client.py        # Async client
│   ├── async_provider.py      # Async provider
│   ├── async_task.py          # Async task management
│   ├── jsonrpc.py             # JSON-RPC protocol implementation
│   ├── utils.py               # Utility functions
│   └── errors.py              # Exception definitions
├── examples/                  # Usage examples
└── tests/                     # Test suite

Communication Flow

sequenceDiagram
    participant CA as Client App
    participant SC as SharedTensorClient
    participant SS as SharedTensorServer
    participant FE as Function Executor
    
    Note over CA, FE: Client-Server Communication Flow
    
    CA->>SC: call_function("model_inference", args)
    SC->>SC: Serialize parameters
    SC->>SS: HTTP POST /jsonrpc<br/>JSON-RPC Request
    
    Note over SS: Server Processing
    SS->>SS: Parse JSON-RPC request
    SS->>SS: Resolve function path
    SS->>FE: Import & execute function
    FE->>FE: Deserialize parameters
    FE->>FE: Execute function logic
    FE->>SS: Return execution result
    
    Note over SS: Response Preparation
    SS->>SS: Serialize result
    SS->>SS: Create JSON-RPC response
    SS->>SC: HTTP Response<br/>JSON-RPC Result
    
    Note over SC: Client Processing
    SC->>SC: Parse response
    SC->>SC: Deserialize result
    SC->>CA: Return final result
    
    Note over CA, FE: End-to-End Process Complete
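The server-side half of this flow (parse the request, resolve the function, execute it, build the response) can be sketched without any networking. The `FUNCTIONS` registry and the `handle` function are hypothetical and stand in for whatever the real server does internally; the error codes are the ones reserved by the JSON-RPC 2.0 specification.

```python
import json

# Hypothetical registry mapping method names to callables.
FUNCTIONS = {"add_numbers": lambda a, b: a + b}


def _reply(request_id, **body):
    """Wrap a result or an error in a JSON-RPC 2.0 response envelope."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, **body})


def _error(request_id, code, message):
    return _reply(request_id, error={"code": code, "message": message})


def handle(raw_request):
    """Parse one JSON-RPC 2.0 request, run the function, build the response."""
    try:
        request = json.loads(raw_request)
    except json.JSONDecodeError:
        return _error(None, -32700, "Parse error")
    if not isinstance(request, dict):
        return _error(None, -32600, "Invalid Request")

    request_id = request.get("id")
    method = request.get("method")
    func = FUNCTIONS.get(method) if isinstance(method, str) else None
    if func is None:
        return _error(request_id, -32601, "Method not found")

    params = request.get("params", [])
    try:
        # JSON-RPC allows parameters by position (list) or by name (object).
        result = func(**params) if isinstance(params, dict) else func(*params)
    except Exception as exc:
        return _error(request_id, -32000, str(exc))
    return _reply(request_id, result=result)


print(handle('{"jsonrpc": "2.0", "method": "add_numbers", "params": [2, 3], "id": 7}'))
print(handle('{"jsonrpc": "2.0", "method": "unknown", "id": 8}'))
```

In the diagram, this logic would sit behind the `HTTP POST /jsonrpc` endpoint, with parameter and result serialization added around the function call.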

Debug Tips

  1. Enable verbose logging:
     import logging
     logging.basicConfig(level=logging.DEBUG)
  2. Use debug mode:
     provider = SharedTensorProvider(verbose_debug=True)
  3. Check function paths:
     provider = SharedTensorProvider()
     print(provider._registered_functions)

๐Ÿค Contributing

We welcome community contributions! Please follow these steps:

Development Environment Setup

# Clone repository
git clone https://github.com/world-sim-dev/shared-tensor.git
cd shared-tensor

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Package & Publish
python setup.py sdist bdist_wheel
python -m twine upload --repository testpypi dist/*
python -m twine upload dist/*

Code Standards

# Code formatting
black shared_tensor/ tests/ examples/

# Import sorting
isort shared_tensor/ tests/ examples/

# Static checking
flake8 shared_tensor/
mypy shared_tensor/

Submission Process

  1. Fork the project and create a feature branch
  2. Write code and tests
  3. Run the complete test suite
  4. Submit a Pull Request

Test Requirements

  • New features must include tests
  • Maintain test coverage > 90%
  • All tests must pass

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

Acknowledgments

Contact Us


Shared Tensor - Making GPU memory sharing simple and efficient
