Shared Tensor

A high-performance library for sharing GPU memory objects across processes. It combines IPC mechanisms with the JSON-RPC 2.0 protocol, so that the model and the inference engine can run as separate processes.

🚀 Project Overview

Shared Tensor is a cross-process communication library designed for deep learning and AI applications. It uses IPC mechanisms and the JSON-RPC protocol to provide:

  • Efficient GPU Memory Sharing: Cross-process sharing of PyTorch tensors and models
  • Remote Function Execution: Easy remote function calls through decorators
  • Async/Sync Support: Flexible execution modes for different scenarios
  • Model Serving: Deploy machine learning models as independent services
  • Distributed Inference: Support for distributed computing in multi-GPU environments

📋 Core Features

🔄 Cross-Process Communication

  • JSON-RPC 2.0 Protocol: Standardized remote procedure calls
  • HTTP Transport: Reliable HTTP-based communication mechanism
  • Serialization Optimization: Efficient PyTorch object serialization/deserialization
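As a rough illustration of the transport described above, the sketch below builds an HTTP POST that carries a JSON-RPC 2.0 request and parses a JSON-RPC 2.0 response. It uses only the standard library. The port 2537 and the /jsonrpc path are taken from elsewhere in this README; the helper names `build_request` and `parse_response` and the plain-dict `params` are assumptions for illustration, not the library's actual wire format.

```python
import json
import urllib.request
from itertools import count

_ids = count(1)  # monotonically increasing request ids


def build_request(method, params, url="http://localhost:2537/jsonrpc"):
    """Build (but do not send) an HTTP POST carrying a JSON-RPC 2.0 request."""
    payload = {
        "jsonrpc": "2.0",   # protocol version marker required by the spec
        "method": method,
        "params": params,
        "id": next(_ids),   # lets the caller match the response to the request
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(raw):
    """Return the result of a JSON-RPC 2.0 response, or raise on an error object."""
    message = json.loads(raw)
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"JSON-RPC error {err['code']}: {err['message']}")
    return message["result"]


req = build_request("add_numbers", {"a": 2, "b": 3})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending the request would be a call to `urllib.request.urlopen(req)` against a running server; that step is left out so the sketch runs without one.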

🎯 Function Sharing

  • Decorator Pattern: Easy function sharing using @provider.share
  • Auto Discovery: Smart function path resolution and import
  • Parameter Passing: Support for complex data type parameters
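The decorator and path-resolution ideas above can be sketched in a few lines. This is a minimal, hypothetical registry, not the implementation behind `@provider.share`: the `FunctionRegistry` class, the `"module:qualname"` path format, and the `resolve` helper are all assumptions made for the example.

```python
import importlib


class FunctionRegistry:
    """Hypothetical registry; not the shared_tensor provider itself."""

    def __init__(self):
        self.paths = {}  # public name -> "module:qualname"

    def share(self, name=None):
        """Decorator factory that records where the function can be imported from."""
        def decorator(func):
            key = name or func.__name__
            self.paths[key] = f"{func.__module__}:{func.__qualname__}"
            return func  # the function itself is left unchanged
        return decorator


def resolve(path):
    """Import the module named in 'module:qualname' and walk to the attribute.

    Works for module-level functions and class attributes; functions defined
    inside other functions cannot be reached this way.
    """
    module_name, _, qualname = path.partition(":")
    obj = importlib.import_module(module_name)
    for attr in qualname.split("."):
        obj = getattr(obj, attr)
    return obj


registry = FunctionRegistry()


@registry.share(name="add")
def add_numbers(a, b):
    return a + b


print(registry.paths["add"])
sqrt = resolve("math:sqrt")  # resolving a standard-library function by path
print(sqrt(16.0))
```

Storing an import path rather than the function object is what lets a separate server process locate and import the same function by name.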

⚡ Async Support

  • Async Execution: AsyncSharedTensorProvider supports non-blocking calls
  • Task Management: Complete async task status tracking
  • Concurrent Processing: Efficient concurrent request handling
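To make the idea of task status tracking concrete, here is a small sketch built on `concurrent.futures`. The `TaskManager` class and its status strings are hypothetical and are not the API of `AsyncSharedTensorProvider`; they only illustrate submitting work, polling its state, and collecting the result.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class TaskManager:
    """Hypothetical task tracker: submit work, poll status, fetch the result."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}  # task id -> Future

    def submit(self, func, *args, **kwargs):
        """Schedule func and return an opaque task id immediately."""
        task_id = uuid.uuid4().hex
        self._futures[task_id] = self._pool.submit(func, *args, **kwargs)
        return task_id

    def status(self, task_id):
        """Report one of: pending, running, completed, failed."""
        future = self._futures[task_id]
        if not future.done():
            return "running" if future.running() else "pending"
        return "failed" if future.exception() is not None else "completed"

    def result(self, task_id, timeout=None):
        """Block until the task finishes; re-raises the task's exception if any."""
        return self._futures[task_id].result(timeout=timeout)

    def shutdown(self):
        self._pool.shutdown(wait=True)


manager = TaskManager(max_workers=2)
ok_id = manager.submit(pow, 2, 10)
bad_id = manager.submit(int, "not a number")  # raises ValueError in the worker
value = manager.result(ok_id, timeout=5)
manager.shutdown()  # waits for both tasks to finish

print(value, manager.status(ok_id), manager.status(bad_id))
```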

🖥️ GPU Compatibility

  • CUDA Support: Native CUDA tensor sharing support
  • Device Management: Smart data migration between devices
  • Memory Optimization: Efficient GPU memory usage

🛠️ Installation Guide

Requirements

  • Python: 3.8+
  • Operating System: Linux (recommended)
  • PyTorch: 1.12.0+
  • CUDA: Optional, for GPU support

Installation Methods

Install from PyPI

pip install shared-tensor

Install from Source

# Clone the repository
git clone https://github.com/world-sim-dev/shared-tensor.git
cd shared-tensor

# Install dependencies
pip install -r requirements.txt

# Install the package
pip install -e .

Development Installation

# Install with development dependencies
pip install -e ".[dev]"

# Install with test dependencies
pip install -e ".[test]"

Verify Installation

# Check core functionality
python -c "import shared_tensor; print('✓ Shared Tensor installed successfully')"
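A complementary check, sketched here with the standard library only, is to ask the packaging metadata which version is installed. The helper name `installed_version` is an assumption for this example; `shared-tensor` is the distribution name used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("shared-tensor")
print(f"shared-tensor: {version or 'not installed'}")
```

Unlike the import check, this does not import the package, so it also works in environments where PyTorch fails to load.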

🎯 Quick Start

1. Basic Function Sharing

from shared_tensor.async_provider import AsyncSharedTensorProvider

# Create provider
provider = AsyncSharedTensorProvider()

# Share simple function
@provider.share()
def add_numbers(a, b):
    return a + b

# Share PyTorch function
@provider.share()
def create_tensor(shape):
    import torch
    return torch.zeros(shape)

# Load PyTorch model
@provider.share()
def load_model():
    ...

2. Start Server

# Method 1: Use command line tool, single server
shared-tensor-server

# Method 2: Use torchrun
torchrun --nproc_per_node=4 --no-python shared-tensor-server

# Method 3: Custom configuration
python shared_tensor/server.py

📖 Detailed Usage

Model Sharing Example

import torch
import torch.nn as nn

from shared_tensor.async_provider import AsyncSharedTensorProvider

# Create provider
provider = AsyncSharedTensorProvider()

# Define model
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Share model creation function
@provider.share(name="create_model")
def create_model(input_size=784, hidden_size=128, output_size=10):
    model = SimpleNet(input_size, hidden_size, output_size)
    return model

# Run inference on an example batch (shape matches input_size=784)
model = create_model()
model.eval()
input_data = torch.randn(1, 784)
with torch.no_grad():
    output = model(input_data)

🔧 Configuration Options

Server Configuration

from shared_tensor.server import SharedTensorServer

server = SharedTensorServer(
    host="0.0.0.0",           # Listen address
    port=2537,                # Port number
    timeout=30,               # Request timeout
    max_workers=4,            # Maximum worker threads
    enable_cache=True,        # Enable result caching
    debug=False               # Debug mode
)
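One way to keep these options in one place is a small dataclass that validates them before they reach the server. The sketch below is an assumption-heavy illustration: `ServerOptions`, `options_from_env`, and the `SHARED_TENSOR_*` environment variable names are hypothetical and are not provided by the library. Only the option names and defaults come from the configuration shown above.

```python
import os
from dataclasses import asdict, dataclass


@dataclass
class ServerOptions:
    """Mirror of the keyword arguments shown in the server configuration."""
    host: str = "0.0.0.0"
    port: int = 2537
    timeout: int = 30
    max_workers: int = 4
    enable_cache: bool = True
    debug: bool = False

    def __post_init__(self):
        # Fail early on values that could never produce a working server.
        if not 0 < self.port < 65536:
            raise ValueError(f"port out of range: {self.port}")
        if self.max_workers < 1:
            raise ValueError("max_workers must be at least 1")


def options_from_env(env=os.environ):
    """Override defaults from hypothetical SHARED_TENSOR_* variables."""
    overrides = {}
    if "SHARED_TENSOR_PORT" in env:
        overrides["port"] = int(env["SHARED_TENSOR_PORT"])
    if "SHARED_TENSOR_DEBUG" in env:
        overrides["debug"] = env["SHARED_TENSOR_DEBUG"] == "1"
    return ServerOptions(**overrides)


options = options_from_env({"SHARED_TENSOR_PORT": "8080"})
kwargs = asdict(options)  # could then be passed as SharedTensorServer(**kwargs)
print(kwargs)
```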

🧪 Testing

Run Test Suite

# Run all tests
python tests/run_tests.py

# Run specific category tests
python tests/run_tests.py --category unit
python tests/run_tests.py --category integration
python tests/run_tests.py --category pytorch

# Run only PyTorch related tests
python tests/run_tests.py --torch-only

# Verbose output
python tests/run_tests.py --verbose

Test Environment Info

# Check test environment
python tests/run_tests.py --env-info

Individual Test Files

# Test tensor serialization
python tests/pytorch_tests/test_tensor_serialization.py

# Test async system
python tests/integration/test_async_system.py

# Test client
python tests/integration/test_client.py

๐Ÿ—๏ธ Architecture Design

Core Components

shared-tensor/
├── shared_tensor/             # Core modules
│   ├── server.py              # JSON-RPC server
│   ├── client.py              # Sync client
│   ├── provider.py            # Sync provider
│   ├── async_client.py        # Async client
│   ├── async_provider.py      # Async provider
│   ├── async_task.py          # Async task management
│   ├── jsonrpc.py             # JSON-RPC protocol implementation
│   ├── utils.py               # Utility functions
│   └── errors.py              # Exception definitions
├── examples/                  # Usage examples
└── tests/                     # Test suite

Communication Flow

sequenceDiagram
    participant CA as Client App
    participant SC as SharedTensorClient
    participant SS as SharedTensorServer
    participant FE as Function Executor
    
    Note over CA, FE: Client-Server Communication Flow
    
    CA->>SC: call_function("model_inference", args)
    SC->>SC: Serialize parameters
    SC->>SS: HTTP POST /jsonrpc<br/>JSON-RPC Request
    
    Note over SS: Server Processing
    SS->>SS: Parse JSON-RPC request
    SS->>SS: Resolve function path
    SS->>FE: Import & execute function
    FE->>FE: Deserialize parameters
    FE->>FE: Execute function logic
    FE->>SS: Return execution result
    
    Note over SS: Response Preparation
    SS->>SS: Serialize result
    SS->>SS: Create JSON-RPC response
    SS->>SC: HTTP Response<br/>JSON-RPC Result
    
    Note over SC: Client Processing
    SC->>SC: Parse response
    SC->>SC: Deserialize result
    SC->>CA: Return final result
    
    Note over CA, FE: End-to-End Process Complete
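The flow in the diagram can be traced in-process with a short sketch. Everything here is a stand-in under stated assumptions: parameters are serialized with `pickle` plus base64 (the library's real serialization format is not described in this README), the `FUNCTIONS` table replaces function path resolution, and `handle` is called directly where the real client would issue an HTTP POST to /jsonrpc. The error code -32601 is the standard JSON-RPC 2.0 "Method not found" code.

```python
import base64
import json
import pickle


def encode(obj):
    """Pickle an object and wrap it as base64 text so it fits inside JSON."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def decode(text):
    # Only unpickle data from trusted peers: pickle can execute arbitrary code.
    return pickle.loads(base64.b64decode(text))


# Stand-in for functions the server would resolve and import by path.
FUNCTIONS = {"add_numbers": lambda a, b: a + b}


def handle(raw_request):
    """Server side: parse the request, dispatch, and build a JSON-RPC response."""
    request = json.loads(raw_request)
    response = {"jsonrpc": "2.0", "id": request.get("id")}
    func = FUNCTIONS.get(request["method"])
    if func is None:
        response["error"] = {"code": -32601, "message": "Method not found"}
    else:
        args, kwargs = decode(request["params"]["payload"])
        response["result"] = {"payload": encode(func(*args, **kwargs))}
    return json.dumps(response)


def call(method, *args, **kwargs):
    """Client side: serialize parameters, 'send', then deserialize the result."""
    request = {
        "jsonrpc": "2.0",
        "method": method,
        "id": 1,
        "params": {"payload": encode((args, kwargs))},
    }
    response = json.loads(handle(json.dumps(request)))  # HTTP POST in practice
    if "error" in response:
        raise LookupError(response["error"]["message"])
    return decode(response["result"]["payload"])


total = call("add_numbers", 2, b=3)
print(total)
```

Keeping the binary payload inside a JSON string is what allows arbitrary Python objects to travel over a text-based protocol, at the cost of base64's roughly one-third size overhead.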

Debug Tips

  1. Enable verbose logging:
import logging
logging.basicConfig(level=logging.DEBUG)
  2. Use debug mode:
provider = SharedTensorProvider(verbose_debug=True)
  3. Check function paths:
provider = SharedTensorProvider()
print(provider._registered_functions)

๐Ÿค Contributing

We welcome community contributions! Please follow these steps:

Development Environment Setup

# Clone repository
git clone https://github.com/world-sim-dev/shared-tensor.git
cd shared-tensor

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Package & Publish
python -m pip install build
python -m build --sdist
python -m twine upload --repository testpypi dist/*
python -m twine upload dist/*

Code Standards

# Code formatting
black shared_tensor/ tests/ examples/

# Import sorting
isort shared_tensor/ tests/ examples/

# Static checking
flake8 shared_tensor/
mypy shared_tensor/

Submission Process

  1. Fork the project and create a feature branch
  2. Write code and tests
  3. Run the complete test suite
  4. Submit a Pull Request

Test Requirements

  • New features must include tests
  • Maintain test coverage > 90%
  • All tests must pass

📄 License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

๐Ÿ™ Acknowledgments

📞 Contact Us


Shared Tensor - Making GPU memory sharing simple and efficient 🚀
