Remote GPU — pure Python client, zero dependencies, auto CPU fallback
Table of Contents
| # | Section |
|---|---|
| 1 | Distinctive Capabilities |
| 2 | Comparative Advantage |
| 3 | Architecture at a Glance |
| 4 | Usage — Step by Step |
| 5 | Performance Benchmarks |
| 6 | Distinctive Capabilities (Persian) |
| 7 | Comparative Advantage (Persian) |
| 8 | Usage — Step by Step (Persian) |
1. Distinctive Capabilities
RemoteCUDA is not merely another remote GPU tool — it is a complete distributed execution framework that operates transparently beneath your existing PyTorch codebase.
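A quick illustration of what "zero-change" means in practice: the snippet below is ordinary PyTorch, with no RemoteCUDA import, decorator, or context manager. Assuming a RemoteCUDA server is reachable on the network, the `.cuda()` calls execute remotely; the code itself contains nothing library-specific.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Plain PyTorch: no RemoteCUDA imports, decorators, or context managers.
model = nn.Linear(784, 10).cuda()          # intercepted transparently
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(64, 784).cuda()            # only the mini-batch crosses the network
y = torch.randint(0, 10, (64,)).cuda()

loss = F.cross_entropy(model(x), y)        # forward pass runs on the remote GPU
loss.backward()                            # gradients flow back over the wire
optimizer.step()
```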
Capability Matrix
| Capability | Description | Impact |
|---|---|---|
| Zero-Change Integration | No import rewrites, no decorators, no context managers required. Your model.cuda() calls work exactly as written. | Eliminates migration cost entirely. |
| Network Auto-Discovery | UDP multicast-based server detection. Clients locate GPU servers without IP addresses, configuration files, or environment variables. | Zero-configuration deployment. |
| Multi-GPU Scheduling | Four distinct scheduling strategies — Adaptive, Load-Balancing, Memory-Aware, Latency-Optimized — with automatic strategy switching based on workload profiling. | Optimal resource utilization without manual tuning. |
| Local Dataset Retention | Datasets remain on the client filesystem. Only the active mini-batch traverses the network. A 512MB LRU/LFU tensor cache eliminates redundant transfers. | 50GB dataset → zero upfront transfer. |
| Asynchronous Stream Pipelining | Data transfer and GPU computation overlap via CUDA stream semantics. While batch N computes on the GPU, batch N+1 is already in flight over the network. | Hides network latency behind computation. |
| Automatic Failover | GPU health monitored via heartbeat protocol. Failed tasks automatically reassigned to healthy GPUs with configurable retry logic (default: 3 attempts). | Graceful degradation under hardware failure. |
| Heterogeneous GPU Support | Mixed GPU architectures (RTX 4090 + A100 + RTX 3090) within a single pool. Scheduler accounts for compute capability, memory capacity, and current utilization. | Leverage all available hardware simultaneously. |
| No CUDA on Client | Client requires only PyTorch CPU build. No NVIDIA drivers, no CUDA toolkit, no GPU hardware. | Works on any laptop, including ultrabooks. |
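To ground the caching row in the matrix, here is a minimal sketch of a byte-bounded LRU tensor cache in the spirit of the 512MB cache described above. The class and method names are illustrative inventions, not RemoteCUDA's actual internals.

```python
from collections import OrderedDict
from typing import Optional

import torch

class LRUTensorCache:
    """Illustrative byte-bounded LRU cache (names are hypothetical)."""

    def __init__(self, capacity_bytes: int = 512 * 1024 * 1024) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()        # key -> tensor, oldest first

    def get(self, key: str) -> Optional[torch.Tensor]:
        if key not in self.entries:
            return None                     # miss: caller must send the tensor
        self.entries.move_to_end(key)       # mark as most recently used
        return self.entries[key]

    def put(self, key: str, tensor: torch.Tensor) -> None:
        size = tensor.element_size() * tensor.nelement()
        while self.used + size > self.capacity and self.entries:
            _, evicted = self.entries.popitem(last=False)   # evict LRU entry
            self.used -= evicted.element_size() * evicted.nelement()
        self.entries[key] = tensor
        self.used += size
```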
Scheduling Strategies
| Strategy | Behavior |
|---|---|
| Adaptive (default) | Profiles each task; routes memory-heavy ops to GPUs with the most free VRAM and latency-sensitive ops to GPUs with the lowest historical response time. |
| Load-Balancing | Distributes tasks evenly across all available GPUs. Optimal for homogeneous clusters running uniform workloads (e.g., hyperparameter sweeps). |
| Memory-Aware | Routes each task to the GPU with the most free memory. Essential for large models (LLMs, diffusion models) where VRAM is the primary constraint. |
| Latency-Optimized | Routes to GPUs with the lowest historical average forward-pass time. Ideal for real-time inference serving where tail latency matters. |
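As a rough sketch of how two of these strategies could be expressed (all names here are hypothetical, not RemoteCUDA's internal API), assuming each server reports free VRAM and a rolling latency average:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GPUStatus:
    server: str           # e.g. "192.168.1.20:55555" (illustrative)
    device_index: int
    free_vram_bytes: int
    avg_latency_ms: float

def route_memory_aware(task_bytes: int, pool: List[GPUStatus]) -> GPUStatus:
    """Memory-Aware: pick the GPU with the most free VRAM that fits the task."""
    candidates = [g for g in pool if g.free_vram_bytes >= task_bytes]
    if not candidates:
        raise RuntimeError("no GPU has enough free memory for this task")
    return max(candidates, key=lambda g: g.free_vram_bytes)

def route_latency_optimized(pool: List[GPUStatus]) -> GPUStatus:
    """Latency-Optimized: pick the GPU with the lowest historical latency."""
    return min(pool, key=lambda g: g.avg_latency_ms)
```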
2. Comparative Advantage
RemoteCUDA occupies a unique position in the design space. Unlike SSH-based approaches (which require code relocation) and unlike simpler CUDA-forwarding tools (which lack intelligence), RemoteCUDA provides transparent interception combined with sophisticated resource management.
Feature-by-Feature Comparison
| Feature | RemoteCUDA | SSH + SCP | VSCode Remote | Chidori GPU | Tensorlink |
|---|---|---|---|---|---|
| Zero code changes | Yes | No | No | Yes | Yes |
| Local dataset access | Yes | No | No | Yes | Yes |
| Network auto-discovery | Yes | No | No | No | No |
| Single-command server setup | Yes | No | No | Yes | No |
| Multi-GPU parallelism | Yes | No | No | No | No |
| Intelligent load balancing | Yes | No | No | No | No |
| Automatic failover | Yes | No | No | No | No |
| Multiple scheduling strategies | Yes (4) | No | No | No | No |
| Tensor caching layer | Yes | No | No | No | No |
| Async stream pipelining | Yes | No | No | No | No |
| Heterogeneous GPU support | Yes | No | No | No | No |
| No CUDA on client | Yes | Yes | No | Yes | Yes |
| pip installable | Yes | N/A | N/A | Yes | No |
| Windows support | Yes | No | No | Yes | Yes |
| Open source (MIT) | Yes | N/A | No | Yes | Yes |
Positioning
Ranked along the intelligence/automation axis (highest first):

- RemoteCUDA: Transparent + Smart
- Chidori GPU: Transparent only
- VSCode Remote: Manual + Remote
- SSH + SCP: Manual + Relocation
RemoteCUDA is the only solution that combines zero code impact (transparent CUDA interception) with intelligent resource orchestration (scheduling, caching, failover, pipelining).
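Of those four properties, failover is the easiest to sketch. The loop below is a hypothetical approximation of "reassign to a healthy GPU, default 3 attempts" from the capability matrix; `task.run` and `gpu.alive` are placeholder names, not real API.

```python
import time

def run_with_failover(task, pool, max_attempts=3, heartbeat_timeout=5.0):
    """Illustrative retry loop: reassign a failed task to a healthy GPU.

    task.run(gpu) and gpu.alive(timeout) are hypothetical placeholders for
    'execute on this GPU' and 'heartbeat still responding'.
    """
    last_error = None
    for attempt in range(max_attempts):
        healthy = [g for g in pool if g.alive(timeout=heartbeat_timeout)]
        if not healthy:
            time.sleep(1.0)                         # wait for a server to return
            continue
        gpu = healthy[attempt % len(healthy)]       # rotate across healthy GPUs
        try:
            return task.run(gpu)
        except ConnectionError as err:
            last_error = err                        # failed mid-task; try the next GPU
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error
```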
3. Architecture at a Glance
The client machine (no GPU) runs your unmodified code against the RemoteCUDA stack:

- Your code (`model.cuda()`, `tensor.to()`, `loss.backward()`) → AutoCUDAHook (interception) → GPUScheduler (strategy selection) → GPUPool (server connections)
- Supporting layers: TensorCache (512MB LRU), StreamManager (async pipelining), TensorBridge (tensor lifecycle)

Transport between client and servers:

- TCP/IP over 1–10 Gbps Ethernet; binary protocol, zlib/lz4 compressed
- Discovery via UDP multicast (239.255.100.100)

GPU Servers 1…N, each with discovery enabled:

- One GPUWorker per physical GPU, e.g. two RTX 4090s on server 1, two RTX 3090s on server 2, two A100s on server N
Key Design Decisions:
- One Worker per GPU: Each physical GPU gets its own worker thread, memory manager, and CUDA stream. No cross-GPU contention.
- Binary Protocol: Custom 16-byte header with CRC32 integrity check. Payload compressed with zlib (level 3) by default; supports lz4 and zstd as optional accelerators.
- UDP Discovery: Servers announce presence via multicast; clients listen passively. No central registry. Servers can join or leave dynamically.
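A minimal sketch of what framing a message could look like under these decisions. The 16-byte header size, CRC32 check, and zlib level 3 come from the text above; the exact field order and magic bytes are assumptions, so treat this as illustrative rather than the actual wire format.

```python
import struct
import zlib

HEADER_FMT = "!4sHHII"   # magic, msg type, flags, payload len, CRC32 = 16 bytes
MAGIC = b"RCUD"          # assumed magic bytes, not the real ones

def pack_message(msg_type: int, payload: bytes) -> bytes:
    body = zlib.compress(payload, level=3)           # zlib level 3, as documented
    crc = zlib.crc32(body) & 0xFFFFFFFF
    header = struct.pack(HEADER_FMT, MAGIC, msg_type, 0, len(body), crc)
    return header + body

def unpack_message(data: bytes):
    magic, msg_type, flags, length, crc = struct.unpack(HEADER_FMT, data[:16])
    body = data[16:16 + length]
    if magic != MAGIC or zlib.crc32(body) & 0xFFFFFFFF != crc:
        raise ValueError("corrupt frame")            # integrity check failed
    return msg_type, zlib.decompress(body)
```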
4. Usage — Step by Step
4.1 Server Setup (GPU Machine)
Prerequisites: Python 3.8+, PyTorch with CUDA, NVIDIA drivers.
```bash
# Install
pip install remotecuda

# Start the service
remotecuda start
```

Expected output:

```
RemoteCUDA GPU Service
GPU: NVIDIA GeForce RTX 4090 (24.0 GB)
Port: 55555
Status: Ready — Broadcasting on network
Discovery: ON (UDP multicast)
```

The GPU is now available on the network.
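On the client side, discovery can be pictured as a passive multicast listener. The group address 239.255.100.100 comes from the architecture overview; the UDP port and the assumption that one packet equals one announcement are illustrative choices for this sketch.

```python
import socket
import struct

GROUP = "239.255.100.100"   # multicast group from the architecture overview
PORT = 55555                # assumed: same port as the TCP service

def listen_for_servers(timeout: float = 5.0):
    """Passively collect GPU-server announcements (illustrative only)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    # Join the multicast group on all interfaces.
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(timeout)
    servers = set()
    try:
        while True:
            data, (addr, _) = sock.recvfrom(4096)   # one announcement per packet
            servers.add(addr)
    except socket.timeout:
        pass                                        # quiet period: stop collecting
    finally:
        sock.close()
    return servers
```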