Distributed cache system for PyTorch training
Project description
RAMJET — Distributed Data Cache for PyTorch Training
RAMJET accelerates PyTorch distributed training by caching preprocessed data across your cluster. Works with any DDP setup — torchrun, DeepSpeed, Accelerate, or custom launchers.
Why RAMJET?
| Problem | Solution |
|---|---|
| Slow data preprocessing | Cache preprocessed samples across nodes |
| Network bottleneck from shared storage | Local SSD cache on each node |
| Repeated data loading across epochs | First epoch caches, next epochs are instant |
| No visibility into training | Real-time metrics dashboard |
Quick Start
1. Install
```bash
pip install ramjetio
```
2. Add to Your Training Script
```python
import ramjetio
from torch.utils.data import DataLoader

ramjetio.init()

# Wrap your existing dataset; the first epoch fills the cache, later epochs read from it.
dataset = ramjetio.CachedDataset(your_dataset)
loader = DataLoader(dataset, batch_size=32)

for batch in loader:
    train_step(batch)
```
3. Run
Get your API key from app.ramjet.io (create a cluster → copy key).
```bash
export RAMJET_API_KEY="your_api_key_here"
python train.py
```
Multi-GPU: `torchrun --nproc_per_node=N train.py`
That's it! Your nodes will appear in the dashboard within seconds.
How It Works
```
┌────────────────────────────────────────────────────┐
│               Your Training Cluster                │
├────────────────────────────────────────────────────┤
│ ┌────────────┐    ┌────────────┐    ┌────────────┐ │
│ │   Node 0   │    │   Node 1   │    │   Node 2   │ │
│ │ ┌────────┐ │    │ ┌────────┐ │    │ ┌────────┐ │ │
│ │ │ Train  │ │    │ │ Train  │ │    │ │ Train  │ │ │
│ │ └───┬────┘ │    │ └───┬────┘ │    │ └───┬────┘ │ │
│ │     │      │    │     │      │    │     │      │ │
│ │ ┌───▼────┐ │    │ ┌───▼────┐ │    │ ┌───▼────┐ │ │
│ │ │ RAMJET │◄┼────┼─┤ RAMJET │◄┼────┼─┤ RAMJET │ │ │
│ │ │ Cache  ├─┼────┼─► Cache  ├─┼────┼─► Cache  │ │ │
│ │ └────────┘ │    │ └────────┘ │    │ └────────┘ │ │
│ │ 500GB SSD  │    │ 500GB SSD  │    │ 500GB SSD  │ │
│ └────────────┘    └────────────┘    └────────────┘ │
│                         │                          │
│                         ▼                          │
│                ┌──────────────────┐                │
│                │ RAMJET Dashboard │                │
│                │   (Metrics UI)   │                │
│                └──────────────────┘                │
└────────────────────────────────────────────────────┘
```
Features
- 🚀 Zero-config caching — `ramjetio.init()` handles everything
- 📊 Real-time dashboard — monitor cache hits, throughput, GPU utilization
- 🔄 Consistent hashing — data distributed evenly across nodes (see the sketch after this list)
- 💾 Disk-backed cache — survives restarts, uses NVMe SSDs efficiently
- 🔌 Works with any setup — torchrun, DeepSpeed, Accelerate, custom launchers
- ☁️ S3/MinIO integration — configure data source in dashboard, not in code
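To illustrate the consistent-hashing idea, here is a minimal, generic sketch of the technique; it is not RAMJET's actual implementation, and the node names and key format are made up:

```python
import bisect
import hashlib

def _stable_hash(key: str) -> int:
    # Stable hash so every node maps the same key to the same owner.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    """Toy consistent-hash ring: each node owns many virtual points on the ring."""

    def __init__(self, nodes, vnodes=64):
        self._points = sorted(
            (_stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def node_for(self, sample_key: str) -> str:
        # Walk clockwise to the first virtual point at or after the key's hash.
        idx = bisect.bisect(self._hashes, _stable_hash(sample_key)) % len(self._points)
        return self._points[idx][1]

ring = HashRing(["node-0:9000", "node-1:9000", "node-2:9000"])
print(ring.node_for("train/sample_000042"))  # the cache node that owns this sample
```

The useful property of this scheme is that adding or removing a node only remaps the keys adjacent to its points on the ring, so the cache can rebalance without reshuffling everything.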
Integration Examples
See docs/INTEGRATION.md for detailed examples with:
- PyTorch DDP with `torchrun`
- DeepSpeed
- HuggingFace Accelerate
- Custom training loops
- Multi-node clusters
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `RAMJET_API_KEY` | Your API key (required) | — |
| `RAMJET_CACHE_PATH` | Local cache directory | `/tmp/ramjet_cache` |
| `RAMJET_CACHE_SIZE` | Max cache size | `100GB` |
| `RAMJET_PORT` | Cache server port | `9000` |
Dashboard Settings
Configure in the web dashboard (no code changes needed):
- Data Source: S3/MinIO endpoint, bucket, credentials
- Cache Settings: TTL, replication factor, eviction policy
Distributed Training (DDP)
RAMJET automatically detects torchrun and DDP environments:
Single Machine, Multiple GPUs (torchrun)
```bash
# 4 GPUs on one machine
torchrun --nproc_per_node=4 train.py
```

```python
import ramjetio
import torch.distributed as dist

# Your usual DDP setup (dist.init_process_group, DistributedSampler, ...) stays unchanged.
# Only LOCAL_RANK=0 starts the cache server - the other ranks wait and share it.
ramjetio.init()

# All ranks use the same cache
dataset = ramjetio.CachedDataset(your_dataset)
```
Multi-Node Training
RAMJET auto-detects your cluster manager — no manual configuration needed:
| Environment | How to launch | RAMJET detects it? |
|---|---|---|
| SLURM | `srun python train.py` | ✅ Automatic |
| Kubernetes (PyTorchJob) | Managed by operator | ✅ Automatic |
| DeepSpeed | `deepspeed --hostfile hosts train.py` | ✅ Automatic |
| Accelerate | `accelerate launch train.py` | ✅ Automatic |
| torchrun | `torchrun --nproc_per_node=N train.py` | ✅ Automatic |
| SageMaker | Configured in SageMaker console | ✅ Automatic |
Each node runs one cache server (on LOCAL_RANK=0), and all nodes share data via consistent hashing.
RAMJET reads `LOCAL_RANK`, `RANK`, and `WORLD_SIZE` from the environment — every major launcher sets these automatically.
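For reference, a minimal sketch (plain Python, no RAMJET API) of what those variables look like inside each training process:

```python
import os

# Set automatically by torchrun, deepspeed, accelerate launch, and other major launchers.
local_rank = int(os.environ.get("LOCAL_RANK", 0))   # process index within this node
rank = int(os.environ.get("RANK", 0))               # global process index
world_size = int(os.environ.get("WORLD_SIZE", 1))   # total number of processes

# Per the behavior described above: the first process on each node hosts the
# cache server, and the remaining local ranks connect to it instead.
hosts_cache_server = (local_rank == 0)
print(f"rank {rank}/{world_size}: hosts cache server = {hosts_cache_server}")
```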
CLI Tools
```bash
# Start cache server manually (usually not needed — ramjetio.init() does this)
ramjetio-server --port 9000 --capacity 100GB

# Check cache status
ramjetio-client stats

# Clear cache
ramjetio-client clear
```
Requirements
- Python 3.8+
- PyTorch 1.9+
- Linux (recommended for production)
- SSD storage for cache (recommended)
Documentation
- Integration Guide — detailed examples for all frameworks
- API Reference — full API documentation
- Troubleshooting — common issues and solutions
License
PolyForm Noncommercial License 1.0.0 — free for personal and non-commercial use. For commercial licensing, contact licensing@ramjet.dev. See LICENSE for details.
Support
- 📧 Email: support@ramjet.io
- 💬 Discord: discord.gg/ramjet
- 📖 Docs: docs.ramjet.io
Project details
Download files
Built Distribution
File details
Details for the file ramjetio-0.8.2-py3-none-any.whl.
File metadata
- Download URL: ramjetio-0.8.2-py3-none-any.whl
- Upload date:
- Size: 94.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `54b1d76973df7f290b12a9e843f7281107b6daf12ba225a004ecfc0c425348dc` |
| MD5 | `3de46cf2a3f9c1964b19059609c68c9c` |
| BLAKE2b-256 | `037c8fa4cb153a7c0f8ecb1cc95b905c47475e6c458fd95b7e7bc6aa9fbb86c9` |