Docker-like GPU runtime for ML inference with universal GPU compatibility
GPUX - Docker-like GPU Runtime for ML Inference
GPUX runs ML inference workloads across GPU vendors: the same ONNX model runs on NVIDIA, AMD, Apple Silicon, and Intel GPUs, with CPU as a fallback, and no per-vendor setup. Think Docker, but for ML models.
🚀 Features
- Universal GPU Support: Works on NVIDIA, AMD, Apple Silicon, and Intel GPUs
- Docker-like UX: Familiar commands and configuration files
- Zero Configuration: Automatically selects the best execution provider
- High Performance: Optimized ONNX Runtime backends
- Easy Deployment: Simple HTTP server for production use
- Cross-platform: Works on Windows, macOS, and Linux
📦 Installation
# Install with pip
pip install gpux
# Or install with uv (recommended)
uv add gpux
🎯 Quick Start
1. Create a gpux.yml
# gpux.yml
name: sentiment-analysis
version: 1.0.0
model:
  source: ./model.onnx
  format: onnx
inputs:
  text:
    type: string
    max_length: 512
    required: true
outputs:
  sentiment:
    type: float32
    shape: [1, 2]
    labels: [negative, positive]
runtime:
  gpu:
    memory: 2GB
    backend: auto
2. Build your model
gpux build .
3. Run inference
# Single inference
gpux run sentiment-analysis --input '{"text": "I love this product!"}'
# From file
gpux run sentiment-analysis --file input.json
# Benchmark
gpux run sentiment-analysis --benchmark --runs 1000
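The `--file` option above reads its input from a JSON file. A minimal sketch for generating one, assuming the file holds the same JSON object that `--input` accepts (the file schema is not spelled out here, and `write_input_file` is a hypothetical helper, not part of GPUX):

```python
import json
from pathlib import Path


def write_input_file(path, text):
    """Write a single-input payload as JSON and return the payload."""
    # Assumption: the file mirrors the JSON object passed to --input.
    payload = {"text": text}
    Path(path).write_text(json.dumps(payload), encoding="utf-8")
    return payload


payload = write_input_file("input.json", "I love this product!")
```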
4. Start a server
gpux serve sentiment-analysis --port 8080
🛠️ Commands
Build
gpux build [PATH] # Build model from directory
gpux build . --provider cuda # Build with specific GPU provider
gpux build . --no-optimize # Build without optimization
Run
gpux run MODEL_NAME # Run inference
gpux run MODEL_NAME --input DATA # Run with input data
gpux run MODEL_NAME --file FILE # Run with input file
gpux run MODEL_NAME --benchmark # Run benchmark
Serve
gpux serve MODEL_NAME # Start HTTP server
gpux serve MODEL_NAME --port 9000 # Custom port
gpux serve MODEL_NAME --workers 4 # Multiple workers
Inspect
gpux inspect MODEL_NAME # Inspect model
gpux inspect --model model.onnx # Inspect model file directly
gpux inspect --json # JSON output
🔧 Configuration
gpux.yml Format
name: model-name
version: 1.0.0
description: "Model description"
model:
  source: ./model.onnx
  format: onnx
inputs:
  input_name:
    type: float32
    shape: [1, 10]
    required: true
    description: "Input description"
outputs:
  output_name:
    type: float32
    shape: [1, 2]
    labels: [class1, class2]
    description: "Output description"
runtime:
  gpu:
    memory: 2GB
    backend: auto # auto, cuda, coreml, rocm, vulkan, metal, dx12
  batch_size: 1
  timeout: 30
serving:
  port: 8080
  host: 0.0.0.0
  batch_size: 1
  timeout: 5
preprocessing:
  tokenizer: bert-base-uncased
  max_length: 512
  resize: [224, 224]
  normalize: imagenet
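The `inputs` section declares which inputs are required and what shape they take. As a rough illustration of how such a declaration could be used, the sketch below checks a payload against it before inference. The helper names `nested_shape` and `check_payload` are hypothetical, the spec is written as a plain dict rather than loaded from YAML, and this is not GPUX's own validation logic:

```python
def nested_shape(value):
    """Return the shape of a rectangular nested list, e.g. [[1, 2]] -> [1, 2]."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return shape


def check_payload(inputs_spec, payload):
    """Return a list of problems found in `payload` for the given inputs spec."""
    problems = []
    for name, spec in inputs_spec.items():
        if name not in payload:
            if spec.get("required", False):
                problems.append(f"missing required input: {name}")
            continue
        expected = spec.get("shape")
        actual = nested_shape(payload[name])
        if expected is not None and actual != expected:
            problems.append(f"input {name} has shape {actual}, expected {expected}")
    return problems


# Mirrors the `inputs` section of the format above, written as a dict.
inputs_spec = {"input_name": {"type": "float32", "shape": [1, 10], "required": True}}

valid = check_payload(inputs_spec, {"input_name": [[0.0] * 10]})
missing = check_payload(inputs_spec, {})
wrong_shape = check_payload(inputs_spec, {"input_name": [[0.0] * 3]})
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a single pass.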
🎯 Supported Platforms
| Platform | Backend | Execution Provider | Status |
|---|---|---|---|
| NVIDIA | CUDA | CUDAExecutionProvider | ✅ |
| NVIDIA | TensorRT | TensorrtExecutionProvider | ✅ |
| AMD | ROCm | ROCMExecutionProvider | ✅ |
| Apple | Metal | CoreMLExecutionProvider | ✅ |
| Intel | OpenVINO | OpenVINOExecutionProvider | ✅ |
| Windows | DirectML | DmlExecutionProvider | ✅ |
| Universal | CPU | CPUExecutionProvider | ✅ |
🚀 Performance
GPUX automatically selects the best execution provider for your hardware:
- Apple Silicon: CoreML (optimized for M1/M2/M3)
- NVIDIA: TensorRT, then CUDA (TensorRT is preferred when available)
- AMD: ROCm (AMD GPU acceleration)
- Intel: OpenVINO (Intel optimization)
- Windows: DirectML (Windows GPU acceleration)
- Fallback: CPU (universal compatibility)
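The selection described above amounts to walking a preference list and taking the first provider the machine supports. A minimal sketch of that idea follows. The cross-vendor ordering is an assumption assembled from the list above, not GPUX's actual code; `pick_provider` is a hypothetical helper; and the strings follow ONNX Runtime's naming (for example `DmlExecutionProvider` for DirectML and `ROCMExecutionProvider` for ROCm):

```python
# Assumed preference order, assembled from the list above (not GPUX's actual code).
PROVIDER_PRIORITY = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "ROCMExecutionProvider",
    "CoreMLExecutionProvider",
    "OpenVINOExecutionProvider",
    "DmlExecutionProvider",
    "CPUExecutionProvider",
]


def pick_provider(available, priority=PROVIDER_PRIORITY):
    """Return the first provider in `priority` that appears in `available`."""
    available_set = set(available)
    for name in priority:
        if name in available_set:
            return name
    raise RuntimeError("no known execution provider is available")


best = pick_provider(["CPUExecutionProvider", "CUDAExecutionProvider"])
```

In practice the `available` list could come from `onnxruntime.get_available_providers()`, which reports the providers compiled into the installed ONNX Runtime build.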
📚 Examples
Check out the examples/ directory for complete examples:
- Sentiment Analysis - BERT-based text classification
- Image Classification - ResNet-50 for ImageNet
🔌 API Reference
Python API
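import numpy as np
from gpux import GPUXRuntime
# Example input; match the shape and dtype your model expects
data = np.random.rand(1, 10).astype(np.float32)
# Initialize runtime
runtime = GPUXRuntime(model_path="model.onnx")
# Run inference; the dictionary key must match the model's input name
results = runtime.infer({"input": data})
# Benchmark
metrics = runtime.benchmark(data, num_runs=100)
# Get model info
info = runtime.get_model_info()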
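For a classifier such as the Quick Start sentiment model, whose output has shape [1, 2] and labels [negative, positive], the raw scores still need to be mapped to a label. The sketch below shows one way to do that, assuming the model emits unnormalized scores (logits) and that one row of scores has already been extracted from the inference results. The exact structure of the results is not documented here, and `softmax` and `top_label` are illustrative helpers rather than GPUX functions:

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(scores, labels):
    """Return the (label, probability) pair with the highest probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical raw output row for one input.
label, prob = top_label([-1.2, 2.3], ["negative", "positive"])
```

If the model already outputs probabilities, the softmax step can be skipped and only the argmax is needed.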
HTTP API
When serving a model, GPUX provides a REST API:
# Health check
GET /health
# Model information
GET /info
# Run inference
POST /predict
Content-Type: application/json
{
  "input": "your data here"
}
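The `/predict` call above can be issued from Python with the standard library alone. This sketch only builds the request. The host and port are assumptions taken from the `gpux serve ... --port 8080` example, `build_predict_request` is a hypothetical helper, and the response schema is not documented here, so the sending step is left as a comment:

```python
import json
import urllib.request


def build_predict_request(base_url, payload):
    """Build a POST request for the /predict endpoint with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request("http://localhost:8080", {"input": "your data here"})

# To send it against a running server:
# with urllib.request.urlopen(request, timeout=5) as response:
#     result = json.load(response)
```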
🧪 Testing
# Run tests
pytest
# Run with coverage
pytest --cov=src/gpux
# Run specific test
pytest tests/test_runtime.py
🤝 Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- ONNX Runtime for the excellent ML runtime
- FastAPI for the web framework
- Typer for the CLI framework
- Rich for beautiful terminal output
📞 Support
- 📖 Documentation
- 🐛 Issues
- 💬 Discussions