Model Setup
Automated hardware detection and virtual environment setup for ML training. Cross-platform support for Linux, Windows+WSL, NVIDIA Jetson, CUDA, ROCm, and CPU-only systems.
Features
- Hardware Auto-Detection: Automatically detects GPU type (Jetson/CUDA/ROCm/CPU) and version
- Dynamic Version Selection: Automatically matches detected CUDA/ROCm version to appropriate PyTorch wheel (no hardcoded versions)
- Cross-Platform: Linux native + Windows Subsystem for Linux (WSL) support
- WSL Detection: Automatically detects WSL environment and checks WSL 2 + GPU prerequisites
- Detailed Diagnostics: Logs exactly why GPU setup fails (missing drivers, toolkit, libraries)
- Multi-Backend Support: Install all working backends (`--all` flag) with easy switching
- Jetson Optimized: Full support for Jetson Orin with sm_87, cuSPARSELt
- No Sudo Required: All dependencies installed locally in venv
- Keras 3.x Ready: Supports torch, tensorflow, and jax backends
Quick Start
git clone https://github.com/Alex-Glebov/model-setup.git
cd model-setup
python test_venv_builder.py /path/to/venv --config /path/to/hardware_config.json
Example:
python test_venv_builder.py ~/model-core/venv --config ~/model-core/hardware_config.json
Install All Working Backends
python test_venv_builder.py ~/model-core/venv --all --config ~/model-core/hardware_config.json
This installs all working backends (e.g., torch CUDA + tensorflow CUDA) and generates keras_backend.py with commented lines for easy switching.
With Custom Log File
python test_venv_builder.py ~/model-core/venv --log-file ~/setup.log --config ~/model-core/hardware_config.json
Hardware Support
| Platform | Status | PyTorch | Notes |
|---|---|---|---|
| NVIDIA Jetson Orin | ✅ Supported | 2.5.0+ (NVIDIA wheel) | sm_87, cuSPARSELt auto-detected |
| NVIDIA CUDA | ✅ Supported | 2.x (PyPI) | Auto-detects CUDA 11.8, 12.1, 12.4, 12.8+ |
| AMD ROCm | ✅ Supported | 2.x (PyPI) | Auto-detects ROCm 5.6, 5.7, 6.0+ |
| CPU Only | ✅ Supported | 2.x (PyPI) | No GPU acceleration |
| Windows WSL | ✅ Supported | 2.x (PyPI) | WSL 2 required for GPU passthrough |
Multi-Backend Support
Install multiple backends and switch between them:
model-setup ~/venv --all
Generates model_core/keras_backend.py:
# Active backend: torch
os.environ["KERAS_BACKEND"] = "torch"
# Alternative backends (uncomment to switch):
# Backend: tensorflow (cuda)
# os.environ["KERAS_BACKEND"] = "tensorflow"
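The important detail about the generated file is ordering: Keras 3 reads `KERAS_BACKEND` once, at import time, so the selector must run before `import keras` anywhere in the process. A minimal sketch of what the generated module amounts to (the `select_backend` helper name is illustrative, not part of the project's API):

```python
import os

def select_backend(name: str = "torch") -> str:
    """Set KERAS_BACKEND before keras is imported.

    Keras 3 reads this environment variable once, at import time, so this
    must run before `import keras` anywhere in the process.
    """
    valid = {"torch", "tensorflow", "jax"}
    if name not in valid:
        raise ValueError(f"unknown backend {name!r}, expected one of {sorted(valid)}")
    os.environ["KERAS_BACKEND"] = name
    return name

select_backend("torch")
# import keras   # safe now; keras.backend.backend() would report "torch"
```

Importing the generated `keras_backend.py` module before `keras` achieves the same effect.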
Why PyTorch over TensorFlow?
- Jetson: NVIDIA provides official PyTorch wheels for JetPack 6.x; TensorFlow GPU support is not officially stable
- Performance: Better sm_87 (Orin) compute capability support
- Installation: PyTorch wheels include all dependencies; TensorFlow requires manual dependency management
- Keras 3.x: PyTorch is the default Keras backend with best compatibility
Architecture
Keras 3.x Unified Frontend
Model-setup installs Keras 3.x as the unified frontend API with your choice of execution backend:
┌─────────────────────────────────────────┐
│ Your Model Code │
│ import keras # Same API always │
└─────────────────────────────────────────┘
│
┌──────────────┼──────────────┐
▼ ▼ ▼
┌────────┐ ┌──────────┐ ┌──────────┐
│ torch │ │tensorflow│ │ jax │
│Backend │ │ Backend │ │ Backend │
└────────┘ └──────────┘ └──────────┘
│ │ │
└──────────────┼──────────────┘
▼
┌─────────────────────────────────────────┐
│ GPU Hardware (CUDA/ROCm) │
└─────────────────────────────────────────┘
Key Concept: Keras 3.x provides a single unified API. You write model code once using keras, and it executes on TensorFlow, PyTorch, or JAX without changes.
- Keras = The API (layers, models, training loops)
- TensorFlow/PyTorch/JAX = The execution engine
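To make the "write once" point concrete, here is a small model written purely against the Keras 3 API; the same file runs unchanged whichever backend `KERAS_BACKEND` selects (the `torch` default below is only a fallback for when no backend is configured):

```python
import os
os.environ.setdefault("KERAS_BACKEND", "torch")  # fallback if nothing is configured

import keras

# Plain Keras 3 API: no torch/tensorflow/jax imports in model code.
model = keras.Sequential([
    keras.layers.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```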
System Architecture
model-setup (this project)
├── Detects hardware (hardware_detector.py)
├── Installs dependencies (venv_builder.py)
├── Creates venv with Keras + backend(s)
└── Writes .hardware_config.json (reference for quick lookup)
↓
model-core (runtime)
├── Imports keras from keras_backend.py (backend selection)
├── Auto-detects GPU via gpu_config.py (runtime detection)
└── Runs training/inference
Note: While .hardware_config.json is available for quick reference, model-core performs its own runtime GPU detection via gpu_config.py to determine optimal training parameters (batch size, LSTM units, learning rate).
Installation Details
What Gets Installed
- PyTorch - Hardware-specific version with auto-detection:
  - Jetson: NVIDIA's wheel from developer.download.nvidia.com (matches JetPack version)
  - CUDA: `torch` from PyPI with detected CUDA version (cu118, cu121, cu124, cu128+)
  - ROCm: `torch` from PyPI with detected ROCm version (rocm5.6, rocm5.7, rocm6.0+)
  - CPU: `torch` from PyPI, CPU-only
- cuSPARSELt (Jetson only):
  - Auto-detected and installed locally in `venv/lib/cuda/lib/`
  - No system-wide changes (no sudo)
- NumPy:
  - Version < 2.0 (pinned for PyTorch compatibility)
  - Bundled with the PyTorch wheel
- Other Dependencies:
  - Read from `model-core/requirements.txt`
  - Skips packages already provided by PyTorch
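The "skips packages already provided by PyTorch" step amounts to filtering the requirements list against what the wheel bundles. A rough sketch (the function name and the crude version-specifier parsing are illustrative, not the project's actual code):

```python
def filter_requirements(requirements: list[str], provided: set[str]) -> list[str]:
    """Drop requirement lines whose package the PyTorch wheel already bundles.

    `provided` would hold names like {"numpy"} that ship with the wheel.
    """
    kept = []
    for line in requirements:
        # Crude name extraction: strip any ==/>=/< version specifier.
        name = line.split("==")[0].split(">=")[0].split("<")[0].strip().lower()
        if name and name not in provided:
            kept.append(line)
    return kept
```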
Version Detection
The builder automatically detects and matches versions:
# Detected CUDA 12.2 → Uses PyTorch cu122 wheel
# Detected ROCm 5.7 → Uses PyTorch rocm5.7 wheel
# Detected JetPack 6.0 → Uses NVIDIA JetPack 6.x wheel
If detection fails, the builder falls back to the latest stable wheel (cu121 for CUDA, rocm5.7 for ROCm).
Using the Venv
Option 1: Source activate (recommended)
cd ~/model-core
source venv/bin/activate
python train.py
Option 2: Direct Python path
# For Jetson, LD_LIBRARY_PATH must include cuSPARSELt
LD_LIBRARY_PATH=~/model-core/venv/lib/cuda/lib \
~/model-core/venv/bin/python train.py
Verification
import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
print(f"Device: {torch.cuda.get_device_name(0)}")
Project Structure
model-setup/
├── src/model_setup/
│ ├── __init__.py
│ ├── hardware_detector.py # Hardware detection (GPU type, versions, WSL)
│ ├── venv_builder.py # Venv creation with dynamic version selection
│ ├── gpu_compatibility.py # GPU compatibility testing
│ ├── pip_version_checker.py # PyPI availability checking
│ └── version_requirements.py # Minimum version requirements
├── test_venv_builder.py # Self-bootstrapping CLI for dev
└── README.md # This file
WSL Support (Windows Subsystem for Linux)
model-setup supports WSL 2 for GPU passthrough on Windows.
Prerequisites
- WSL 2 (WSL 1 does not support GPU)
- GPU drivers installed on Windows host
- WSL GPU support (`/usr/lib/wsl/lib` must exist)
Automatic Detection
The builder automatically detects WSL and checks prerequisites:
INFO - WSL environment detected
INFO - WSL 2 confirmed
WARNING - WSL GPU support not detected (/usr/lib/wsl/lib missing)
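A common way to perform these two checks is to look for the "microsoft" marker in `/proc/version` and for the WSL driver-library directory; the sketch below shows that kind of heuristic, though the project's actual implementation may differ:

```python
from pathlib import Path

def detect_wsl() -> bool:
    """Heuristic WSL check: under WSL, /proc/version mentions 'microsoft'."""
    try:
        return "microsoft" in Path("/proc/version").read_text().lower()
    except OSError:
        return False  # not Linux, or /proc unavailable

def wsl_gpu_libs_present() -> bool:
    """WSL 2 GPU passthrough exposes driver libraries under /usr/lib/wsl/lib."""
    return Path("/usr/lib/wsl/lib").is_dir()
```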
Manual WSL Check
# Verify WSL version
wsl.exe --version
# Should show: WSL version: 2.x.x
# Check GPU in WSL
ls /usr/lib/wsl/lib/ | grep -i cuda
# Should show: libcuda.so, libd3d12.so, etc.
Detailed Diagnostics
model-setup logs exactly why GPU setup fails. Check the log file for details:
CUDA Prerequisites Check
INFO - Detecting hardware on Linux x86_64
INFO - WSL environment detected
INFO - CUDA GPU detected
WARNING - CUDA prerequisites missing:
WARNING - - nvidia-smi not found - NVIDIA drivers not installed
WARNING - - CUDA toolkit not installed (nvcc not found)
WARNING - - cuDNN library (optional but recommended)
INFO - Will attempt install anyway, but may fail
ROCm Prerequisites Check
WARNING - ROCm prerequisites missing:
WARNING - - ROCm not installed (rocm-smi not found)
WARNING - - HIP toolkit not installed (hipcc not found)
Log File Locations
Using model-setup:
- Default: `{venv_parent}/test_venv_builder.log`
- Custom: `--log-file ~/custom.log`
Using the module directly:
- `{venv_path}/install.log`
Configuration Priority
- Explicit (highest): `~/GPU/GPU_VARIANT.txt` containing `jetson`, `cuda`, or `rocm`
- Auto-detect: PyTorch CUDA availability check
- Platform inference: `aarch64` → likely Jetson
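The priority chain above could be implemented along these lines (a sketch; the function name and `"cpu"` fallback are assumptions, and the real detector lives in `hardware_detector.py`):

```python
import platform
from pathlib import Path

def resolve_gpu_variant(override_file: str = "~/GPU/GPU_VARIANT.txt") -> str:
    """Resolve the GPU variant using the priority order above.

    1. Explicit override file (highest priority)
    2. PyTorch CUDA availability, if torch is importable
    3. Platform inference: aarch64 -> likely Jetson
    Falls back to "cpu" when nothing matches.
    """
    path = Path(override_file).expanduser()
    if path.exists():
        variant = path.read_text().strip().lower()
        if variant in {"jetson", "cuda", "rocm"}:
            return variant
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    if platform.machine() == "aarch64":
        return "jetson"
    return "cpu"
```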
Troubleshooting
GPU Not Detected / Prerequisites Missing
Check the log file for the exact reason:
cat ~/model-core/test_venv_builder.log | grep -E "(prerequisites|Missing|not found)"
Common issues:
- NVIDIA drivers: Install from NVIDIA drivers
- CUDA toolkit: Install from CUDA Toolkit
- ROCm: Install from AMD ROCm
WSL GPU Issues
# Verify WSL 2
wsl.exe --version
# Expected: WSL version: 2.x.x
# Check Windows GPU drivers
# Install from NVIDIA/AMD website on Windows host
# Verify WSL GPU passthrough
ls -la /usr/lib/wsl/lib/
# Should see: libcuda.so, libd3d12.so, libdxcore.so
"libcusparseLt.so.0: cannot open shared object file" (Jetson only)
- Venv not activated: run `source venv/bin/activate`
- Or LD_LIBRARY_PATH is missing when using the direct Python path
"No module named 'torch'"
- Venv creation failed - check hardware detection output in log file
- Try running model-setup again with explicit path and log
- Check: `cat ~/model-core/test_venv_builder.log | grep -i error`
NumPy version errors
- model-setup automatically pins `numpy<2`
- If manually installing packages, avoid upgrading NumPy
Installation Fails with "Unknown install type"
This was a bug in earlier versions. Update to the latest version:
git pull origin develop
Jetson-Specific Requirements
- JetPack: 6.x (R36.x in `/etc/nv_tegra_release`)
- Python: 3.10
- CUDA: 12.6 (from JetPack)
- Special: PyTorch wheels from NVIDIA's redist, not PyPI
See GPU/docs/Jetson Orin Nano — PyTorch & TensorFlow Limitations.md for detailed compatibility matrix.
Development
Contributing
See CONTRIBUTING.md for development setup and contribution guidelines.
Testing
# Test hardware detection
python -c "from model_setup.hardware_detector import HardwareDetector; \
h = HardwareDetector().detect(); print(h)"
# Test venv creation with logging
model-setup /tmp/test_venv --config /tmp/test_config.json --log-file /tmp/test.log
# Test with all backends
model-setup /tmp/test_venv --all --config /tmp/test_config.json
# View detailed logs
cat /tmp/test.log | grep -E "(prerequisites|Missing|detected)"
License
MIT License - See LICENSE file
Related Projects
- model-core - ML training runtime (uses this setup)
- pivots-api - Prediction API service