

Project description

Model Setup

Automated hardware detection and virtual environment setup for ML training. Cross-platform support for Linux, Windows+WSL, NVIDIA Jetson, CUDA, ROCm, and CPU-only systems.


Features

  • Hardware Auto-Detection: Automatically detects GPU type (Jetson/CUDA/ROCm/CPU) and version
  • Dynamic Version Selection: Automatically matches detected CUDA/ROCm version to appropriate PyTorch wheel (no hardcoded versions)
  • Cross-Platform: Linux native + Windows Subsystem for Linux (WSL) support
  • WSL Detection: Automatically detects WSL environment and checks WSL 2 + GPU prerequisites
  • Detailed Diagnostics: Logs exactly why GPU setup fails (missing drivers, toolkit, libraries)
  • Multi-Backend Support: Install all working backends (--all flag) with easy switching
  • Jetson Optimized: Full support for Jetson Orin with sm_87, cuSPARSELt
  • No Sudo Required: All dependencies installed locally in venv
  • Keras 3.x Ready: Supports torch, tensorflow, and jax backends

Quick Start

git clone https://github.com/Alex-Glebov/model-setup.git
cd model-setup
python test_venv_builder.py /path/to/venv --config /path/to/hardware_config.json

Example:

python test_venv_builder.py ~/model-core/venv --config ~/model-core/hardware_config.json

Install All Working Backends

python test_venv_builder.py ~/model-core/venv --all --config ~/model-core/hardware_config.json

This installs every backend that works on the detected hardware (e.g., torch CUDA + tensorflow CUDA) and generates keras_backend.py with the alternative backends commented out for easy switching.

With Custom Log File

python test_venv_builder.py ~/model-core/venv --log-file ~/setup.log --config ~/model-core/hardware_config.json

Hardware Support

Platform            Status        PyTorch                 Notes
NVIDIA Jetson Orin  ✅ Supported  2.5.0+ (NVIDIA wheel)   sm_87, cuSPARSELt auto-detected
NVIDIA CUDA         ✅ Supported  2.x (PyPI)              Auto-detects CUDA 11.8, 12.1, 12.4, 12.8+
AMD ROCm            ✅ Supported  2.x (PyPI)              Auto-detects ROCm 5.6, 5.7, 6.0+
CPU Only            ✅ Supported  2.x (PyPI)              No GPU acceleration
Windows WSL         ✅ Supported  2.x (PyPI)              WSL 2 required for GPU passthrough

Multi-Backend Support

Install multiple backends and switch between them:

model-setup ~/venv --all

Generates model_core/keras_backend.py:

# Active backend: torch
os.environ["KERAS_BACKEND"] = "torch"

# Alternative backends (uncomment to switch):
# Backend: tensorflow (cuda)
# os.environ["KERAS_BACKEND"] = "tensorflow"

Why PyTorch over TensorFlow?

  • Jetson: NVIDIA provides official PyTorch wheels for JetPack 6.x; TensorFlow GPU support is not officially stable
  • Performance: Better sm_87 (Orin) compute capability support
  • Installation: PyTorch wheels include all dependencies; TensorFlow requires manual dependency management
  • Keras 3.x: PyTorch is the default Keras backend with best compatibility

Architecture

Keras 3.x Unified Frontend

Model-setup installs Keras 3.x as the unified frontend API with your choice of execution backend:

┌─────────────────────────────────────────┐
│         Your Model Code                 │
│    import keras  # Same API always      │
└─────────────────────────────────────────┘
                   │
    ┌──────────────┼──────────────┐
    ▼              ▼              ▼
┌────────┐   ┌──────────┐   ┌──────────┐
│  torch │   │tensorflow│   │   jax    │
│Backend │   │ Backend  │   │ Backend  │
└────────┘   └──────────┘   └──────────┘
    │              │              │
    └──────────────┼──────────────┘
                   ▼
┌─────────────────────────────────────────┐
│        GPU Hardware (CUDA/ROCm)         │
└─────────────────────────────────────────┘

Key Concept: Keras 3.x provides a single unified API. You write model code once using keras, and it executes on TensorFlow, PyTorch, or JAX without changes.

  • Keras = The API (layers, models, training loops)
  • TensorFlow/PyTorch/JAX = The execution engine
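The ordering matters: the backend must be selected before keras is first imported, which is exactly why the generated keras_backend.py exists. A minimal sketch of the pattern (the try/except guard is illustrative, since keras may not be installed in every environment):

```python
import os

# Choose the execution backend BEFORE the first keras import.
os.environ["KERAS_BACKEND"] = "torch"

try:
    import keras  # same API regardless of which backend runs underneath

    print(f"Keras {keras.__version__} on backend: {keras.backend.backend()}")
except ImportError:
    print("keras is not installed in this environment")
```

The same model code then runs unchanged whether the variable says torch, tensorflow, or jax.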

System Architecture

model-setup (this project)
    ├── Detects hardware (hardware_detector.py)
    ├── Installs dependencies (venv_builder.py)
    ├── Creates venv with Keras + backend(s)
    └── Writes .hardware_config.json (reference for quick lookup)
           ↓
    model-core (runtime)
        ├── Imports keras from keras_backend.py (backend selection)
        ├── Auto-detects GPU via gpu_config.py (runtime detection)
        └── Runs training/inference

Note: While .hardware_config.json is available for quick reference, model-core performs its own runtime GPU detection via gpu_config.py to determine optimal training parameters (batch size, LSTM units, learning rate).
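Reading the cached config can be sketched as below (the gpu_type field name and the "cpu" default are assumptions; the actual file layout is defined by model-setup):

```python
import json
from pathlib import Path


def load_hardware_config(path: Path) -> dict:
    """Return the cached hardware config, or {} if the file is absent or corrupt."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


cfg = load_hardware_config(Path.home() / "model-core" / ".hardware_config.json")
gpu_type = cfg.get("gpu_type", "cpu")  # field name is an assumption
```

Falling back to an empty dict keeps the first run (before model-setup has written the file) from crashing the caller.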

Installation Details

What Gets Installed

  1. PyTorch - Hardware-specific version with auto-detection:

    • Jetson: NVIDIA's wheel from developer.download.nvidia.com (matches JetPack version)
    • CUDA: torch from PyPI with detected CUDA version (cu118, cu121, cu124, cu128+)
    • ROCm: torch from PyPI with detected ROCm version (rocm5.6, rocm5.7, rocm6.0+)
    • CPU: torch from PyPI CPU-only
  2. cuSPARSELt (Jetson only):

    • Auto-detected and installed locally in venv/lib/cuda/lib/
    • No system-wide changes (no sudo)
  3. NumPy:

    • Version < 2.0 (pinned for PyTorch compatibility)
    • Bundled with PyTorch wheel
  4. Other Dependencies:

    • Read from model-core/requirements.txt
    • Skips packages already provided by PyTorch

Version Detection

The builder automatically detects and matches versions:

# Detected CUDA 12.2 → Uses PyTorch cu122 wheel
# Detected ROCm 5.7 → Uses PyTorch rocm5.7 wheel
# Detected JetPack 6.0 → Uses NVIDIA JetPack 6.x wheel

If detection fails, the builder falls back to a known-good default (cu121 for CUDA, rocm5.7 for ROCm).

Using the Venv

Option 1: Source activate (recommended)

cd ~/model-core
source venv/bin/activate
python train.py

Option 2: Direct Python path

# For Jetson, LD_LIBRARY_PATH must include cuSPARSELt
LD_LIBRARY_PATH=~/model-core/venv/lib/cuda/lib \
    ~/model-core/venv/bin/python train.py

Verification

import torch

print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device: {torch.cuda.get_device_name(0)}")

Project Structure

model-setup/
├── src/model_setup/
│   ├── __init__.py
│   ├── hardware_detector.py    # Hardware detection (GPU type, versions, WSL)
│   ├── venv_builder.py         # Venv creation with dynamic version selection
│   ├── gpu_compatibility.py    # GPU compatibility testing
│   ├── pip_version_checker.py  # PyPI availability checking
│   └── version_requirements.py # Minimum version requirements
├── test_venv_builder.py       # Self-bootstrapping CLI for dev
└── README.md                   # This file

WSL Support (Windows Subsystem for Linux)

model-setup supports WSL 2 for GPU passthrough on Windows.

Prerequisites

  • WSL 2 (WSL 1 does not support GPU)
  • GPU drivers installed on Windows host
  • WSL GPU support (/usr/lib/wsl/lib must exist)

Automatic Detection

The builder automatically detects WSL and checks prerequisites:

INFO - WSL environment detected
INFO - WSL 2 confirmed
WARNING - WSL GPU support not detected (/usr/lib/wsl/lib missing)

Manual WSL Check

# Verify WSL version
wsl.exe --version

# Should show: WSL version: 2.x.x

# Check GPU in WSL
ls /usr/lib/wsl/lib/ | grep -i cuda
# Should show: libcuda.so, libd3d12.so, etc.

Detailed Diagnostics

model-setup logs exactly why GPU setup fails. Check the log file for details:

CUDA Prerequisites Check

INFO - Detecting hardware on Linux x86_64
INFO - WSL environment detected
INFO - CUDA GPU detected
WARNING - CUDA prerequisites missing:
WARNING -   - nvidia-smi not found - NVIDIA drivers not installed
WARNING -   - CUDA toolkit not installed (nvcc not found)
WARNING -   - cuDNN library (optional but recommended)
INFO - Will attempt install anyway, but may fail

ROCm Prerequisites Check

WARNING - ROCm prerequisites missing:
WARNING -   - ROCm not installed (rocm-smi not found)
WARNING -   - HIP toolkit not installed (hipcc not found)

Log File Locations

Using model-setup:

  • Default: {venv_parent}/test_venv_builder.log
  • Custom: --log-file ~/custom.log

Using module directly:

  • {venv_path}/install.log

Configuration Priority

  1. Explicit (highest): ~/GPU/GPU_VARIANT.txt containing jetson, cuda, or rocm
  2. Auto-detect: PyTorch CUDA availability check
  3. Platform inference: aarch64 → likely Jetson

Troubleshooting

GPU Not Detected / Prerequisites Missing

Check the log file for exact reason:

grep -E "(prerequisites|Missing|not found)" ~/model-core/test_venv_builder.log

Common issues:

WSL GPU Issues

# Verify WSL 2
wsl.exe --version
# Expected: WSL version: 2.x.x

# Check Windows GPU drivers
# Install from NVIDIA/AMD website on Windows host

# Verify WSL GPU passthrough
ls -la /usr/lib/wsl/lib/
# Should see: libcuda.so, libd3d12.so, libdxcore.so

"libcusparseLt.so.0: cannot open shared object file" (Jetson only)

  • Venv not activated: Run source venv/bin/activate
  • Or missing LD_LIBRARY_PATH when using direct Python path

"No module named 'torch'"

  • Venv creation failed - check hardware detection output in log file
  • Try running model-setup again with explicit path and log
  • Check: grep -i error ~/model-core/test_venv_builder.log

NumPy version errors

  • model-setup automatically pins numpy<2
  • If manually installing packages, avoid upgrading numpy
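A quick sanity check for the pin can be written without importing numpy at all (a sketch; the helper name is invented):

```python
def numpy_pin_ok(version: str) -> bool:
    """True if a NumPy version string satisfies the numpy<2 pin."""
    return int(version.split(".")[0]) < 2
```

Inside the venv this would typically be called as numpy_pin_ok(numpy.__version__).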

Installation Fails with "Unknown install type"

This was a bug in earlier versions. Update to latest:

git pull origin develop

Jetson-Specific Requirements

  • JetPack: 6.x (R36.x in /etc/nv_tegra_release)
  • Python: 3.10
  • CUDA: 12.6 (from JetPack)
  • Special: PyTorch wheels from NVIDIA's redist, not PyPI

See GPU/docs/Jetson Orin Nano — PyTorch & TensorFlow Limitations.md for detailed compatibility matrix.

Development

Contributing

See CONTRIBUTING.md for development setup and contribution guidelines.

Testing

# Test hardware detection
python -c "from model_setup.hardware_detector import HardwareDetector; \
    h = HardwareDetector().detect(); print(h)"

# Test venv creation with logging
model-setup /tmp/test_venv --config /tmp/test_config.json --log-file /tmp/test.log

# Test with all backends
model-setup /tmp/test_venv --all --config /tmp/test_config.json

# View detailed logs
grep -E "(prerequisites|Missing|detected)" /tmp/test.log

License

MIT License - See LICENSE file




Download files

Source Distribution

model_setup-0.2.2.tar.gz (31.4 kB)

Built Distribution

model_setup-0.2.2-py3-none-any.whl (31.6 kB)

File details

Details for the file model_setup-0.2.2.tar.gz.

File metadata

  • Download URL: model_setup-0.2.2.tar.gz
  • Size: 31.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Algorithm    Hash digest
SHA256       814815240dee9697eea8de70244fa6d26089a018621fb7cfb501f0ab0cae0c05
MD5          f7eb10ec531c3c64dd9a826568bbfa17
BLAKE2b-256  6492b63944512a1e30f88ce0c93055a26c48ea637d5e8de7329861a4931a4155

File details

Details for the file model_setup-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: model_setup-0.2.2-py3-none-any.whl
  • Size: 31.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Algorithm    Hash digest
SHA256       e962c7be7e55d72c17008177617229415dc9772778a9afdefc5e7140078a17ef
MD5          0711e919a54e797382aad002d3db989c
BLAKE2b-256  a078fb72f91863f9c36de0b6e6a602f3b1f98fe4a7a694d8cc56a00442c3360e
