# gpu-benchmark-tool

Multi-vendor GPU health monitoring with support for older GPUs, helping reduce e-waste.

A comprehensive multi-vendor GPU health monitoring and optimization tool that helps users assess GPU performance and select optimal hardware for their workloads.
## 🚀 Features

- 🔥 **Comprehensive GPU Health Monitoring**: temperature, power, utilization, and throttling detection
- ⚡ **Advanced Stress Testing**: compute, memory bandwidth, VRAM, and mixed-precision tests
- 📊 **Detailed Health Scoring**: 100-point scoring system with actionable recommendations
- 🖥️ **Multi-GPU Support**: test and compare multiple GPUs simultaneously
- 🧪 **Mock Mode**: test on any computer without GPUs (perfect for development)
- 🔌 **Multi-Vendor Support**: NVIDIA, AMD, Intel, and mock mode
- ☁️ **Cloud-Ready**: designed to help select optimal GPUs for cloud deployment (coming soon!)
## Installation

### Basic Installation

Works on any system with a GPU (NVIDIA, AMD, or Intel):

```bash
pip install gpu-benchmark-tool
```

The base package includes PyTorch for the computational stress tests.

### Installation with Enhanced GPU Support

```bash
# For NVIDIA GPUs (adds NVIDIA monitoring + TensorRT for INT8)
pip install "gpu-benchmark-tool[nvidia]"

# For AMD GPUs (relies on system ROCm)
pip install "gpu-benchmark-tool[amd]"

# For Intel GPUs (adds Intel GPU acceleration)
pip install "gpu-benchmark-tool[intel]"

# For all GPU vendors (maximum compatibility)
pip install "gpu-benchmark-tool[all]"
```
## 🎯 Quick Start

### 1. Check Available GPUs

```bash
gpu-benchmark list
```

### 2. Run a Benchmark

```bash
# Benchmark all GPUs
gpu-benchmark benchmark

# Benchmark a specific GPU (recommended)
gpu-benchmark benchmark --gpu-id 0

# Quick 30-second test
gpu-benchmark benchmark --gpu-id 0 --duration 30

# Export results to JSON
gpu-benchmark benchmark --gpu-id 0 --export results.json
```

### 3. Mock Mode (No GPU Required)

Perfect for development or systems without GPUs:

```bash
gpu-benchmark benchmark --mock --duration 30
```
## 📊 Google Colab Quick Start

Run in a Colab notebook (Runtime > Change runtime type > GPU):

```bash
!pip install "gpu-benchmark-tool[nvidia]"
!gpu-benchmark benchmark --gpu-id 0 --duration 30
```
## Understanding Results

### Health Score (0-100 points)

- 85-100: 🟢 Healthy - safe for all workloads, including AI training
- 70-84: 🟢 Good - suitable for most workloads
- 55-69: 🟡 Degraded - limit to inference or light compute
- 40-54: 🟡 Warning - monitor closely, avoid heavy workloads
- 0-39: 🔴 Critical - do not use for production
### Score Components

Each component contributes to the total 100-point score:

**Temperature (20 points)** - peak temperature during the stress test

- Under 80°C: full points
- 80-85°C: 15 points
- 85-90°C: 10 points
- Over 90°C: 5 points

**Baseline Temperature (10 points)** - GPU temperature at idle

- Under 50°C: full points
- 50-60°C: 5 points
- Over 60°C: 0 points

**Power Efficiency (10 points)** - power consumption optimization

- Within optimal range: full points
- Slightly outside range: 5 points
- Far from optimal: 0 points

**GPU Utilization (10 points)** - how well the GPU is utilized during tests

- 99%+: full points
- 90-98%: 5 points
- Under 90%: 0 points

**Throttling (20 points)** - thermal or power throttling detection

- No throttling: full points
- Occasional throttling: 10-15 points
- Frequent throttling: 0-5 points

**Errors (20 points)** - stability during stress tests

- No errors: full points
- Few errors: 10-15 points
- Many errors: 0-5 points

**Temperature Stability (10 points)** - temperature consistency during tests

- Very stable: full points
- Some fluctuation: 5-7 points
- Unstable: 0-5 points
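The thresholds above can be sketched as plain Python. This is an illustrative reimplementation of the documented rules, not the tool's actual code; the function names are hypothetical.

```python
def score_peak_temperature(peak_c: float) -> int:
    """Peak stress-test temperature component (max 20 points)."""
    if peak_c < 80:
        return 20
    if peak_c <= 85:
        return 15
    if peak_c <= 90:
        return 10
    return 5

def status_for(score: int) -> str:
    """Map a 0-100 health score to its documented status band."""
    if score >= 85:
        return "Healthy"
    if score >= 70:
        return "Good"
    if score >= 55:
        return "Degraded"
    if score >= 40:
        return "Warning"
    return "Critical"

print(score_peak_temperature(83), status_for(78))  # -> 15 Good
```

The remaining components follow the same shape; their maxima (20 + 10 + 10 + 10 + 20 + 20 + 10) sum to the 100-point total.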
### Performance Metrics

- **Matrix Multiplication**: raw compute performance (TFLOPS)
- **Memory Bandwidth**: memory throughput (GB/s)
- **VRAM Stress**: memory allocation stability
- **Mixed Precision**: FP16/BF16 support for AI workloads
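For context on the TFLOPS figure: multiplying two n×n matrices costs roughly 2·n³ floating-point operations, so TFLOPS is that count divided by elapsed time and 10¹². A minimal sketch of the arithmetic (the function name is ours, not the tool's API):

```python
def matmul_tflops(n: int, seconds: float) -> float:
    """Approximate TFLOPS for an n x n matrix multiply (~2*n^3 FLOPs)."""
    return (2 * n ** 3) / seconds / 1e12

# e.g. an 8192 x 8192 multiply finishing in 0.05 s:
print(f"{matmul_tflops(8192, 0.05):.1f} TFLOPS")  # -> 22.0 TFLOPS
```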
## Command Line Usage

### Benchmark Command

```bash
gpu-benchmark benchmark [OPTIONS]
```

Options:

```
--gpu-id INTEGER    Specific GPU to test (default: all GPUs)
--duration INTEGER  Test duration in seconds (default: 60)
--basic             Run basic tests only (faster)
--export TEXT       Export results to a JSON file
--verbose           Show detailed output
--mock              Use mock GPU (no hardware required)
```
### Examples

```bash
# Full test on GPU 0 with export
gpu-benchmark benchmark --gpu-id 0 --duration 120 --export full_test.json

# Quick health check
gpu-benchmark benchmark --gpu-id 0 --duration 30 --basic

# Development testing
gpu-benchmark benchmark --mock --export mock_results.json
```
### Real-time Monitoring

Monitor GPU metrics in real time (NVIDIA only):

```bash
gpu-benchmark monitor --gpu-id 0
```
## Python API Usage

### Basic Usage

```python
import pynvml

from gpu_benchmark import run_full_benchmark

# Initialize NVML and get a handle to GPU 0
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Run the benchmark
results = run_full_benchmark(
    handle=handle,
    duration=60,
    enhanced=True,
    device_id=0,
)

# Access the results
print(f"Health Score: {results['health_score']['score']}/100")
print(f"Status: {results['health_score']['status']}")
```
### Analyzing Results

```python
# Check whether the GPU is healthy enough for production
if results['health_score']['score'] >= 70:
    print("✅ GPU is suitable for production workloads")
else:
    print("⚠️ GPU needs attention")

# Access performance metrics
if 'performance_tests' in results:
    tflops = results['performance_tests']['matrix_multiply']['tflops']
    print(f"Compute Performance: {tflops:.2f} TFLOPS")
```
## 🔧 Troubleshooting

### Common Issues

**"No GPUs found"**

- Use the `--mock` flag for testing without GPUs
- Ensure NVIDIA/AMD/Intel drivers are installed
- For AMD: install ROCm drivers and PyTorch with ROCm support
- For Intel: install Intel GPU drivers and Intel Extension for PyTorch

**"NVML Error" on Colab**

- This warning can be ignored - the tool still works correctly
- Use `--gpu-id 0` for cleaner output

**"PyTorch not available"**

- The base installation now includes PyTorch
- If you see this error, try: `pip install "gpu-benchmark-tool[nvidia]"`

**Low Health Scores**

- Check system cooling
- Ensure the GPU isn't thermal throttling
- Close other GPU applications

**Multi-GPU JSON Format**

- Use `--gpu-id 0` to test a single GPU (simpler output)
- Without `--gpu-id`, results are nested under the `results` key
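Assuming a multi-GPU export nests one per-GPU result under the `results` key, with the same `health_score` fields shown in the Python API section, reading it back might look like this (the sample JSON below is hypothetical):

```python
import json

# Hypothetical multi-GPU export shape; field names follow the API examples above.
raw = '''{"results": {
  "0": {"health_score": {"score": 88, "status": "Healthy"}},
  "1": {"health_score": {"score": 62, "status": "Degraded"}}
}}'''

data = json.loads(raw)
for gpu_id, result in data["results"].items():
    hs = result["health_score"]
    print(f"GPU {gpu_id}: {hs['score']}/100 ({hs['status']})")
```

With `--gpu-id`, the export is a single flat result and the `results` wrapper is absent.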
## Supported GPUs

**NVIDIA GPUs (Full Support)**

- Consumer: RTX 4090, 4080, 4070, 3090, 3080, 3070, 3060
- Data Center: A100, V100, T4, P100, K80
- Workstation: RTX A6000, A5000, A4000

**AMD GPUs (ROCm Required)**

- MI250X, MI210, MI100
- Radeon RX 7900 XTX, RX 6900 XT

**Intel GPUs (Limited Support)**

- Arc A770, A750
- Intel Xe integrated graphics
## Requirements

- Python 3.8 or higher
- For NVIDIA: CUDA drivers
- For AMD: ROCm drivers
- For Intel: Intel GPU drivers
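A quick way to confirm the interpreter meets the Python floor before installing:

```python
import sys

# gpu-benchmark-tool requires Python 3.8+; fail fast with a clear message otherwise.
if sys.version_info < (3, 8):
    raise RuntimeError(f"Python 3.8+ required, found {sys.version.split()[0]}")
print("Python version OK:", sys.version.split()[0])
```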
## 📄 License

MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

Built to solve real-world GPU selection challenges and reduce cloud computing costs through better hardware decisions.

## 📧 Contact

- PyPI: https://pypi.org/project/gpu-benchmark-tool/
- Email: ywrajput@gmail.com
## File details

Details for the file gpu_benchmark_tool-0.4.0.tar.gz.

- Download URL: gpu_benchmark_tool-0.4.0.tar.gz
- Size: 55.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.0

| Algorithm | Hash digest |
|---|---|
| SHA256 | bdeb53105e38e09aac9d0063cac63cc945ed845e321e40ba4bb09dadafd2e1a2 |
| MD5 | c44d1218a667817c1c4c529cdfcd2320 |
| BLAKE2b-256 | 094e9d0f8a6cd5f01cc5b154c8b15d4768c2b70b785fe8bd36868a2f5e068a13 |
## File details

Details for the file gpu_benchmark_tool-0.4.0-py3-none-any.whl.

- Download URL: gpu_benchmark_tool-0.4.0-py3-none-any.whl
- Size: 67.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.0

| Algorithm | Hash digest |
|---|---|
| SHA256 | 93c5a6daa7c84de297c708490f7ccc0cd87f1fd3553bbcc13fd4d0a2e7a3ee51 |
| MD5 | 51c14c41504f0fbc53676af92064608a |
| BLAKE2b-256 | f0134e2271ae62d604d82ce83d9c62b6b522f5369ebbb549546e132d65143393 |