GPU benchmark tool: a comprehensive toolkit for evaluating NVIDIA GPU performance

GPU Benchmark Linux

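A professional GPU benchmarking and stress-testing tool that fully replaces gpu burn, with new GPU stress tests, hardware monitoring, and visual reports.

🚀 Core features

✅ GPU stress tests: matrix multiplication, compute-intensive, and memory bandwidth tests
✅ Real-time hardware monitoring: temperature, power draw, and GPU/memory utilization
✅ Safety protection: temperature and power limits with automatic stop
✅ Visual reports: interactive HTML reports with Chart.js charts
✅ Multi-GPU support: test multiple GPU devices in parallel
✅ CUDA compatibility: supports CUDA Toolkit 12+ and mainstream NVIDIA GPUs
✅ Multiple output formats: JSON, CSV, and HTML reports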

📋 System requirements

Operating system: Linux (Ubuntu 18.04+ recommended)
Hardware: NVIDIA GPU
Software:
- CUDA Toolkit 12.0+
- NVIDIA driver
- Python 3.8+
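
The requirements above can be checked before installing. The following is a minimal sketch, not part of the package: `check_prerequisites` is a hypothetical helper that only tests the Python version and whether `nvidia-smi` and `nvcc` are on `PATH`. A missing `nvcc` only shows that the toolkit's compiler is not on `PATH`; it does not prove the toolkit is absent.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # taken from the requirements list above


def check_prerequisites():
    """Return a dict mapping each checked requirement to True or False."""
    return {
        "python>=3.8": sys.version_info[:2] >= MIN_PYTHON,
        "nvidia-smi on PATH": shutil.which("nvidia-smi") is not None,
        "nvcc on PATH": shutil.which("nvcc") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```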

🔧 Installation

Option 1: install from PyPI

pip install gpu-benchmark-linux

Option 2: install from source

git clone <project URL>
cd GPUBenchmark/gpu_benchmark
pip install -e .

Dependencies

# Core dependencies
pip install cupy-cuda12x nvidia-ml-py3 numpy psutil

🎯 Quick start

Command-line usage

# Run all tests (60 seconds by default)
python3 -m gpu_benchmark_linux --test all

# Set the test duration (5 minutes)
python3 -m gpu_benchmark_linux --test all --duration 300

# Set the output directory
python3 -m gpu_benchmark_linux --test all --output results

# Show help
python3 -m gpu_benchmark_linux --help

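Test types

Test type	Description	Purpose
`env`	Environment check	Verify CUDA, driver, and dependencies
`cuda`	CUDA functional test	Benchmark GPU compute performance
`model`	Model inference test	Benchmark AI model inference performance
`all`	Full test suite	Comprehensive GPU performance evaluation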
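
When the command line is driven from another script, it can help to build the argument list in one place. The sketch below uses only the flags shown above (`--test`, `--duration`, `--output`) and the four test types from the table; `build_command` and `run_benchmark` are hypothetical helper names, not part of the package.

```python
import subprocess
import sys

TEST_TYPES = ("env", "cuda", "model", "all")  # from the table above


def build_command(test="all", duration=None, output=None):
    """Build the argument list for `python -m gpu_benchmark_linux`."""
    if test not in TEST_TYPES:
        raise ValueError(f"unknown test type: {test!r}")
    cmd = [sys.executable, "-m", "gpu_benchmark_linux", "--test", test]
    if duration is not None:
        cmd += ["--duration", str(int(duration))]
    if output is not None:
        cmd += ["--output", str(output)]
    return cmd


def run_benchmark(**kwargs):
    """Run the CLI and return its exit code (the package must be installed)."""
    return subprocess.run(build_command(**kwargs)).returncode
```

For example, `build_command("cuda", duration=300, output="results")` yields the same invocation as the 5-minute example above, restricted to the `cuda` test.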

Programmatic interface

from gpu_benchmark_linux.stress_test import GPUStressTester, StressTestConfig

# Create the test configuration
config = StressTestConfig(
    duration=300,  # 5-minute test
    device_ids=[0, 1],  # test GPUs 0 and 1
    test_types=['matrix_multiply', 'compute_intensive', 'memory_bandwidth'],
    temperature_limit=85.0,  # temperature limit of 85°C
    auto_stop_on_limit=True  # stop automatically when a limit is exceeded
)

# Run the stress test
tester = GPUStressTester()
result = tester.run_stress_test(config)

# Check the result
if result.success:
    print(f"Test succeeded! Total GFLOPS: {result.performance_metrics.get('total_gflops', 0)}")
    print(f"HTML report: {result.html_report_path}")  # the HTML report is generated automatically
else:
    print(f"Test failed: {result.error_message}")

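📊 Output files

After a test finishes, the following files are written to the output directory:

1. Log file

benchmark_YYYYMMDD_HHMMSS.log  # detailed test log

2. JSON result file

stress_test_TIMESTAMP.json     # machine-readable test results

3. HTML visual report ⭐

benchmark_report_YYYYMMDD_HHMMSS.html  # interactive visual report

The HTML report includes:

📊 System information overview
⚡ Performance metrics dashboard
📈 Real-time monitoring charts (temperature, power, utilization)
🎯 Per-device test result details
📉 Performance comparison charts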
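
Since the JSON result file is meant to be machine-readable, a script can pick up the most recent one from the output directory. This is a minimal sketch that assumes only the `stress_test_*.json` naming pattern shown above; the schema of the file is not documented here, so the helper (a hypothetical `load_latest_result`) returns the parsed JSON as-is.

```python
import json
from pathlib import Path


def load_latest_result(output_dir):
    """Return (path, parsed JSON) for the newest stress_test_*.json, or None."""
    candidates = sorted(
        Path(output_dir).glob("stress_test_*.json"),
        key=lambda p: p.stat().st_mtime,  # order by modification time
    )
    if not candidates:
        return None
    newest = candidates[-1]
    with newest.open(encoding="utf-8") as fh:
        return newest, json.load(fh)
```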

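🔍 Core functionality in detail

1. GPU stress test types

Matrix multiplication test

Measures GPU floating-point compute performance
Reports performance in GFLOPS
Verifies computational accuracy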
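
For reference, a GFLOPS figure for matrix multiplication is conventionally derived from the operation count: multiplying two n x n matrices takes about 2n³ floating-point operations. The sketch below shows that arithmetic; it is the usual convention, and whether the package counts operations exactly this way is an assumption. `matmul_gflops` is a hypothetical helper name.

```python
def matmul_gflops(n, seconds, repeats=1):
    """GFLOPS for `repeats` products of two n x n matrices done in `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2.0 * n**3 * repeats  # about n^3 multiplies plus n^3 additions
    return flops / seconds / 1e9
```

With n = 8192, one product is roughly 1.1e12 operations, so finishing it in 0.1 s corresponds to about 10995 GFLOPS.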

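Compute-intensive test

Sustained high-intensity compute load
Tests GPU stability
Evaluates cooling performance

Memory bandwidth test

Measures GPU memory bandwidth
Evaluates memory subsystem performance
Detects memory errors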
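
A bandwidth figure in GB/s can be derived from a timed copy. The sketch below assumes a device-to-device copy, where every byte of the buffer is read once and written once; whether the package measures bandwidth this way is not stated, and `copy_bandwidth_gbs` is a hypothetical helper name.

```python
def copy_bandwidth_gbs(buffer_bytes, seconds, copies=1):
    """Effective bandwidth in GB/s for `copies` buffer copies done in `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    moved = 2 * buffer_bytes * copies  # each copy reads and writes the buffer
    return moved / seconds / 1e9
```

For instance, copying a 1 GiB buffer 100 times in 0.25 s works out to about 859 GB/s.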

2. Hardware monitoring

# Real-time monitoring example
from gpu_benchmark_linux.monitor import GPUMonitor

monitor = GPUMonitor()
monitor.start_monitoring(interval=1.0)  # sample once per second

# Read the current metrics
metrics = monitor.get_current_metrics()
for metric in metrics:
    print(f"GPU {metric.device_id}: {metric.temperature}°C, {metric.power_usage}W")
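
The dependency list includes nvidia-ml-py3, which exposes NVML as the `pynvml` module. As a rough illustration of where such readings can come from (not a description of how `GPUMonitor` is implemented), the sketch below takes one sample per GPU directly through `pynvml`. The function name and the dictionary keys are hypothetical.

```python
def read_gpu_samples():
    """Return one dict per GPU, or [] if NVML is unavailable or a query fails."""
    try:
        import pynvml  # provided by the nvidia-ml-py3 package
    except ImportError:
        return []
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []  # no driver or no NVML library on this machine
    try:
        samples = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            samples.append({
                "device_id": index,
                "temperature_c": pynvml.nvmlDeviceGetTemperature(
                    handle, pynvml.NVML_TEMPERATURE_GPU),
                # NVML reports power draw in milliwatts
                "power_w": pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0,
                "gpu_util_pct": util.gpu,
            })
        return samples
    except pynvml.NVMLError:
        return []  # a query is unsupported on at least one device
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    for sample in read_gpu_samples():
        print(sample)
```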

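3. Safety protection

Temperature protection: stops the test automatically when the configured temperature is exceeded
Power protection: terminates the test automatically when power draw exceeds the limit
Exception handling: robust error recovery
Resource cleanup: GPU resources are released automatically when a test ends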
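
The temperature and power protections amount to comparing each monitoring sample against configured limits. The sketch below illustrates that check under one assumption: that `power_limit_ratio` is a fraction of the board's power cap. The package's actual rule is not documented here, and `check_limits` is a hypothetical helper.

```python
def check_limits(temperature_c, power_w, power_cap_w,
                 temperature_limit=85.0, power_limit_ratio=0.95):
    """Return a reason string if a limit is exceeded, otherwise None."""
    if temperature_c > temperature_limit:
        return (f"temperature {temperature_c:.1f}°C exceeds "
                f"{temperature_limit:.1f}°C")
    allowed_w = power_cap_w * power_limit_ratio  # assumed meaning of the ratio
    if power_w > allowed_w:
        return f"power {power_w:.1f}W exceeds {allowed_w:.1f}W"
    return None
```

A monitoring loop would call this once per sample and stop the workload as soon as it returns a reason.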

📈 Performance reference

GPU model	Matrix multiplication (GFLOPS)	Memory bandwidth (GB/s)	Typical power (W)
RTX 4090	15000-20000	900-1000	350-450
RTX 4080	12000-15000	700-800	280-350
RTX 3090	10000-13000	800-900	300-400
RTX 3080	8000-11000	650-750	250-320
RTX 3070	6000-8000	450-550	200-250
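
One way to use this table is to check whether a measured result falls inside the listed range. The sketch below copies the matrix multiplication column verbatim; `classify_gflops` is a hypothetical helper, and the ranges are only the reference values above, not guarantees.

```python
# Matrix multiplication reference ranges (GFLOPS), copied from the table above
REFERENCE_GFLOPS = {
    "RTX 4090": (15000, 20000),
    "RTX 4080": (12000, 15000),
    "RTX 3090": (10000, 13000),
    "RTX 3080": (8000, 11000),
    "RTX 3070": (6000, 8000),
}


def classify_gflops(model, measured):
    """Return 'below', 'within', or 'above' relative to the reference range."""
    low, high = REFERENCE_GFLOPS[model]
    if measured < low:
        return "below"
    if measured > high:
        return "above"
    return "within"
```

A result classified as "below" may point to thermal throttling or a power limit and is worth cross-checking against the monitoring charts.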

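🛠️ Advanced configuration

Custom test parameters

from gpu_benchmark_linux.stress_test import StressTestConfig

config = StressTestConfig(
    duration=600,  # 10-minute test
    matrix_size=8192,  # larger matrices (higher load)
    memory_usage_ratio=0.9,  # use 90% of GPU memory
    monitor_interval=0.5,  # 0.5-second monitoring interval
    temperature_limit=80.0,  # stricter temperature limit
    power_limit_ratio=0.95,  # power limit ratio
)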
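
To judge how much GPU memory a given `matrix_size` needs, the footprint can be estimated by hand. The sketch below assumes square float32 matrices (4 bytes per element) and three buffers (two inputs and one output); both are assumptions about the workload, not documented behavior, and the helper names are hypothetical.

```python
def matrix_footprint_bytes(matrix_size, bytes_per_element=4, matrices=3):
    """Bytes needed for `matrices` square matrices of side `matrix_size`."""
    return matrix_size * matrix_size * bytes_per_element * matrices


def to_mib(num_bytes):
    """Convert bytes to MiB."""
    return num_bytes / (1024 ** 2)
```

Under these assumptions, `matrix_size=8192` needs 256 MiB per matrix and 768 MiB for all three.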

Adding a test callback

def test_progress_callback(result):
    """Test progress callback."""
    print(f"Test finished in {result.duration:.1f} s")
    if result.success:
        print("✅ All tests passed")
        # e.g. send an email notification here
    else:
        print("❌ Test failed; check the logs")

tester.add_callback(test_progress_callback)
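
A callback can also keep a persistent record of each run. The sketch below builds a callback that appends one JSON line per finished test. It relies only on the `success`, `duration`, and `error_message` attributes used in the examples in this README; `make_jsonl_callback` and the record keys are hypothetical.

```python
import json
import time
from pathlib import Path


def make_jsonl_callback(path):
    """Return a callback that appends one JSON line per finished test."""
    log_path = Path(path)

    def callback(result):
        record = {
            "finished_at": time.time(),
            "success": bool(result.success),
            "duration_s": float(result.duration),
            "error_message": getattr(result, "error_message", None),
        }
        with log_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")

    return callback
```

It would be registered in the same way as the callback above, for example `tester.add_callback(make_jsonl_callback("runs.jsonl"))`.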

Batch test script

#!/usr/bin/env python3
"""Batch GPU test script."""

import time
from gpu_benchmark_linux.stress_test import GPUStressTester, StressTestConfig

def run_batch_tests():
    """Run a series of stress tests of increasing length."""
    test_configs = [
        ("Short test", StressTestConfig(duration=60)),
        ("Medium test", StressTestConfig(duration=300)),
        ("Long test", StressTestConfig(duration=1800)),
    ]

    tester = GPUStressTester()
    results = {}

    for name, config in test_configs:
        print(f"Starting {name}...")
        result = tester.run_stress_test(config)
        results[name] = result

        if result.success:
            print(f"✅ {name} finished")
        else:
            print(f"❌ {name} failed: {result.error_message}")

        time.sleep(30)  # pause between tests

    return results

if __name__ == "__main__":
    run_batch_tests()

🔧 Troubleshooting

1. CUDA is unavailable

# Check the NVIDIA driver
nvidia-smi

# Check the CUDA version
nvcc --version

# Install the CUDA Toolkit
# See: https://developer.nvidia.com/cuda-toolkit

2. Dependency problems

# Reinstall the dependencies
pip install --upgrade cupy-cuda12x nvidia-ml-py3 numpy

# If the CuPy installation fails, try the prebuilt wheel
pip install cupy-cuda12x

# Check CUDA version compatibility
python -c "import cupy; print(cupy.cuda.runtime.runtimeGetVersion())"
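
The last command prints a single integer rather than a dotted version. The CUDA runtime encodes its version as 1000 × major + 10 × minor, so the value can be decoded as in the sketch below; `decode_cuda_version` is a hypothetical helper name.

```python
def decode_cuda_version(value):
    """Decode a CUDA runtime version integer into (major, minor)."""
    major, rest = divmod(int(value), 1000)
    return major, rest // 10
```

For example, `12020` decodes to CUDA 12.2, which satisfies the CUDA Toolkit 12.0+ requirement.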

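3. Permission problems

# Make sure the user is in the docker group (if using Docker)
sudo usermod -a -G docker $USER

# Check GPU device permissions
ls -la /dev/nvidia*

# Log out and back in, or reboot

4. Out of memory

# Check GPU memory
nvidia-smi

# Lower the memory usage ratio
config.memory_usage_ratio = 0.7  # use 70% of GPU memory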

5. Temperature too high

# Check cooling
nvidia-smi -q -d TEMPERATURE

# Lower the temperature limit
config.temperature_limit = 75.0  # 75°C limit

# Improve the cooling environment
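
To watch temperatures from a script rather than by eye, `nvidia-smi` also offers a CSV query mode. The sketch below parses the output of `nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits`; the helper names are hypothetical, and rows whose fields are not numeric are skipped.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"]


def parse_temperatures(csv_text):
    """Parse 'index, temperature' lines into {gpu_index: degrees_celsius}."""
    temps = {}
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 2:
            continue
        try:
            temps[int(fields[0])] = int(fields[1])
        except ValueError:
            continue  # the field was not numeric; skip this row
    return temps


def read_temperatures():
    """Run nvidia-smi and return {gpu_index: degrees_celsius}."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_temperatures(out.stdout)
```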

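📁 Project structure

gpu_benchmark_linux/
├── __init__.py           # package initialization
├── __main__.py          # command-line entry point
├── benchmark.py         # benchmark integration
├── cuda_ops.py         # CUDA compute operation wrappers
├── monitor.py          # GPU hardware monitoring
├── stress_test.py      # core stress-test logic
├── reporter.py         # result output management
├── html_reporter.py    # HTML visual report generator
├── exceptions.py       # error handling
├── utils.py           # utility functions
└── tests/             # unit tests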
    ├── __init__.py
    ├── cuda_tests.py
    └── model_tests.py

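🔄 Changelog

v0.1.9 (current version)

✅ Major update: removed the gpu burn dependency entirely
✅ New feature: a new CUDA stress-testing implementation
✅ Visualization: added interactive HTML reports
✅ Compatibility: supports CUDA Toolkit 12+
✅ Monitoring: enhanced hardware monitoring
✅ Architecture: refactored into a modular architecture
✅ Stability: improved error handling
✅ Safety: more complete temperature and power protection

v0.1.8

Basic GPU testing
Simple result output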

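🤝 Contributing

Issues and pull requests are welcome!

Development setup

git clone <project URL>
cd gpu_benchmark
pip install -e ".[dev]"

Running the tests

python -m pytest gpu_benchmark_linux/tests/

📄 License

MIT License - see the LICENSE file for details

🆘 Support

If you run into problems:

Check the system requirements: make sure the minimum hardware and software requirements are met
Read the logs: check the detailed log file for error messages
Consult the documentation: see the troubleshooting section
Open an issue: submit a detailed problem report on GitHub

⚠️ Important: this tool is designed for professional GPU performance and stress testing. Use it in a well-ventilated environment and monitor hardware temperatures to avoid hardware damage.

🎯 Project goal: provide a GPU stress-testing solution that is more powerful, safer, and easier to use than gpu burn.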

Release history

0.4.4 (Aug 19, 2025)
0.4.3 (Aug 19, 2025)
0.4.2 (Aug 19, 2025)
0.4.1 (Aug 19, 2025)
0.4.0 (Aug 19, 2025)
0.3.1 (Aug 19, 2025)
0.3.0 (Aug 19, 2025)
0.2.3 (Aug 19, 2025), this version
0.2.2 (Aug 18, 2025)
0.2.1 (Aug 18, 2025)
0.2.0 (Aug 18, 2025)
0.1.9 (Aug 18, 2025)
0.1.8 (Aug 18, 2025)
0.1.7 (Aug 18, 2025)
0.1.6 (Aug 18, 2025)
0.1.5 (Aug 18, 2025)
0.1.4 (Aug 18, 2025)
0.1.3 (Aug 18, 2025)
0.1.2 (Aug 18, 2025)
0.1.1 (Aug 18, 2025)
0.1.0 (Aug 15, 2025)

Download files

Source Distribution

gpu_benchmark_linux-0.2.3.tar.gz (63.1 kB)

Uploaded Aug 19, 2025 Source

Built Distribution

gpu_benchmark_linux-0.2.3-py3-none-any.whl (69.5 kB)

Uploaded Aug 19, 2025 Python 3

File details

Details for the file gpu_benchmark_linux-0.2.3.tar.gz.

File metadata

Download URL: gpu_benchmark_linux-0.2.3.tar.gz
Upload date: Aug 19, 2025
Size: 63.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for gpu_benchmark_linux-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`593f34303dcfd7d1387983f5485e47fcc4cb8e5f77d21c7d88bbc92eac936c1c`
MD5	`417f727075dbd5cd9caf60d832cd76c6`
BLAKE2b-256	`8b896973a35091fe3c5193241067b08ec0303d99e943207e2d0de4e827e986e0`

File details

Details for the file gpu_benchmark_linux-0.2.3-py3-none-any.whl.

File metadata

Download URL: gpu_benchmark_linux-0.2.3-py3-none-any.whl
Upload date: Aug 19, 2025
Size: 69.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for gpu_benchmark_linux-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6e38e044d45222474194a518e316911cbbd04d8f244a9aa383dae83d37c1c012`
MD5	`ea2d2dcfffe2548d467465278a7dc9fe`
BLAKE2b-256	`70a923347bda68e3e2679a6b107da272cbf060c548d5e5c5579b69df89ec3ad8`
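
The SHA256 digests listed above can be used to verify a downloaded file before installing it. A minimal sketch using only the standard library follows; `sha256_of` and `verify_sha256` are hypothetical helper names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Return True if the file's SHA256 digest matches `expected_hex`."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the wheel, the call would be `verify_sha256("gpu_benchmark_linux-0.2.3-py3-none-any.whl", "6e38e044d45222474194a518e316911cbbd04d8f244a9aa383dae83d37c1c012")`.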
