Multi-Instance-GPU profiling tool

MIG Profiler

MIGProfiler is a toolkit for benchmarking NVIDIA Multi-Instance GPU (MIG) technology. It profiles multiple deep learning training and inference tasks on MIG GPUs.

MIGProfiler features:

  • 🎨 Support for many deep learning tasks and open-source models across a variety of benchmark types
  • 📈 Comprehensive benchmark results
  • 🐣 Easy setup with a configuration file (WIP)

The project is under rapid development! Please check our benchmark website and join us!

Benchmark Website 📈

Coming soon!

Install 📦️

Manual install

Requirements:

  • PyTorch with CUDA
  • OpenCV
  • Sanic
  • Transformers
  • Tqdm
  • Prometheus client

# create virtual environment
conda create -n mig-perf python=3.8
conda activate mig-perf

# install required packages
conda install pytorch torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c conda-forge opencv
pip install transformers
pip install sanic tqdm prometheus_client
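After installing, it can help to confirm that each requirement is importable from the new environment. The sketch below is a minimal check that uses only the standard library. The import names (`torch`, `cv2`, `sanic`, `transformers`, `tqdm`, `prometheus_client`) are the usual module names for the packages listed above, and `find_missing` is a hypothetical helper, not part of MIGProfiler.

```python
import importlib.util

# Usual import names for the packages listed under Requirements.
REQUIRED_MODULES = ["torch", "cv2", "sanic", "transformers", "tqdm", "prometheus_client"]


def find_missing(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If `torch` is importable, calling `torch.cuda.is_available()` in the same environment is a quick way to check that the CUDA build was picked up.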

PyPI install

WIP

Use Docker

WIP

Quick Start 🚚

You can easily profile workloads on a MIG GPU. Below are some common deep learning tasks to try.

1. MIG training benchmark

First, create a 1g.10gb MIG device:

# enable MIG
sudo nvidia-smi -i 0 -mig 1
# create MIG instance
sudo nvidia-smi mig -cgi 1g.10gb -C
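To confirm that the MIG device was created, the output of `nvidia-smi -L` can be inspected. The sketch below parses that listing in Python. The line layout in `SAMPLE_LISTING` is an assumption based on typical driver output and may differ between driver versions, the UUIDs are made up, and the function names are hypothetical helpers rather than MIGProfiler APIs.

```python
import re
import subprocess

# Assumed layout of a MIG line in `nvidia-smi -L` output, for example:
#   "  MIG 1g.10gb     Device  0: (UUID: MIG-...)"
MIG_LINE = re.compile(
    r"^\s*MIG\s+(?P<profile>\S+)\s+Device\s+(?P<index>\d+):\s+\(UUID:\s+(?P<uuid>[^)]+)\)"
)

# Illustrative listing with made-up UUIDs.
SAMPLE_LISTING = """\
GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-aaaaaaaa-0000-0000-0000-000000000000)
  MIG 1g.10gb     Device  0: (UUID: MIG-bbbbbbbb-1111-1111-1111-111111111111)
"""


def parse_mig_devices(listing):
    """Return (profile, device index, UUID) for each MIG line in the listing."""
    devices = []
    for line in listing.splitlines():
        match = MIG_LINE.match(line)
        if match:
            devices.append((match["profile"], int(match["index"]), match["uuid"]))
    return devices


def list_mig_devices():
    """Run `nvidia-smi -L` (requires an NVIDIA driver) and parse its output."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return parse_mig_devices(result.stdout)


# Demonstrate the parser on the sample text; call list_mig_devices() on a real host.
for profile, index, uuid in parse_mig_devices(SAMPLE_LISTING):
    print(f"MIG device {index}: profile={profile} uuid={uuid}")
```

An empty result from `list_mig_devices()` would suggest that MIG mode is not enabled or that no instance has been created yet.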

Start the DCGM metrics exporter:

docker run -d --rm --gpus all --net mig_perf -p 9400:9400  \
    -v "${PWD}/mig_perf/profiler/client/dcp-metrics-included.csv:/etc/dcgm-exporter/customized.csv" \
    --name dcgm_exporter --cap-add SYS_ADMIN   nvcr.io/nvidia/k8s/dcgm-exporter:2.4.7-2.6.11-ubuntu20.04 \
    -c 500 -f /etc/dcgm-exporter/customized.csv -d f
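Once the container is running, the exporter serves metrics in the Prometheus text format on the published port 9400. The sketch below fetches and parses them with the standard library only. It assumes the exporter is reachable at `http://localhost:9400/metrics`; which metric names and labels appear depends on the CSV file mounted above, so the names in the usage note are illustrative. `parse_metrics` and `fetch_metrics` are hypothetical helpers, not part of MIGProfiler.

```python
import re
import urllib.request

# One sample line in the Prometheus text format: name{label="value",...} value
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match["labels"] or ""))
        samples.append((match["name"], labels, float(match["value"])))
    return samples


def fetch_metrics(url="http://localhost:9400/metrics", timeout=5.0):
    """Fetch the exporter endpoint (assumed URL) and return the parsed samples."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

For example, `[s for s in fetch_metrics() if s[0] == "DCGM_FI_PROF_GR_ENGINE_ACTIVE"]` would select one metric, provided that field is enabled in the CSV. The `-c 500` flag in the command above sets the exporter's collection interval, which is given in milliseconds, so polling faster than that is unlikely to yield new values.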

Start profiling:

cd mig_perf/profiler
export PYTHONPATH=$PWD
python train/train_cv.py --bs=32 --model=resnet50 --num_batches=500 --mig-device-id=0
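To compare several batch sizes on the same MIG device, the command above can be wrapped in a small sweep. The sketch below only uses the flags shown in the command (`--bs`, `--model`, `--num_batches`, `--mig-device-id`); the batch sizes are illustrative, the helper names are hypothetical, and it assumes it is run from `mig_perf/profiler` with `PYTHONPATH` set as above.

```python
import subprocess
import sys


def build_train_command(model, batch_size, num_batches=500, mig_device_id=0):
    """Build the argv list for one train/train_cv.py run."""
    return [
        sys.executable,
        "train/train_cv.py",
        f"--bs={batch_size}",
        f"--model={model}",
        f"--num_batches={num_batches}",
        f"--mig-device-id={mig_device_id}",
    ]


def run_sweep(model, batch_sizes, dry_run=True):
    """Print each command and, unless dry_run is set, run it and stop on failure."""
    commands = [build_train_command(model, bs) for bs in batch_sizes]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
    return commands


# Dry run by default: prints the commands without starting any training.
run_sweep("resnet50", [1, 8, 32])
```

Passing `dry_run=False` runs the benchmarks one after another, which keeps the MIG device dedicated to a single workload at a time.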

Remember to disable MIG after the benchmark finishes:

# destroy compute instances, then GPU instances
sudo nvidia-smi mig -i 0 -dci
sudo nvidia-smi mig -i 0 -dgi
# disable MIG mode
sudo nvidia-smi -i 0 -mig 0

2. MIG inference benchmark

Start the DCGM metrics exporter:

docker run -d --rm --gpus all --net mig_perf -p 9400:9400  \
    -v "${PWD}/mig_perf/profiler/client/dcp-metrics-included.csv:/etc/dcgm-exporter/customized.csv" \
    --name dcgm_exporter --cap-add SYS_ADMIN   nvcr.io/nvidia/k8s/dcgm-exporter:2.4.7-2.6.11-ubuntu20.04 \
    -c 500 -f /etc/dcgm-exporter/customized.csv -d f

Start profiling:

cd mig_perf/profiler
export PYTHONPATH=$PWD
python client/block_infernece_cv.py --bs=32 --model=resnet50 --num_batches=500 --mig-device-id=0

See more benchmark experiments in ./exp.

3. Visualize

  • In a notebook
  • In Prometheus (work in progress)

Cite Us 🌱

@article{zhang2022migperf,
  title={MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs},
  author={Zhang, Huaizheng and Li, Yuanming and Xiao, Wencong and Huang, Yizheng and Di, Xing and Yin, Jianxiong and See, Simon and Luo, Yong and Lau, Chiew Tong and You, Yang},
  journal={arXiv preprint arXiv:2301.00407},
  year={2023}
}

Contributors 👥

  • Yuanming Li
  • Huaizheng Zhang
  • Yizheng Huang
  • Xing Di

Acknowledgement

Special thanks to Aliyun and the NVIDIA AI Tech Center for providing MIG GPU server resources for benchmarking.

License

This repository is open-sourced under the MIT License.
