
Multi-Instance-GPU profiling tool

MIG Profiler

MIGProfiler is a toolkit for benchmarking NVIDIA Multi-Instance GPU (MIG) technology. It profiles multiple deep learning training and inference tasks on MIG GPUs.

MIGProfiler features:

  • 🎨 Support for many deep learning tasks and open-source models across a variety of benchmark types
  • 📈 Comprehensive benchmark results
  • 🐣 Easy setup through a configuration file (WIP)

The project is under rapid development! Please check our benchmark website and join us!

Benchmark Website 📈

Coming soon!

Install 📦️

Manual install

Requirements:

  • PyTorch with CUDA
  • OpenCV
  • Sanic
  • Transformers
  • Tqdm
  • Prometheus client

# create virtual environment
conda create -n mig-perf python=3.8
conda activate mig-perf

# install required packages
conda install pytorch torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c conda-forge opencv
pip install transformers
pip install sanic tqdm prometheus_client
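
After installing, it can help to confirm that every requirement is importable from the new environment. The following is a minimal sketch, not part of MIGProfiler: the import names (for example `cv2` for OpenCV and `torch` for PyTorch) are the usual ones for these packages, and the helper `missing_modules` is a hypothetical name introduced here.

```python
import importlib.util

# Import names assumed for the requirements listed above
# (OpenCV is imported as cv2, PyTorch as torch).
REQUIRED_MODULES = ["torch", "cv2", "sanic", "transformers", "tqdm", "prometheus_client"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that the packages can be found; it does not verify that PyTorch was built with CUDA support.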

PyPI install

WIP

Use Docker

WIP

Quick Start 🚚

You can easily profile workloads on a MIG GPU. Below are some common deep learning tasks to play with.

1. MIG training benchmark

First, create a 1g.10gb MIG device:

# enable MIG
sudo nvidia-smi -i 0 -mig 1
# create MIG instance
sudo nvidia-smi mig -cgi 1g.10gb -C
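
To confirm that the MIG device was created, one option is to list the devices with `nvidia-smi -L` and look for MIG entries. The sketch below is an illustration rather than MIGProfiler code. It assumes the listing uses lines of the form `MIG <profile> Device <n>: (UUID: MIG-...)`, which may differ between driver versions, and the function names are hypothetical.

```python
import re
import subprocess

# Assumed line format: "  MIG 1g.10gb     Device  0: (UUID: MIG-...)"
MIG_LINE = re.compile(
    r"^\s*MIG\s+(?P<profile>\S+)\s+Device\s+(?P<index>\d+):\s+\(UUID:\s+(?P<uuid>[^)]+)\)"
)


def parse_mig_devices(listing):
    """Extract MIG devices from the text printed by `nvidia-smi -L`."""
    devices = []
    for line in listing.splitlines():
        match = MIG_LINE.match(line)
        if match:
            devices.append(
                {
                    "profile": match["profile"],
                    "index": int(match["index"]),
                    "uuid": match["uuid"],
                }
            )
    return devices


def list_mig_devices():
    """Run `nvidia-smi -L` and return the parsed MIG devices (needs an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return parse_mig_devices(result.stdout)
```

On a machine with the 1g.10gb instance created, `list_mig_devices()` should return one entry whose `profile` is `1g.10gb`, provided the output matches the assumed format.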

Start the DCGM metric exporter (the mig_perf Docker network passed to --net must already exist):

docker run -d --rm --gpus all --net mig_perf -p 9400:9400  \
    -v "${PWD}/mig_perf/profiler/client/dcp-metrics-included.csv:/etc/dcgm-exporter/customized.csv" \
    --name dcgm_exporter --cap-add SYS_ADMIN   nvcr.io/nvidia/k8s/dcgm-exporter:2.4.7-2.6.11-ubuntu20.04 \
    -c 500 -f /etc/dcgm-exporter/customized.csv -d f
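
Once the exporter is running, its metrics can be read over HTTP from the published port 9400. The sketch below is an assumption-laden illustration, not the profiler's own client: it assumes the exporter serves the Prometheus text format at `http://localhost:9400/metrics`, uses a deliberately simple parser that ignores timestamps and escaped quotes, and introduces the hypothetical helpers `parse_metrics` and `fetch_metrics`.

```python
import re
import urllib.request

# One sample per line: NAME{label="value",...} VALUE
SAMPLE_LINE = re.compile(
    r"^(?P<name>[A-Za-z_:][\w:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)"
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Parse Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match["labels"] or ""))
        samples.append((match["name"], labels, float(match["value"])))
    return samples


def fetch_metrics(url="http://localhost:9400/metrics", timeout=5.0):
    """Fetch and parse metrics from the exporter (URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Which metric names appear depends on the CSV file mounted into the container, so filter the returned samples by the names listed there.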

Start profiling:

cd mig_perf/profiler
export PYTHONPATH=$PWD
python train/train_cv.py --bs=32 --model=resnet50 --num_batches=500 --mig-device-id=0
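
For readers who want to sanity-check reported numbers, training throughput is commonly computed as samples processed divided by elapsed time. The following is a generic sketch of that calculation and not the logic inside `train_cv.py`; `time_batches` and `throughput` are hypothetical helpers, and `step` stands for any callable that runs one training batch.

```python
import time


def time_batches(step, num_batches):
    """Call `step()` num_batches times and return each call's duration in seconds.

    Note: for GPU work, `step` should wait for the device to finish
    (for example by calling torch.cuda.synchronize()), otherwise the
    timings only measure kernel launch overhead.
    """
    durations = []
    for _ in range(num_batches):
        start = time.perf_counter()
        step()
        durations.append(time.perf_counter() - start)
    return durations


def throughput(batch_size, batch_seconds):
    """Return samples per second given a batch size and per-batch durations."""
    total = sum(batch_seconds)
    if total <= 0:
        raise ValueError("total elapsed time must be positive")
    return batch_size * len(batch_seconds) / total
```

For example, two batches of 32 samples that each take 0.5 s give `throughput(32, [0.5, 0.5])`, which is 64 samples per second.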

Remember to disable MIG after finishing the benchmark:

# destroy compute instances, then GPU instances, then turn MIG mode off
sudo nvidia-smi mig -dci -i 0
sudo nvidia-smi mig -dgi -i 0
sudo nvidia-smi -i 0 -mig 0

2. MIG inference benchmark

Start the DCGM metric exporter:

docker run -d --rm --gpus all --net mig_perf -p 9400:9400  \
    -v "${PWD}/mig_perf/profiler/client/dcp-metrics-included.csv:/etc/dcgm-exporter/customized.csv" \
    --name dcgm_exporter --cap-add SYS_ADMIN   nvcr.io/nvidia/k8s/dcgm-exporter:2.4.7-2.6.11-ubuntu20.04 \
    -c 500 -f /etc/dcgm-exporter/customized.csv -d f

Start profiling:

cd mig_perf/profiler
export PYTHONPATH=$PWD
python client/block_infernece_cv.py --bs=32 --model=resnet50 --num_batches=500 --mig-device-id=0
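
Inference benchmarks are usually summarized by latency percentiles rather than by a single average. The sketch below shows one common way to do that, using the nearest-rank percentile definition. It is an illustration only and does not claim to match how the inference client reports results; the function names are hypothetical.

```python
import math


def percentile(values, q):
    """Return the q-th percentile (0 < q <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < q <= 100:
        raise ValueError("q must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def summarize_latencies(latencies_ms):
    """Return the mean, median (p50) and tail (p99) of per-request latencies."""
    return {
        "mean": sum(latencies_ms) / len(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
    }
```

With latencies of 10, 20, 30, 40 and 50 ms, this yields a mean of 30 ms, a p50 of 30 ms and a p99 of 50 ms.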

See more benchmark experiments in ./exp.
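
A simple way to extend the commands above into a small experiment is to sweep over models and batch sizes. The sketch below only builds the command lines, reusing the flags shown in this Quick Start; the sweep values and the helper `build_sweep` are hypothetical, and it makes no assumption about what the scripts in ./exp actually do.

```python
import itertools
import shlex


def build_sweep(script, models, batch_sizes, num_batches=500, mig_device_id=0):
    """Return one argv list per (model, batch size) combination."""
    commands = []
    for model, batch_size in itertools.product(models, batch_sizes):
        commands.append(
            [
                "python",
                script,
                f"--bs={batch_size}",
                f"--model={model}",
                f"--num_batches={num_batches}",
                f"--mig-device-id={mig_device_id}",
            ]
        )
    return commands


if __name__ == "__main__":
    # Example sweep values; adjust to the models and sizes of interest.
    for argv in build_sweep("train/train_cv.py", ["resnet50"], [8, 16, 32]):
        print(shlex.join(argv))
```

Each printed line can be run from mig_perf/profiler with PYTHONPATH set as shown earlier, or passed to `subprocess.run` as an argv list.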

3. Visualize

  • in notebook
  • in Prometheus (under improvement)

Cite Us 🌱

@article{zhang2022migperf,
  title={MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs},
  author={Zhang, Huaizheng and Li, Yuanming and Xiao, Wencong and Huang, Yizheng and Di, Xing and Yin, Jianxiong and See, Simon and Luo, Yong and Lau, Chiew Tong and You, Yang},
  journal={arXiv preprint arXiv:2301.00407},
  year={2023}
}

Contributors 👥

  • Yuanming Li
  • Huaizheng Zhang
  • Yizheng Huang
  • Xing Di

Acknowledgement

Special thanks to Aliyun and NVIDIA AI Tech Center for providing the MIG GPU servers used for benchmarking.

License

This repository is open-sourced under the MIT License.

