An interactive NVIDIA-GPU process viewer.

Project description

gpu_lurker

Python 3.5+

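A GPU monitor for servers. When the GPUs meet a preset condition (for example, at least 4 cards, each with more than 1000 MB of memory), it sends a notification through WeChat.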

Run the monitoring command on the server. Once the condition is met, a message is sent to WeChat.

Installation

Install from PyPI:

pip install --upgrade gpulurker

Install the latest version from GitHub (recommended):

pip install git+https://github.com/RenShuhuai-Andy/gpu_lurker.git#egg=gpulurker

Or clone the repository and install it manually:

git clone --depth=1 https://github.com/RenShuhuai-Andy/gpu_lurker.git
cd gpu_lurker
pip install .

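Usage

I. Register on the WxPusher WeChat push service and create an application

  1. Go to https://wxpusher.zjiecode.com/admin/login, scan the QR code with WeChat to follow the 「新消息服务」 (New Message Service) official account, and complete your profile.

  2. Create a new application. After it is created, save the APP_TOKEN that is displayed.

  3. Scan the application's QR code with WeChat to subscribe to it.

  4. Open the 「新消息服务」 official account and tap 「我的」 (Me) - 「我的UID」 (My UID) to get your UID.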
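The APP_TOKEN and UID collected above are the two values a push request needs. As a rough sketch of how they might be combined, the snippet below builds a JSON body for a plain-text message. The endpoint URL, the field names (`appToken`, `content`, `contentType`, `uids`), and the helper `build_message` are assumptions based on the public WxPusher HTTP interface, not taken from gpu_lurker itself; check them against the WxPusher documentation before use. Nothing is sent over the network here.

```python
import json

# Assumed endpoint; verify against the WxPusher documentation before use.
WXPUSHER_SEND_URL = "https://wxpusher.zjiecode.com/api/send/message"


def build_message(app_token, uid, content):
    """Build the JSON body for a plain-text push to a single user (hypothetical helper)."""
    return {
        "appToken": app_token,  # the APP_TOKEN saved when the application was created
        "content": content,
        "contentType": 1,       # assumed: 1 means plain text
        "uids": [uid],          # the UID obtained from the official account
    }


# Placeholder credentials, for illustration only.
payload = build_message("AT_example", "UID_example", "8 GPUs are available")
print(json.dumps(payload, indent=2))
```

Sending the body would then be a single HTTP POST of this JSON to the endpoint, for example with the `requests` library.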

II. Run the command on the server to monitor the GPUs

# Check the server every 30 minutes; when 8 cards each have more than 1000 MB of memory, send a notification to WeChat
gpulurker -m 1000 -n 8 -f 30m

On first use, you will be asked to enter your UID and APP_TOKEN.
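The condition expressed by `-m 1000 -n 8` can be sketched as a small predicate. This is only an illustration of the rule as described here, not the tool's actual code: the function name `conditions_met`, the strict "more than" comparison, and the sample readings are all assumptions.

```python
def conditions_met(memory_mb, required_mb, required_num):
    """Return True if at least required_num GPUs each have more than required_mb MB."""
    qualifying = [m for m in memory_mb if m > required_mb]
    return len(qualifying) >= required_num


# Hypothetical per-GPU memory readings for an 8-GPU server, in MB.
readings = [12000, 800, 11000, 10500, 9000, 15000, 2000, 1500]

# Only 7 of the 8 cards exceed 1000 MB, so "-m 1000 -n 8" would not trigger.
print(conditions_met(readings, required_mb=1000, required_num=8))
# Asking for 4 cards instead would trigger.
print(conditions_met(readings, required_mb=1000, required_num=4))
```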

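Main options

  • -m, --cuda-memory: memory required on each card (default: 5000 MB)
  • -n, --device-num: number of GPUs required (default: 1)
  • -f, --check-freq: interval between checks of the server, such as 1d (1 day), 1h (1 hour), 1m (1 minute), 1s (1 second), or 1h30m (1 hour 30 minutes). Default: 10 minutes
  • -r, --reload: re-enter the user information (UID and APP_TOKEN; off by default)
  • -c, --continuous: keep pushing messages while the condition is met (off by default)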
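The interval format accepted by `--check-freq` (for example `1d`, `1h30m`, `10m`) could be converted to seconds roughly as follows. This is a minimal sketch under the assumption that an interval is a sequence of `<number><unit>` pairs with units `d`, `h`, `m`, and `s`; the function name `parse_check_freq` is hypothetical and the tool's own parser may behave differently.

```python
import re

_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([dhms])")


def parse_check_freq(text):
    """Convert an interval string such as '1h30m' into a number of seconds."""
    tokens = _TOKEN.findall(text)
    # Reject empty input and input with leftover characters, e.g. '10x'.
    if not tokens or "".join(n + u for n, u in tokens) != text:
        raise ValueError("invalid interval: %r" % text)
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in tokens)


print(parse_check_freq("1h30m"))  # 3600 + 30 * 60 = 5400
print(parse_check_freq("30m"))    # 30 * 60 = 1800
```

The resulting number of seconds can be passed directly to `time.sleep` between checks.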

Press Ctrl+C to stop monitoring.

Run gpulurker --help for more information:

usage: gpulurker [-h] [-m CUDA_MEMORY] [-n DEVICE_NUM] [-f CHECK_FREQ] [-r]
                 [--log_file LOG_FILE]

check if gpu is available and notify on your WeChat

optional arguments:
  -h, --help            show this help message and exit
  -m CUDA_MEMORY, --cuda-memory CUDA_MEMORY
                        Required CUDA memory per device
  -n DEVICE_NUM, --device-num DEVICE_NUM
                        Required number of devices
  -f CHECK_FREQ, --check-freq CHECK_FREQ
                        Frequency of inspection, eg. 10m (10 minutes)
  -r, --reload          Reload and update your appToken and uid
  -c, --continuous      Continue to push message when the conditions are met
  --log_file LOG_FILE   define the threshold of avaliable (in MB)
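For readers who want to see how an interface like the one above might be declared, here is a sketch using Python's standard `argparse` module. The option names, help strings, and defaults follow the text of this page; the types, the `build_parser` name, and the overall structure are assumptions, and `--log_file` is left out because its description in the help output is unclear.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented gpulurker options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="gpulurker",
        description="check if gpu is available and notify on your WeChat",
    )
    parser.add_argument("-m", "--cuda-memory", type=int, default=5000,
                        help="Required CUDA memory per device (MB)")
    parser.add_argument("-n", "--device-num", type=int, default=1,
                        help="Required number of devices")
    parser.add_argument("-f", "--check-freq", default="10m",
                        help="Frequency of inspection, eg. 10m (10 minutes)")
    parser.add_argument("-r", "--reload", action="store_true",
                        help="Reload and update your appToken and uid")
    parser.add_argument("-c", "--continuous", action="store_true",
                        help="Continue to push message when the conditions are met")
    return parser


# Parse the example command line from the usage section.
args = build_parser().parse_args(["-m", "1000", "-n", "8", "-f", "30m"])
print(args.cuda_memory, args.device_num, args.check_freq, args.continuous)
```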

Acknowledgements

This project draws on code from the following repositories:

The real-time WeChat push service is provided by WxPusher.

License

GNU General Public License, version 3 (GPLv3)

This version

0.1

Download files

Source Distribution

gpulurker-0.1.tar.gz (11.5 kB view details)

Uploaded Source

Built Distribution

gpulurker-0.1-py3-none-any.whl (24.5 kB view details)

Uploaded Python 3

File details

Details for the file gpulurker-0.1.tar.gz.

File metadata

  • Download URL: gpulurker-0.1.tar.gz
  • Upload date:
  • Size: 11.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.0 importlib_metadata/3.7.3 packaging/20.9 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.6.13

File hashes

Hashes for gpulurker-0.1.tar.gz
Algorithm Hash digest
SHA256 b02a4b4c731b611ad2064527f116a36feaaee30871455e864bfd32f563d6b640
MD5 0ae789d0c5d90562cfe51e7ecce1f127
BLAKE2b-256 8133dbad78f0ff7ac33c521bacdc097e1adc7ca71fbb5cd9757f55eb57c98d28

File details

Details for the file gpulurker-0.1-py3-none-any.whl.

File metadata

  • Download URL: gpulurker-0.1-py3-none-any.whl
  • Upload date:
  • Size: 24.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.0 importlib_metadata/3.7.3 packaging/20.9 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.6.13

File hashes

Hashes for gpulurker-0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a316b98014f4e240cfe72095a53da47d8d2b998e9cdd21fb1edd10da9f78bd33
MD5 856dd276df83d66621328501dc18f4a1
BLAKE2b-256 07f2953c5151f15ea9d3fd280b59d96bdcd0a74c4f715e349054e1a069c0b17e
