DL training on GPU management

GPU limit management

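Machine learning experiments often involve many parameters, so the same experiment usually has to be run many times with different parameter settings.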

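This project maintains a task queue for programs that use the GPU and schedules those tasks dynamically, which removes the tedium of launching each experiment by hand.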
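The queueing idea can be illustrated with a minimal sketch. This is not gpulimit's actual implementation: the names `Task`, `GpuTaskQueue`, `schedule`, and `finish` are hypothetical, and the sketch only tracks which GPU each task would receive. A real scheduler would also launch the command (for example with `subprocess` and the `CUDA_VISIBLE_DEVICES` environment variable) and detect when it exits.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, List, Optional


@dataclass
class Task:
    """One queued command (hypothetical structure, for illustration only)."""
    task_id: int
    cmd: List[str]
    gpu: Optional[int] = None
    status: str = "waiting"  # waiting -> running -> finished


class GpuTaskQueue:
    """First-in, first-out queue that hands each waiting task a free GPU."""

    def __init__(self, gpu_ids: Iterable[int]) -> None:
        self.free_gpus: Deque[int] = deque(gpu_ids)
        self.waiting: Deque[Task] = deque()
        self.tasks: Dict[int, Task] = {}
        self._next_id = 1

    def add(self, cmd: Iterable[str]) -> Task:
        """Register a command and put it at the end of the waiting queue."""
        task = Task(self._next_id, list(cmd))
        self._next_id += 1
        self.tasks[task.task_id] = task
        self.waiting.append(task)
        return task

    def schedule(self) -> List[Task]:
        """Assign free GPUs to waiting tasks; return the tasks just started."""
        started = []
        while self.waiting and self.free_gpus:
            task = self.waiting.popleft()
            task.gpu = self.free_gpus.popleft()
            task.status = "running"
            started.append(task)
        return started

    def finish(self, task_id: int) -> None:
        """Mark a running task as finished and release its GPU."""
        task = self.tasks[task_id]
        if task.status != "running":
            raise ValueError(f"task {task_id} is not running")
        task.status = "finished"
        self.free_gpus.append(task.gpu)
```

Calling `schedule()` after every `add()` and every `finish()` keeps the GPUs busy without manual intervention, which is the behaviour the description above is aiming for.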

install

The setup.py script appears to have some problems and is still being debugged.

git clone https://github.com/lichunown/gpu-limit.git
cd gpu-limit
python setup.py install

usage

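The program communicates over a Linux socket: the background gpulimit_server process schedules tasks dynamically, while the foreground gpulimit command sends commands to it and retrieves information.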
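The client/server split can be sketched as follows. The source only says "linux socket", so this example assumes a Unix domain stream socket; the socket path, the one-request-per-connection exchange, and the function names `serve_once`, `send_command`, and `demo` are all hypothetical and are not taken from gpulimit.

```python
import os
import socket
import tempfile
import threading


def serve_once(server_sock, handler):
    """Accept one connection, read one short command, send the reply."""
    conn, _ = server_sock.accept()
    with conn:
        request = conn.recv(4096).decode("utf-8")
        conn.sendall(handler(request).encode("utf-8"))


def send_command(socket_path, command):
    """Connect to the server socket, send a command, return the full reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(socket_path)
        client.sendall(command.encode("utf-8"))
        client.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = client.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def demo():
    """Run a one-shot server in a thread and query it from the same process."""
    tmp_dir = tempfile.mkdtemp()
    socket_path = os.path.join(tmp_dir, "demo.sock")  # hypothetical path
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(socket_path)
    server.listen(1)
    worker = threading.Thread(
        target=serve_once,
        args=(server, lambda request: "received: " + request),
    )
    worker.start()
    try:
        reply = send_command(socket_path, "ls")
    finally:
        worker.join()
        server.close()
        os.unlink(socket_path)
        os.rmdir(tmp_dir)
    return reply
```

In this arrangement the long-running process owns the queue and the short-lived command-line client only exchanges text with it, so the client can exit immediately while tasks keep running.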

Start the server

gpulimit_server # start directly in the foreground
nohup gpulimit_server & # run in the background

Client commands

Add a task

gpulimit add [cmds]
# for example
# gpulimit add python3 main.py --lambda=12 --alpha=1
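Since the motivation is running many parameter combinations, the `gpulimit add` calls for a whole grid can be generated by a script. The sketch below is one possible approach, not part of gpulimit: `build_add_commands` is a hypothetical helper, and `main.py`, `lambda`, and `alpha` are simply the names from the example above. It only prints the commands; the `subprocess.run` line is left commented out so nothing is submitted by accident.

```python
import itertools
import shlex
import subprocess  # only needed if the submission line below is enabled


def build_add_commands(script, grid):
    """Return one `gpulimit add` argument list per parameter combination."""
    names = list(grid)
    commands = []
    for values in itertools.product(*(grid[name] for name in names)):
        args = [f"--{name}={value}" for name, value in zip(names, values)]
        commands.append(["gpulimit", "add", "python3", script] + args)
    return commands


# Example grid: 2 values of lambda x 2 values of alpha = 4 tasks.
commands = build_add_commands("main.py", {"lambda": [1, 12], "alpha": [0.1, 1]})

for cmd in commands:
    print(shlex.join(cmd))  # shlex.join requires Python 3.8 or newer
    # subprocess.run(cmd, check=True)  # uncomment to actually queue the task
```

Passing the command as a list avoids shell quoting problems when parameter values contain spaces or special characters.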

List tasks

gpulimit ls

View task information

gpulimit ls

View a task's output log

gpulimit log [task id]

The background output of gpulimit_server can be viewed in the same way:

gpulimit log main

TODO list

  • start
  • kill all, range
  • ls: show the GPU in use and the running target
  • change the raised exception type, and add try/except so that exceptions do not break execution
  • commit & __doc__
