DL training on GPU management
Project description
GPU limit management
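Machine learning experiments often have many parameters, so they usually need to be run many times with different parameter settings.
This project maintains a queue of tasks that use the GPU and schedules them dynamically, so you do not have to launch each experiment by hand.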
install
The code is still being heavily reworked and still has many bugs.
Install from source
git clone https://github.com/lichunown/gpu-limit.git
cd gpu-limit
python setup.py install
Install with pip
pip3 install gpulimit
usage
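This program communicates over a Linux socket: the background process gpulimit_server schedules tasks dynamically, and the foreground command gpulimit sends commands and retrieves information.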
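The client/server split described above can be illustrated with a small Unix domain socket exchange. This is only a sketch of the general pattern, assuming that "Linux socket" means a Unix domain socket; it is not gpulimit's actual wire protocol. The socket path, the reply format, and the helper names `serve_once`, `send_command`, and `demo` are all hypothetical.

```python
import os
import socket
import tempfile
import threading


def serve_once(server_sock, handler):
    """Accept one client, read its command, and send back the handler's reply."""
    conn, _ = server_sock.accept()
    with conn:
        command = conn.recv(4096).decode()
        conn.sendall(handler(command).encode())


def send_command(sock_path, command):
    """Connect to the server socket, send one command, and return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        client.sendall(command.encode())
        return client.recv(4096).decode()


def demo():
    # Hypothetical socket path; a real server chooses its own location.
    with tempfile.TemporaryDirectory() as tmp:
        sock_path = os.path.join(tmp, "demo.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(sock_path)
            server.listen(1)
            # The "server" side runs in a thread and answers one command.
            worker = threading.Thread(
                target=serve_once,
                args=(server, lambda cmd: f"received: {cmd}"),
            )
            worker.start()
            reply = send_command(sock_path, "ls")
            worker.join()
    return reply
```

Calling `demo()` starts a one-shot server thread, sends the string `ls` to it, and returns the server's reply. A long-running server would instead loop on `accept()` and dispatch each command to its scheduler.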
Start the background service
gpulimit_server # start directly
nohup gpulimit_server & # run in the background
Foreground commands
$ gpulimit help
GPU Task Manage:
usage:
    client.py -h                       show help
    gpulimit add [cmds]                add task [cmds] to gpulimit queue.
other commands:
    help [cmd]                         show help
    add [cmds]                         ls GPU task queue status
    ls                                 ls GPU task queue status
    show [id]                          show task [id] details.
    rm [id]                            remove task [id] from manage,
                                       if task is running, kill it.
    kill [id]                          kill task [id]
    move [id] [index(default=0)]       move [id] to [index]
    set [name] [value]                 set some property.
    start [id defalut=None]            Force start task(s).
    log [id]                           show [id] output.
    status                             show System status.
    debug [id]                         if task [id] is `CMD_ERROR`,
                                       use this show error traceback.
Add a task
gpulimit add [cmds]
# for example
# gpulimit add python3 main.py --lambda=12 --alpha=1
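For a parameter sweep, the `add` command can be issued once per parameter combination. The following is a minimal sketch, assuming the `gpulimit` client is on the PATH and `gpulimit_server` is already running. `main.py`, `--lambda`, and `--alpha` are taken from the example above; the helper names `build_commands` and `submit` and the parameter values are hypothetical.

```python
import itertools
import subprocess


def build_commands(script, grid):
    """Return one `gpulimit add ...` argv list per combination in `grid`."""
    names = list(grid)
    commands = []
    for values in itertools.product(*(grid[name] for name in names)):
        flags = [f"--{name}={value}" for name, value in zip(names, values)]
        commands.append(["gpulimit", "add", "python3", script, *flags])
    return commands


def submit(commands, dry_run=True):
    """Print each command; with dry_run=False, also pass it to gpulimit."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # Assumes the gpulimit client is installed and the server is up.
            subprocess.run(argv, check=True)


# Hypothetical parameter grid: 2 x 2 = 4 queued experiments.
example_grid = {"lambda": [4, 12], "alpha": [0.5, 1]}
submit(build_commands("main.py", example_grid))  # dry run: only prints
```

With `dry_run=True` the commands are only printed, which makes it easy to check the sweep before queuing anything. Passing `dry_run=False` submits each combination as a separate task.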
List tasks
gpulimit ls
View task details
gpulimit show [task id]
View a task's output log
gpulimit log [task id]
Likewise, you can view the background output of gpulimit_server:
gpulimit log main
TODO list
- change raise type, and add try except for exception break.
- __doc__
- kill all, range
- add commits
- use priority queue as task_manage.queue
- Improve scheduling algorithm
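As an illustration of the priority-queue item in the list above, a task queue can be built on the standard library's `heapq`. This is a sketch under assumptions, not the project's planned design: the class name `TaskQueue`, its methods, and the convention that a lower number means higher priority are all hypothetical.

```python
import heapq
import itertools


class TaskQueue:
    """Hypothetical priority queue of tasks; lower priority number runs first."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal priorities stay in FIFO order.
        self._counter = itertools.count()

    def add(self, cmd, priority=0):
        """Queue a command with the given priority."""
        heapq.heappush(self._heap, (priority, next(self._counter), cmd))

    def pop(self):
        """Remove and return the command that should run next."""
        _priority, _order, cmd = heapq.heappop(self._heap)
        return cmd

    def __len__(self):
        return len(self._heap)
```

The counter in each heap entry keeps insertion order among tasks with the same priority and avoids comparing the command objects themselves.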