tlaunch
TLaunch_dev
Introduction
Installation
virtualenv tmarl_env -p python3
# under tmarl_env, go into this repo and execute:
pip install -r requirements_dev.txt
pip install .
Quick Start
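1. Log in to TPod via SSH
TPod is a user and resource management tool built for TLaunch in distributed settings. When an administrator creates a user with TPod, they can specify the system resources allocated to that user (CPU, GPU, memory, and storage) and create a TPod development machine for the user with the TLaunch framework preinstalled. The user can then log in to that machine over SSH to access the cluster and start developing right away.
TPod storage layout
TPod creates a personal folder for you under the /TData directory; this folder is synchronized with remote storage through a mounted file system. The following entries are created in /TData in advance: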
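- code: stores the training code
  - setup.sh: specifies how to install the training environment and the code
- data: stores the data needed for training
  - cache: stores cache files produced during distributed computation
  - models: stores model data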
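The layout above can be checked from a shell before starting a job. The following is a minimal sketch, not part of TLaunch: the function name `check_tdata_layout` is hypothetical, and the nesting it checks (setup.sh inside code, cache and models inside data) is an assumption about how the entries are arranged.

```shell
# Report which of the expected entries are missing under a TPod data root.
# Assumed nesting: code/setup.sh, data/cache, data/models.
check_tdata_layout() {
  local root="${1:-/TData}"
  local missing=0
  local entry
  for entry in code code/setup.sh data data/cache data/models; do
    if [ ! -e "${root}/${entry}" ]; then
      echo "missing: ${root}/${entry}"
      missing=1
    fi
  done
  # Exit status 0 when everything exists, 1 otherwise.
  return "${missing}"
}
```

For example, `check_tdata_layout /TData && echo "layout OK"` prints a confirmation only when every entry is present.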
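2. Store the algorithm code and specify how to install it
In most distributed scenarios, a small part of the code changes frequently, and rebuilding the image after every change wastes a lot of time. That part of the code can therefore be kept in the /TData/code folder. During training, each time a pod is created, it first updates its environment by running the setup.sh script in the code folder.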
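As an illustration, a setup.sh might look like the sketch below. It is not taken from the TLaunch repository: the file names `requirements.txt`, `setup.py`, and `pyproject.toml`, the function name `run_setup`, and the `CODE_DIR` variable are assumptions, and how TLaunch actually invokes the script (shell, working directory) is not specified here.

```shell
#!/bin/sh
# Hypothetical /TData/code/setup.sh: refresh the Python environment from the
# code directory each time a pod starts. File names are assumptions.
run_setup() {
  code_dir="${1:-/TData/code}"
  # Install pinned dependencies if the project lists them.
  if [ -f "${code_dir}/requirements.txt" ]; then
    pip install -r "${code_dir}/requirements.txt" || return 1
  fi
  # Install the training code itself if it is a Python package.
  if [ -f "${code_dir}/setup.py" ] || [ -f "${code_dir}/pyproject.toml" ]; then
    pip install -e "${code_dir}" || return 1
  fi
}

run_setup "${CODE_DIR:-/TData/code}"
```

Keeping the installation steps in a script on the mounted folder means a code change only requires a new pod, not a new image.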
3. Create a job with TLaunch
See https://github.com/TARTRL/TMARL/blob/master/docs/tlaunch/README.md
4. Manage jobs
Once a job has been created, you can use kubectl to check its status and logs. Commonly used commands include:
- List running jobs:
kubectl get lpjobs
- List running pods:
kubectl get pods
- View a pod's logs:
kubectl logs ${pod name}
- View a pod's details:
kubectl describe pods ${pod name}
- Delete a job:
kubectl delete lpjobs ${lpjob name}
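When a job spawns many pods, the commands above can be combined into a small helper. The sketch below is an assumption-laden convenience, not a TLaunch feature: the function name `tail_all_pod_logs` is hypothetical, and it relies only on the standard kubectl options `-o name` and `--tail`.

```shell
# Print the last N log lines (default 20) of every pod in the current namespace.
tail_all_pod_logs() {
  local lines="${1:-20}"
  local pod
  # `-o name` prints one resource per line, e.g. "pod/my-pod".
  kubectl get pods -o name | while read -r pod; do
    echo "=== ${pod} ==="
    # Redirect stdin so kubectl cannot consume the pod list being read.
    kubectl logs --tail="${lines}" "${pod}" </dev/null
  done
}
```

Calling `tail_all_pod_logs 50` would show the last 50 lines for each pod, which is usually enough to spot a pod that failed during its setup step.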