
tlaunch

Project description

TLaunch_dev

Introduction

Installation

virtualenv tmarl_env -p python3
source tmarl_env/bin/activate
# with tmarl_env activated, cd into this repo and run:
pip install -r requirements_dev.txt
pip install .

Quick Start

1. Log in to TPod via SSH

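TPod is a user and resource management tool for distributed scenarios, built for TLaunch. After an administrator creates a user with TPod, they can assign that user a custom allocation of system resources (CPU, GPU, memory, and storage) and create a TPod development machine for the user with the TLaunch framework preinstalled. The user can then log in to this machine directly over SSH to access the cluster and start developing right away.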

TPod storage layout

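In TPod, a personal folder is created for you under /TData, and this folder is kept in sync with remote storage through a mounted file system. The following entries are created in /TData in advance: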

  • code: training code
    • setup.sh: specifies how to install the training environment and code
  • data: data needed for training
  • cache: cache files produced during distributed computation
  • models: model data
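As a quick sanity check after logging in, the layout above can be verified with a short script. This is a minimal sketch, not part of TLaunch or TPod: `missing_entries` is a hypothetical helper, and it assumes the entries sit directly under the root you pass in (default `/TData`, matching the `/TData/code` path used in the next step).

```python
from pathlib import Path
from typing import List

# Entries this README says TPod pre-creates. Assumption: they live directly
# under the root passed in (e.g. /TData), not in a further subfolder.
EXPECTED_DIRS = ("code", "data", "cache", "models")
EXPECTED_FILES = ("code/setup.sh",)


def missing_entries(root: str = "/TData") -> List[str]:
    """Return the expected directories/files that are absent under root."""
    base = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (base / f).is_file()]
    return missing
```

On a correctly provisioned machine, `missing_entries()` should return an empty list; anything it returns names an entry you may need to create or ask your administrator about.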

2. Store algorithm code and specify how to install it

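In most distributed scenarios, a small part of the code changes frequently. Rebuilding the image every time this code changes wastes a lot of time, so this code can be kept in the /TData/code folder instead. During training, whenever a pod is created, it first updates its environment by running the setup.sh script in the code folder.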
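The pod-side step described above (run `setup.sh` before training starts) can be pictured with the following sketch. It only illustrates the idea; TLaunch's actual startup mechanism is not documented here. `setup_command` and `run_setup` are hypothetical names, and the script is assumed to be run with `bash` from inside the code folder.

```python
import subprocess
from pathlib import Path
from typing import List


def setup_command(code_dir: str, shell: str = "bash") -> List[str]:
    """Command that runs setup.sh from the given code directory (hypothetical)."""
    return [shell, str(Path(code_dir) / "setup.sh")]


def run_setup(code_dir: str = "/TData/code", shell: str = "bash") -> None:
    """Run setup.sh before training; raise if it is missing or fails."""
    script = Path(code_dir) / "setup.sh"
    if not script.is_file():
        raise FileNotFoundError(f"no setup script at {script}")
    # Run from code_dir so relative paths inside setup.sh (e.g. `pip install .`)
    # resolve against the code folder; check=True raises on a non-zero exit.
    subprocess.run(setup_command(code_dir, shell), cwd=code_dir, check=True)
```

A pod entrypoint would call `run_setup()` first and only launch training if it returns without raising; because `check=True` is set, a failing `setup.sh` raises `subprocess.CalledProcessError` and stops the launch.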

3. Create a job with TLaunch

See: https://github.com/TARTRL/TMARL/blob/master/docs/tlaunch/README.md

4. Manage jobs

Once a job has been created, you can use kubectl to check its status and logs. Commonly used commands include:

  • List running jobs: kubectl get lpjobs
  • List running pods: kubectl get pods
  • View a pod's logs: kubectl logs ${pod name}
  • Show a pod's details: kubectl describe pods ${pod name}
  • Delete a job: kubectl delete lpjobs ${lpjob name}
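If you script these checks, the commands above can be wrapped in Python. The sketch below only rebuilds the exact kubectl invocations listed here (`lpjobs` is the resource name used in this README) and runs them with `subprocess`; the function names and the example pod name `my-pod` are hypothetical, and `kubectl` must be installed and configured for your cluster.

```python
import shlex
import subprocess
from typing import List

JOB_RESOURCE = "lpjobs"  # job resource name used by the commands in this README


def list_jobs_cmd() -> List[str]:
    return ["kubectl", "get", JOB_RESOURCE]


def list_pods_cmd() -> List[str]:
    return ["kubectl", "get", "pods"]


def pod_logs_cmd(pod: str) -> List[str]:
    return ["kubectl", "logs", pod]


def describe_pod_cmd(pod: str) -> List[str]:
    return ["kubectl", "describe", "pods", pod]


def delete_job_cmd(job: str) -> List[str]:
    return ["kubectl", "delete", JOB_RESOURCE, job]


def run(cmd: List[str]) -> str:
    """Run a kubectl command and return its stdout; raises on non-zero exit."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Print (not run) a command so it can be inspected or pasted into a shell.
    print(shlex.join(pod_logs_cmd("my-pod")))  # kubectl logs my-pod
```

With a configured cluster, `print(run(list_pods_cmd()))` would list running pods, the same as typing `kubectl get pods`. Passing argument lists instead of a shell string avoids quoting problems with unusual pod or job names.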

Download files

Source Distribution

tlaunch-0.0.2.tar.gz (46.0 kB)

Built Distribution

tlaunch-0.0.2-py3-none-any.whl (72.7 kB)

File details

Details for the file tlaunch-0.0.2.tar.gz.

File metadata

  • Download URL: tlaunch-0.0.2.tar.gz
  • Size: 46.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for tlaunch-0.0.2.tar.gz:

  • SHA256: f5d4083fb03c438a2ebf6e1685cf8f2b98356ca2f3f5dd0d3289d4fe59c26963
  • MD5: 112b9049e31db3595cf6f67628010c6b
  • BLAKE2b-256: ca4519943292b3cea35defb1cabc3905219d693bd617151975f417d216c44943

File details

Details for the file tlaunch-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: tlaunch-0.0.2-py3-none-any.whl
  • Size: 72.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for tlaunch-0.0.2-py3-none-any.whl:

  • SHA256: defe63864e948ab4d117261d30ca6f616bae90bac9a29db7cfaf85465d44e049
  • MD5: bb6b02d68045c28f4a17df6085eb30e8
  • BLAKE2b-256: 015a5a49a2e26803d136c925f28c40e8bf5e3e20ea682f7268aa8b891e65c6eb
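To check a downloaded file against the digests above, Python's standard `hashlib` is enough. This is a generic sketch, not a TLaunch tool; the helper names are hypothetical, and the default expected value is the SHA256 listed for the source distribution.

```python
import hashlib
from typing import Iterable

# SHA256 published above for tlaunch-0.0.2.tar.gz.
SDIST_SHA256 = "f5d4083fb03c438a2ebf6e1685cf8f2b98356ca2f3f5dd0d3289d4fe59c26963"


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex SHA256 of the concatenation of the given byte chunks."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    with open(path, "rb") as f:
        return sha256_of_chunks(iter(lambda: f.read(chunk_size), b""))


def verify(path: str, expected: str = SDIST_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

`verify("tlaunch-0.0.2.tar.gz")` returns True only if the local file matches the published digest. pip can also enforce this itself in hash-checking mode, by pinning `tlaunch==0.0.2 --hash=sha256:<digest>` in a requirements file.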
