tlaunch

TLaunch_dev

Introduction

Installation

virtualenv tmarl_env -p python3
source tmarl_env/bin/activate
# with tmarl_env active, go to the root of this repo and run:
pip install -r requirements_dev.txt
pip install .
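
To confirm that the install step succeeded, the installed version can be queried from Python. This is a minimal sketch that assumes `pip install .` registers the distribution under the name `tlaunch`; the helper name `installed_version` is ours and not part of TLaunch.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the distribution name is "tlaunch".
    version = installed_version("tlaunch")
    print(f"tlaunch version: {version}" if version else "tlaunch is not installed")
```

Run it inside the activated tmarl_env so that it inspects the same environment pip installed into.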

Quick Start

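1. Log in to TPod via SSH

TPod is a user and resource management tool built for TLaunch in distributed settings. When an administrator creates a user with TPod, they can specify the system resources allocated to that user (CPU, GPU, memory, and storage) and create a TPod development machine for that user with the TLaunch framework preinstalled. The user can then log in to that machine over SSH to access the cluster and start developing right away.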
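
The login itself is an ordinary SSH session. The sketch below only assembles the command line. The user name, host, and port are hypothetical placeholders, since the real values come from your TPod administrator.

```python
import shlex
from typing import List


def ssh_command(user: str, host: str, port: int = 22) -> List[str]:
    """Build the argv for an interactive SSH login."""
    return ["ssh", "-p", str(port), f"{user}@{host}"]


if __name__ == "__main__":
    # Hypothetical values; replace them with the ones assigned to you.
    argv = ssh_command("alice", "tpod.example.com", port=2222)
    print(shlex.join(argv))
```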

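TPod storage layout

In TPod, we create a personal folder for you under the /TData directory. This folder is synchronized with remote storage through a mounted file system. The following entries are created in /TData in advance:

  • code: holds the training code
    • setup.sh: specifies how to install the training environment and the code
  • data: holds the data required for training
  • cache: holds cache files generated during distributed computation
  • models: holds model data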
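
After logging in, it can be useful to check that the layout listed above is actually present. The following is a small sketch, not a TPod tool. It assumes that setup.sh sits inside the code folder and that the entries live directly under the root you pass in (adjust the root if they are inside your personal folder).

```python
from pathlib import Path
from typing import List

# Entries listed above; setup.sh is assumed to live inside code/.
EXPECTED_DIRS = ("code", "data", "cache", "models")
EXPECTED_FILES = ("code/setup.sh",)


def missing_entries(root: Path) -> List[str]:
    """Return the expected entries that are absent under root, in listing order."""
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # Assumption: the entries live directly under /TData.
    print(missing_entries(Path("/TData")) or "layout looks complete")
```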

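2. Store the algorithm code and specify how to install it

In most distributed scenarios, a small portion of the code changes frequently. Rebuilding the image on every code change wastes a lot of time, so we can keep that code in the /TData/code folder instead. During training, each time a pod is created, it first updates its environment by running the setup.sh script in the code folder.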
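
The source does not prescribe what setup.sh must contain, so the sketch below is only one way to generate it. It assumes the script is executed with bash, and the install command in the usage example is a hypothetical placeholder for your own project.

```python
import stat
from pathlib import Path
from typing import Iterable


def render_setup_script(commands: Iterable[str]) -> str:
    """Render a bash script that stops at the first failing command."""
    lines = ["#!/usr/bin/env bash", "set -euo pipefail", *commands]
    return "\n".join(lines) + "\n"


def write_setup_script(code_dir: Path, commands: Iterable[str]) -> Path:
    """Write setup.sh into code_dir and mark it executable for the owner."""
    script = code_dir / "setup.sh"
    script.write_text(render_setup_script(commands), encoding="utf-8")
    script.chmod(script.stat().st_mode | stat.S_IXUSR)
    return script


if __name__ == "__main__":
    # Hypothetical command; list whatever your training code needs.
    print(render_setup_script(["pip install -r requirements.txt"]), end="")
```

Because every new pod runs the script, keeping it short and idempotent avoids slowing down pod start-up.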

3. Create a task with TLaunch

Refs: https://github.com/TARTRL/TMARL/blob/master/docs/tlaunch/README.md

4. Manage tasks

Once a task has been created, we can use kubectl to check its status and logs. Commonly used commands include:

  • List running tasks: kubectl get lpjobs
  • List running pods: kubectl get pods
  • View a pod's logs: kubectl logs ${pod name}
  • View a pod's details: kubectl describe pods ${pod name}
  • Delete a task: kubectl delete lpjobs ${lpjob name}
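
When these commands are called from a script, it can help to wrap them in small helpers. The sketch below only assembles the same kubectl invocations listed above. The function names are ours, and `run` assumes kubectl is on the PATH and already configured for the cluster.

```python
import subprocess
from typing import List


def list_jobs() -> List[str]:
    """argv for listing running tasks (lpjobs)."""
    return ["kubectl", "get", "lpjobs"]


def list_pods() -> List[str]:
    """argv for listing running pods."""
    return ["kubectl", "get", "pods"]


def pod_logs(pod_name: str) -> List[str]:
    """argv for printing the logs of one pod."""
    return ["kubectl", "logs", pod_name]


def describe_pod(pod_name: str) -> List[str]:
    """argv for showing detailed information about one pod."""
    return ["kubectl", "describe", "pods", pod_name]


def delete_job(job_name: str) -> List[str]:
    """argv for deleting one task (lpjob)."""
    return ["kubectl", "delete", "lpjobs", job_name]


def run(argv: List[str]) -> str:
    """Run a command and return its standard output; raises on a non-zero exit."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call run(list_jobs()) on a machine with cluster access.
    print(" ".join(list_jobs()))
```

Passing the arguments as a list rather than a single shell string means pod and job names are never interpreted by a shell.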
