davincirunsdk

Project description

davincirun

version number: 0.0.1 author: jizhongsheng

Overview

davincirun

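Huawei's multi-node, multi-card training SDK. Here we wrap it as a second-party (internal) library and bundle a set of helper programs.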

davincirun.py is the launch script. Huawei's official usage is:

$ davincirun python train.py

The script does the following:

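  • Initializes the rank table (HCCL JSON)

    • Generates /home/ma-user/rank_table/jobstart_hccl.json from the /user/config/jobstart_hccl.json provided by k8s

    • Sets the environment variable RANK_TABLE_FILE=/home/ma-user/rank_table/jobstart_hccl.json

  • Launches multi-process training

    • Using the rank table generated above, creates a workspace directory under /home/ma-user/; each device{id} subdirectory of workspace is the actual working directory of the process for that device

  • Monitors the state of the worker processes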
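
The rank table initialization step can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual code: the function name init_rank_table is hypothetical, and the sketch assumes a plain copy of the JSON, whereas the real script may transform the k8s file in ways this page does not describe. Only the two paths and the RANK_TABLE_FILE variable come from the description above.

```python
import json
import os
from pathlib import Path

# Paths taken from the project description.
K8S_RANK_TABLE = "/user/config/jobstart_hccl.json"
LOCAL_RANK_TABLE = "/home/ma-user/rank_table/jobstart_hccl.json"


def init_rank_table(src=K8S_RANK_TABLE, dst=LOCAL_RANK_TABLE):
    """Hypothetical helper: write the rank table to dst and export RANK_TABLE_FILE.

    Assumption: the table is copied unchanged. The real conversion
    performed by davincirun.py is not documented here.
    """
    src, dst = Path(src), Path(dst)
    with src.open() as f:
        table = json.load(f)  # fails early if the file is not valid JSON
    dst.parent.mkdir(parents=True, exist_ok=True)
    with dst.open("w") as f:
        json.dump(table, f, indent=2)
    # Child processes started afterwards inherit this variable.
    os.environ["RANK_TABLE_FILE"] = str(dst)
    return dst
```

Calling init_rank_table() with no arguments uses the paths listed above; passing explicit paths makes it easy to try outside a k8s job.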

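This SDK splits the work above into separate modules. Rank table initialization is invoked at the overall entry point, while launching and monitoring multi-process training is called by the data analyst as needed, which provides distributed training capability.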
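
As a rough illustration of the launch-and-monitor part, the sketch below starts one process per device in its own workspace/device{id} directory and then polls them. The function names launch_workers and monitor, the DEVICE_ID and RANK_ID environment variables, and the stop-on-first-failure policy are all assumptions made for this example; they are not taken from the SDK.

```python
import os
import subprocess
import time
from pathlib import Path


def launch_workers(cmd, device_ids, workspace="/home/ma-user/workspace"):
    """Hypothetical helper: start one process per device in workspace/device{id}."""
    procs = {}
    for rank, device_id in enumerate(device_ids):
        workdir = Path(workspace) / f"device{device_id}"
        workdir.mkdir(parents=True, exist_ok=True)
        # Assumed variable names; the real SDK may use different ones.
        env = dict(os.environ, DEVICE_ID=str(device_id), RANK_ID=str(rank))
        procs[device_id] = subprocess.Popen(cmd, cwd=workdir, env=env)
    return procs


def monitor(procs, poll_interval=1.0):
    """Poll workers until all exit; on the first failure, stop the rest.

    Returns 0 if every worker succeeded, otherwise the first non-zero exit code.
    """
    pending = dict(procs)
    while pending:
        for device_id, proc in list(pending.items()):
            code = proc.poll()
            if code is None:
                continue  # still running
            del pending[device_id]
            if code != 0:
                for other in pending.values():
                    other.terminate()
                for other in pending.values():
                    other.wait()
                return code
        if pending:
            time.sleep(poll_interval)
    return 0
```

Because each worker runs with its device directory as the current directory, the training script in cmd should be given as an absolute path, for example launch_workers([sys.executable, "/path/to/train.py"], range(8)).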

Installation / Usage

To install use pip:

$ pip install davincirunsdk

Or clone the repo:

$ git clone https://github.com/jizhongsheng/davincirun.git
$ cd davincirun
$ python setup.py install

Contributing

TBD

Example

TBD

Download files

Source Distribution

davincirunsdk-0.0.1.tar.gz (26.0 kB)

Uploaded Source

Built Distribution

davincirunsdk-0.0.1-py2.py3-none-any.whl (30.7 kB)

Uploaded Python 2 Python 3
