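A fast strategy search algorithm, based on neural networks and heuristic strategies, for partitioning the distributed training of deep learning models (3D parallelism)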

Project description

APSS(for Training): Automatically Distributed Deep Learning Parallelism Strategies Search by Self Play

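APSS is a fast strategy search algorithm, based on neural networks and heuristic strategies, for partitioning the distributed training of deep learning models (3D parallelism). It first combines heuristic strategies with a description of the training cluster environment to generate candidate strategies, then uses a Deep Pipeline Strategy Network (DPSN) to produce a detailed pipeline partition for each candidate. The network is trained offline with Contrastive Reinforcement Learning by Self-Play (CRLSP), so it needs neither real-world data collection nor fine-tuning in later use. This repository implements APSS with Mindspore.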
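To make the idea of "candidate strategies" concrete, here is a minimal sketch of how a 3D-parallelism search space can be enumerated for a fixed device count. It is not the APSS implementation: the `Strategy` type, the `candidate_strategies` function, and the two pruning rules (`max_tp` and "no more pipeline stages than layers") are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Strategy(NamedTuple):
    dp: int  # data-parallel degree
    tp: int  # tensor-parallel degree
    pp: int  # pipeline-parallel degree (number of stages)


def candidate_strategies(num_devices: int, num_layers: int, max_tp: int = 8) -> List[Strategy]:
    """Enumerate (dp, tp, pp) triples whose product equals num_devices.

    Hypothetical pruning rules, not taken from APSS:
    tp is capped at max_tp, and pp cannot exceed the number of layers.
    """
    out = []
    for tp in range(1, min(max_tp, num_devices) + 1):
        if num_devices % tp:
            continue  # tp must divide the device count
        rest = num_devices // tp
        for pp in range(1, min(rest, num_layers) + 1):
            if rest % pp:
                continue  # pp must divide the remaining devices
            out.append(Strategy(dp=rest // pp, tp=tp, pp=pp))
    return out


for strategy in candidate_strategies(num_devices=8, num_layers=8):
    print(strategy)
```

Each triple with `pp > 1` still leaves open where to cut the model into stages, which is the part the description above assigns to DPSN.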


Context

Installation

Requirements:

  • Python >= 3.7
  • Mindspore >= 2.1.1

Method 1: With pip

pip install apss

Method 2: From source

git clone https://github.com/Cheny1m/APSS
cd APSS
pip install -e .

Usage and Examples

Run training in one step

python -m apss.training.apss_run --graph_size 8 --num_split 3 --rebuild_data
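  • graph_size and num_split give, respectively, the number of layers in the problem and the number of pipeline partitions to perform. Together, these two command-line arguments describe the size of the training problem and can be adjusted as needed.
  • rebuild_data controls whether training data is generated from the Data Synthesizer before training starts; enabling it is the recommended default. To resume training from a .ckpt, or to reuse previously generated training data unchanged, simply omit the --rebuild_data flag. The training data can be found in the /data directory.
  • After a training run has completed, .ckpt files are saved in the /output folder and logs in the /log folder. You can use tensorboard_logger to view the training process and its data in a browser in real time.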
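As a rough illustration of the problem size that graph_size and num_split describe, the sketch below brute-forces a pipeline partition of 8 layers. It assumes that num_split counts the cut points between consecutive layers (so 3 cuts give 4 stages); the README does not say whether the value counts cuts or stages. The per-layer costs and the `best_partition` helper are made up, and exhaustive search stands in here for the learned DPSN, which this sketch does not reproduce.

```python
from itertools import combinations
from typing import Sequence, Tuple


def best_partition(layer_costs: Sequence[float], num_split: int) -> Tuple[Tuple[int, ...], float]:
    """Brute-force the cut points that minimise the cost of the slowest stage.

    A cut at index i separates layer i-1 from layer i.
    """
    n = len(layer_costs)
    best_cuts: Tuple[int, ...] = ()
    best_bottleneck = float("inf")
    for cuts in combinations(range(1, n), num_split):
        bounds = (0, *cuts, n)
        # Cost of a stage is the sum of its layers; the pipeline is bound by the largest.
        bottleneck = max(sum(layer_costs[a:b]) for a, b in zip(bounds, bounds[1:]))
        if bottleneck < best_bottleneck:
            best_cuts, best_bottleneck = cuts, bottleneck
    return best_cuts, best_bottleneck


costs = [1, 2, 3, 4, 4, 3, 2, 1]  # hypothetical per-layer costs, graph_size = 8
cuts, bottleneck = best_partition(costs, num_split=3)
print(cuts, bottleneck)
```

Under this assumption, graph_size 8 with num_split 3 already has 35 possible partitions per candidate, and the count grows combinatorially with both arguments, which is the kind of search a learned policy is meant to shortcut.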

How It Works

The pipeline of APSS.


Download files


Source Distribution

apss-0.3.0.tar.gz (35.2 kB)

Uploaded Source

Built Distribution

apss-0.3.0-py3-none-any.whl (40.6 kB)

Uploaded Python 3

File details

Details for the file apss-0.3.0.tar.gz.

File metadata

  • Download URL: apss-0.3.0.tar.gz
  • Upload date:
  • Size: 35.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.8

File hashes

Hashes for apss-0.3.0.tar.gz
Algorithm Hash digest
SHA256 9cdfb57dbb23cdf9b6a7b83b2b6785fadfab19dbba9f7448b01d83536519ff85
MD5 e086da337f865140903dde57ca623c60
BLAKE2b-256 fe8a106eae04da1f76cfbcde79ca918670462989028694b0742f4c8395c455fd


File details

Details for the file apss-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: apss-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 40.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.8

File hashes

Hashes for apss-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9b7b702f691ee0f854c8f85e9695416240c1f1017acfbed4291cd80e131105c3
MD5 1a7a35a8311a090e1c9f77bd77d5623b
BLAKE2b-256 02f7737de0748a13c348f2959e01df14c8a9ae4327f580383c6006b70a7ad8d0
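If you download either file manually, the SHA256 digests listed above can be checked with the Python standard library. This is a minimal sketch: the helper names `sha256_of` and `matches_published_hash` are illustrative, and the file is assumed to keep its published name on disk.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the hash tables above.
EXPECTED_SHA256 = {
    "apss-0.3.0.tar.gz": "9cdfb57dbb23cdf9b6a7b83b2b6785fadfab19dbba9f7448b01d83536519ff85",
    "apss-0.3.0-py3-none-any.whl": "9b7b702f691ee0f854c8f85e9695416240c1f1017acfbed4291cd80e131105c3",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path) -> bool:
    """Return True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, `matches_published_hash(Path("apss-0.3.0.tar.gz"))` returns False for a corrupted or substituted download.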

