
Efficient Automatic Training System for Super-Large Models

Project description

EasyDist

EasyDist is an automated parallelization system and infrastructure designed for multiple ecosystems, offering the following key features:

  • Usability. With EasyDist, parallelizing your training or inference code to a larger scale requires only a one-line change.

  • Ecological Compatibility. EasyDist serves as a centralized source of truth for operator-level SPMD rules across machine learning frameworks. It currently provides SPMD rules for PyTorch and Jax natively, as well as for TVM Tensor Expression operators.

  • Infrastructure. EasyDist decouples auto-parallel algorithms from specific machine learning frameworks and IRs. This design choice allows for the development and benchmarking of different auto-parallel algorithms in a more flexible manner, leveraging the capabilities and abstractions provided by EasyDist.

One Line of Code for Parallelism

To parallelize your training loop using EasyDist, you can use the easydist_compile decorator. Here's an example of how it can be used with PyTorch:

from torch import nn
# easydist_compile is provided by the easydist package (see ./examples/ for the exact import path)

@easydist_compile()
def train_step(net, optimizer, inputs, labels):
    # forward pass and loss
    outputs = net(inputs)
    loss = nn.CrossEntropyLoss()(outputs, labels)

    # backward pass and optimizer update
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    return loss

This one-line decorator parallelizes the training step. You can find more examples in the ./examples/ directory.
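
For context, a driver loop might look like the sketch below. It is illustrative only: the model, optimizer, and data are placeholders, it assumes the decorated train_step keeps its original call signature, and it omits whatever distributed setup or launcher a multi-device run would need.

import torch
from torch import nn

# Placeholder model, optimizer, and batch; real code would substitute its own.
net = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
inputs = torch.randn(32, 784)
labels = torch.randint(0, 10, (32,))

# Call the decorated step as if it were the original function.
loss = train_step(net, optimizer, inputs, labels)
print(loss.item())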

Overview

EasyDist introduces the concepts of MetaOp and MetaIR to decouple automatic parallelization methods from specific intermediate representations (IRs) and frameworks. Additionally, it presents the ShardCombine algorithm, which defines operator-level Single-Program, Multiple-Data (SPMD) sharding rules without requiring manual annotations. The architecture of EasyDist is as follows:

[Architecture diagram]
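
To make the notion of an operator-level SPMD sharding rule concrete, the sketch below uses plain PyTorch (not the EasyDist API) to check one such rule for matrix multiplication: shard the left operand by rows, replicate the right operand, and the row-sharded partial results concatenate back into the full result.

import torch

A = torch.randn(8, 4)
B = torch.randn(4, 6)

# Shard A across two simulated devices along its row dimension; B stays replicated.
shards = torch.chunk(A, 2, dim=0)
partials = [shard @ B for shard in shards]

# Concatenating the per-shard outputs reproduces the unsharded computation.
combined = torch.cat(partials, dim=0)
assert torch.allclose(combined, A @ B)

Rules like this are what EasyDist derives automatically at the operator level, rather than relying on hand-written annotations.
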
Installation

To install EasyDist from PyPI, use pip:

# For PyTorch users
pip install pai-easydist[torch]

# For Jax users
pip install pai-easydist[jax] -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

If you prefer to install EasyDist from source, you can clone the GitHub repository and then install it with the appropriate extras:

git clone https://github.com/alibaba/easydist.git && cd easydist

# EasyDist with PyTorch installation
pip install -e '.[torch]'

# EasyDist with Jax installation
pip install -e '.[jax]' -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
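
After either install path, a quick import check confirms the package is available; whether a __version__ attribute is exposed may vary by release, so the snippet below falls back gracefully.

import easydist

# Falls back if this release does not expose a __version__ attribute.
print(getattr(easydist, "__version__", "easydist imported (no __version__ attribute)"))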

Contributing

See CONTRIBUTING.md for details.

Contributors

EasyDist is developed by Alibaba Group and the NUS HPC-AI Lab. This work is supported by Alibaba Innovative Research (AIR).

License

EasyDist is licensed under the Apache License (Version 2.0). See the LICENSE file. This product contains some third-party test cases under other open-source licenses; see the NOTICE file for more information.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pai-easydist-0.1.0.tar.gz (64.7 kB, Source)

Built Distribution

pai_easydist-0.1.0-py3-none-any.whl (93.5 kB, Python 3)

File details

Details for the file pai-easydist-0.1.0.tar.gz.

File metadata

  • Download URL: pai-easydist-0.1.0.tar.gz
  • Upload date:
  • Size: 64.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.17

File hashes

Hashes for pai-easydist-0.1.0.tar.gz

  • SHA256: 89a514a6ab8d3099a62c9bfdfe9256bc9308ce0a3b461c2687bfa7e95393cb04
  • MD5: 0a6038450494da1e0a7d02176283f7e0
  • BLAKE2b-256: cc791275e7d824f05e965ccb6515a2e2538f55b8087c7cddabeb4e65b2cb805b

See the pip documentation on hash-checking for more details on using hashes.
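
As an example, the SHA256 digest above can be checked against a locally downloaded archive with a few lines of Python; the file path is whatever you saved the sdist as.

import hashlib

# Adjust the path to wherever the sdist was downloaded.
path = "pai-easydist-0.1.0.tar.gz"
expected = "89a514a6ab8d3099a62c9bfdfe9256bc9308ce0a3b461c2687bfa7e95393cb04"

with open(path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == expected else "SHA256 mismatch")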

File details

Details for the file pai_easydist-0.1.0-py3-none-any.whl.

File hashes

Hashes for pai_easydist-0.1.0-py3-none-any.whl

  • SHA256: 747585927a82742039be8b30d11c4d36ca1b121f495a448e07668b880d679fa7
  • MD5: f56dfeb2d880ba56012b3d4c449b25a2
  • BLAKE2b-256: 1dc3582ffbad3a69d27b4f83cca0b7ae647f5932e1dd064c7ff27c47e01b414c

See the pip documentation on hash-checking for more details on using hashes.
