Alpa automatically parallelizes large tensor computation graphs and runs them on a distributed cluster.
Project description
Alpa
Alpa is a system for training and serving large-scale neural networks.
Scaling neural networks to hundreds of billions of parameters has enabled dramatic breakthroughs such as GPT-3, but training and serving these large-scale neural networks require complicated distributed system techniques. Alpa aims to automate large-scale distributed training and serving with just a few lines of code.
The key features of Alpa include:
💻 Automatic Parallelization. Alpa automatically parallelizes users' single-device code on distributed clusters with data, operator, and pipeline parallelism (a configuration sketch follows this list).
🚀 Excellent Performance. Alpa achieves linear scaling when training models with billions of parameters on distributed clusters.
✨ Tight Integration with Machine Learning Ecosystem. Alpa is backed by open-source, high-performance, and production-ready libraries such as Jax, XLA, and Ray.
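The sketch below illustrates how these parallelism strategies can be selected in user code by passing a `method` to `alpa.parallelize`. The class names (`ShardParallel`, `PipeshardParallel`) and the `num_micro_batches` argument follow the Alpa documentation, but treat this as an illustrative sketch rather than a definitive API reference, since exact signatures can vary across Alpa versions.
```python
import alpa

# Intra-operator parallelism only: shard each tensor computation
# (data/operator parallelism) across the devices in the cluster.
shard_method = alpa.ShardParallel()

# Combine intra-operator sharding with inter-operator pipeline
# parallelism; micro-batching keeps the pipeline stages busy.
pipeshard_method = alpa.PipeshardParallel(num_micro_batches=16)

@alpa.parallelize(method=pipeshard_method)
def train_step(model_state, batch):
    ...  # the same single-device training code shown in the Quick Start below
```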
👉 Try Alpa-served OPT-175B!
Alpa provides a free, unlimited OPT-175B text generation service. Try our service at https://opt.alpa.ai/ and share your prompting results!
Join the Alpa Slack and let us know which new features you want!
Quick Start
Use Alpa's decorator @parallelize to scale your single-device training code to distributed clusters.
```python
import alpa
import jax.numpy as jnp
from jax import grad

# Parallelize the training step in Jax by simply using a decorator
@alpa.parallelize
def train_step(model_state, batch):
    def loss_func(params):
        out = model_state.forward(params, batch["x"])
        return jnp.mean((out - batch["y"]) ** 2)

    grads = grad(loss_func)(model_state.params)
    new_model_state = model_state.apply_gradient(grads)
    return new_model_state

# The training loop now automatically runs on your designated cluster
model_state = create_train_state()
for batch in data_loader:
    model_state = train_step(model_state, batch)
```
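Depending on your setup, you may first need to connect Alpa to a Ray cluster before running the loop above. A minimal sketch, assuming the `alpa.init(cluster="ray")` / `alpa.shutdown()` pattern described in the Alpa documentation:
```python
import alpa
import ray

# Attach to an existing Ray cluster (e.g., one started with `ray start --head`)
# so that @alpa.parallelize can use all of the cluster's GPUs.
ray.init(address="auto")
alpa.init(cluster="ray")

# ... run the training loop shown above ...

alpa.shutdown()
ray.shutdown()
```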
Check out the Alpa Documentation site for installation instructions, tutorials, examples, and more.
Learning more
- Alpa OSDI 2022 paper
- Google AI blog
- Alpa talk slides
- ICML 2022 Big Model Tutorial slides
- ICML 2022 Big Model Tutorial video recording
- Prof. Ion Stoica introduces the Alpa system
Getting Involved
- Please read the contributor guide if you are interested in contributing to Alpa.
- Connect to Alpa contributors via the Alpa slack.
License
Alpa is licensed under the Apache-2.0 license.
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distributions
Hashes for alpa-0.1.6-cp39-cp39-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 7392ef1fe34ad72050674928086debadc17b45d45dda4c2c7049b2e049adcc59
MD5 | 0f728a2f7d0bd52b9a8324e0bfe86cf2
BLAKE2b-256 | 10150e6eafa396f5dc2ee0dc5a504a833bc83f1ebdbde08f0a0d381de1e8d64e
Hashes for alpa-0.1.6-cp38-cp38-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | c6c05f718bc51f05202283dc5c50c855c33b0af116f1a5c840889d35dc64b6fe
MD5 | 66593cd7206afa3939917865f2fba119
BLAKE2b-256 | a6299080422f8f312bf109bef89f933439ea14f5b81aeb00fba68d92ae3c3e79
Hashes for alpa-0.1.6-cp37-cp37m-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 3918c653c96b673c0413b2dc8d0eece2e552b0b009bba4c826913a3bb0d20741
MD5 | 170d4ca5a36ef4fef0d52f7f0e46b12a
BLAKE2b-256 | 92ff265787c51fda1a853a5e40df11ca1b9fcfff0c1bffed3fa8300bde0815f0
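To confirm that a downloaded wheel matches the digests listed above, you can compute its SHA256 locally. A minimal sketch in Python; the local file path is illustrative and assumes you have already downloaded the cp39 wheel:
```python
import hashlib

# Expected SHA256 for the cp39 wheel, copied from the table above.
EXPECTED_SHA256 = "7392ef1fe34ad72050674928086debadc17b45d45dda4c2c7049b2e049adcc59"

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks and return its hex SHA256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Illustrative path; point this at the wheel you actually downloaded.
wheel = "alpa-0.1.6-cp39-cp39-manylinux2014_x86_64.whl"
assert sha256_of(wheel) == EXPECTED_SHA256, "hash mismatch -- do not install"
```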