Transformers at zeta scales
Zeta - A Transgalactic Library for Scalable Transformations
Zeta is a PyTorch-powered library, forged in the heart of the Halo array, that empowers researchers and developers to scale up Transformers efficiently and effectively. It leverages seminal research advancements to enhance the generality, capability, and stability of scaling Transformers while optimizing training efficiency.
- Stability - DeepNet: Scaling Transformers beyond 1,000 layers
- Generality - Foundation Transformers (Magneto): Pioneering a path towards universal modeling across diverse tasks and modalities (including language, vision, speech, and multimodal)
- Capability - The Length-Extrapolatable Transformer
- Efficiency - X-MoE: Scalable & fine-tunable sparse Mixture-of-Experts (MoE)
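To make the stability bullet concrete: DeepNet stabilizes very deep stacks by scaling the residual branch by a constant alpha and the initialization by a constant beta, both derived from the layer count. Below is a minimal scalar sketch of the encoder-only constants and residual update from the DeepNet paper; it is plain Python for illustration, not zeta's API:

```python
def deepnorm_constants(num_layers: int) -> tuple[float, float]:
    """DeepNet constants for an encoder-only stack of `num_layers`:
    alpha scales the residual branch, beta scales weight initialization."""
    alpha = (2 * num_layers) ** 0.25
    beta = (8 * num_layers) ** -0.25
    return alpha, beta

def deepnorm_residual(x: float, sublayer_out: float, alpha: float) -> float:
    # DeepNorm update: x_{l+1} = LayerNorm(alpha * x_l + G(x_l));
    # LayerNorm is omitted in this scalar sketch.
    return alpha * x + sublayer_out

alpha, beta = deepnorm_constants(1000)  # the "beyond 1,000 layers" regime
```

Note how alpha grows with depth, so the identity path dominates early in training and gradients stay well-behaved even at extreme depths.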
News
Installation
To install:
pip install zeta
To get hands-on and develop it locally:
git clone https://github.com/kyegomez/zeta.git
cd zeta
pip install -e .
Initiating Your Journey
Creating a model empowered with the aforementioned breakthrough research features is a breeze. Here's how to quickly materialize a BERT-like encoder:
>>> from zeta import EncoderConfig
>>> from zeta import Encoder
>>> config = EncoderConfig(vocab_size=64000)
>>> model = Encoder(config)
>>> print(model)
Additionally, we support the Decoder and EncoderDecoder architectures:
# To create a decoder model
>>> from zeta import DecoderConfig
>>> from zeta import Decoder
>>> config = DecoderConfig(vocab_size=64000)
>>> decoder = Decoder(config)
>>> print(decoder)
# To create an encoder-decoder model
>>> from zeta import EncoderDecoderConfig
>>> from zeta import EncoderDecoder
>>> config = EncoderDecoderConfig(vocab_size=64000)
>>> encdec = EncoderDecoder(config)
>>> print(encdec)
Key Features
Most of the transformative features mentioned below can be enabled by simply setting the corresponding parameters in the config:
>>> from zeta import EncoderConfig
>>> from zeta import Encoder
>>> config = EncoderConfig(vocab_size=64000, deepnorm=True, multiway=True)
>>> model = Encoder(config)
>>> print(model)
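For readers curious what such a flag-driven config looks like under the hood, here is a hypothetical sketch using a dataclass. The field names mirror the snippet above, but this is an illustration of the pattern, not zeta's actual EncoderConfig:

```python
from dataclasses import dataclass

# Hypothetical illustration only -- NOT zeta's real EncoderConfig.
@dataclass
class EncoderConfigSketch:
    vocab_size: int = 64000
    deepnorm: bool = False   # DeepNet-style residual/init scaling
    multiway: bool = False   # Magneto-style modality-specific sub-modules

config = EncoderConfigSketch(vocab_size=64000, deepnorm=True, multiway=True)
```

Keeping every feature behind a boolean field like this lets a single model class branch on the config at build time instead of multiplying model classes per feature combination.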
For a complete overview of our key features, refer to our Feature Guide.
Examples
Discover how to wield Zeta in a multitude of scenarios/tasks, including but not limited to:
- Language
- Vision
  - ViT/BEiT [In progress]
- Speech
- Multimodal
We are working tirelessly to expand the collection of examples spanning various tasks (e.g., vision pretraining, speech recognition) and various deep learning frameworks (e.g., DeepSpeed, Megatron-LM). Your comments, suggestions, or contributions are welcome!
Results
Check out our Results Page to witness Zeta's exceptional performance in Stability Evaluations and Scaling-up Experiments.
Acknowledgments
Zeta is a masterpiece inspired by elements of FairSeq and UniLM.
Citations
If our work here in Zeta has aided you in your journey, please consider acknowledging our efforts in your work. You can find relevant citation details in our Citations Document.
Contributing
We're always thrilled to welcome new ideas and improvements from the community. Please check our Contributor's Guide for more details about contributing.
- Create a modular, omni-universal Attention class supporting flash multi-head attention, regular multi-head attention, and dilated attention, then integrate it into Decoder/DecoderConfig
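As a reference point for that roadmap item, the core computation every such variant wraps is scaled dot-product attention. Here is a dependency-free, single-query sketch of that computation; the actual class would operate on batched tensors, and all names here are illustrative:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q, keys, values):
    """Single-query attention: q is a vector, keys/values are lists of
    vectors. Returns the attention-weighted sum of the values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

out = scaled_dot_product_attention([1.0, 0.0],
                                   [[1.0, 0.0], [0.0, 1.0]],
                                   [[1.0, 0.0], [0.0, 1.0]])
```

Flash attention and dilated attention both compute this same weighted sum; they differ only in memory layout (tiled, recomputed softmax) and in which key/value positions each query attends to, which is why one shared interface is feasible.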