NeMo-Aligner - a toolkit for model alignment
NVIDIA NeMo-Aligner
Latest News
- We released a beta version of accelerated generation support in the RLHF pipeline. This is still very much a work in progress, but it adds a significant speedup to RLHF training. For more details, see Accelerated-RLHF and the special Accelerated-RLHF-Release.
- The NeMo-Aligner paper is now out on arXiv!
Introduction
NeMo-Aligner is a scalable toolkit for efficient model alignment. The toolkit supports state-of-the-art model alignment algorithms such as SteerLM, DPO, and Reinforcement Learning from Human Feedback (RLHF). These algorithms enable users to align language models to be safer, more harmless, and more helpful. Users can perform end-to-end model alignment on a wide range of model sizes and take advantage of all of the parallelism techniques to ensure their model alignment is done in a performant and resource-efficient manner. For more technical details, please refer to our paper.
The NeMo-Aligner toolkit is built on the NeMo Toolkit, which allows training to scale to thousands of GPUs using tensor, data, and pipeline parallelism for all components of alignment. All of our checkpoints are cross-compatible with the NeMo ecosystem, allowing for inference deployment and further customization.
The toolkit is currently in its early stages, and we are committed to improving it to make it easier for developers to pick and choose different alignment algorithms to build safe, helpful, and reliable models.
Key features
- SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF.
  - Try the NV-Llama2-70B-SteerLM-Chat model, aligned with NeMo-Aligner, on NVIDIA AI Foundation for free (no registration required).
  - Corresponding reward model: Llama2-13B-SteerLM-RM.
  - Learn more in our SteerLM and HelpSteer papers.
- Supervised Fine Tuning
- Reward Model Training
- Reinforcement Learning from Human Feedback using the PPO Algorithm
  - Try the NV-Llama2-70B-RLHF model, aligned with NeMo-Aligner, on NVIDIA AI Foundation for free (no registration required).
  - Corresponding reward model: NV-Llama2-13B-RLHF-RM.
- Direct Preference Optimization as described in Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Self-Play Fine-Tuning (SPIN) as described in Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Learn More
Latest Release
For the latest stable release, please see the releases page. All releases come with a pre-built container. Changes within each release are documented in the CHANGELOG.
Installing your own environment
Requirements
NeMo-Aligner has the same requirements as the NeMo Toolkit, with the addition of PyTriton.
Installation
Please follow the same steps as the NeMo Toolkit Installation Guide, but run the following after installing NeMo:
pip install nemo-aligner
or, if you prefer to install the latest commit, run the following from a local clone of the repository:
pip install .
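For example, a minimal sketch of installing the latest commit from source (this assumes the project is hosted at github.com/NVIDIA/NeMo-Aligner; adjust the URL if your copy lives elsewhere):
git clone https://github.com/NVIDIA/NeMo-Aligner.git
cd NeMo-Aligner
pip install .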
Docker Containers
We provide an official NeMo-Aligner Dockerfile, which is based on stable, tested versions of NeMo, Megatron-LM, and TransformerEngine. The goal of this Dockerfile is stability, so it may not track the very latest versions of those three packages. You can access our Dockerfile here.
Alternatively, you can build the NeMo Dockerfile and add RUN pip install nemo-aligner at the end (see the sketch below).
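A minimal sketch of that approach, assuming you have already built a NeMo image locally (the nemo:latest tag below is a placeholder, not an official image name):
# Placeholder base image built from the NeMo Dockerfile
FROM nemo:latest
# Install NeMo-Aligner on top of the NeMo base image
RUN pip install nemo-aligner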
Future work
- Add Rejection Sampling support.
- Continue improving the stability of the PPO learning phase.
- Improve the performance of RLHF.
Contributing
We welcome community contributions! Please refer to CONTRIBUTING.md for guidelines.
Citing NeMo-Aligner
@misc{shen2024nemoaligner,
title={NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment},
author={Gerald Shen and Zhilin Wang and Olivier Delalleau and Jiaqi Zeng and Yi Dong and Daniel Egert and Shengyang Sun and Jimmy Zhang and Sahil Jain and Ali Taghibakhshi and Markel Sanz Ausin and Ashwath Aithal and Oleksii Kuchaiev},
year={2024},
eprint={2405.01481},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
License
This toolkit is licensed under the Apache License, Version 2.0.
File details
Details for the file nemo_aligner-0.3.1.tar.gz.
File metadata
- Download URL: nemo_aligner-0.3.1.tar.gz
- Upload date:
- Size: 80.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.8.10
File hashes
Algorithm | Hash digest
--- | ---
SHA256 | 8a585ed05a2515c1ef0f4b30c76cb3da8aaf294a53c323bdbd2399e298780908
MD5 | 18fa60d5cdf43a09c0f73cc7e085d427
BLAKE2b-256 | f732c50749344a38bafbae5dc727d9881d6d6ed566662e39d1faa161579e62dd
File details
Details for the file nemo_aligner-0.3.1-py3-none-any.whl.
File metadata
- Download URL: nemo_aligner-0.3.1-py3-none-any.whl
- Upload date:
- Size: 110.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.8.10
File hashes
Algorithm | Hash digest
--- | ---
SHA256 | 101666f900ccb6165125b0757b492759817a5267229f76e3768e3cb8346e56e3
MD5 | 8eef21410b6c5e5732b741f65338fa06
BLAKE2b-256 | 1fa3c339ce6f2d6ebbf62195e4e08123bab1412bc5a4e468404fe5df96d260cf