NeMo-Aligner - a toolkit for model alignment

NVIDIA NeMo-Aligner

Introduction

NeMo-Aligner is a scalable toolkit for efficient model alignment. It supports state-of-the-art alignment algorithms such as SteerLM, DPO, and Reinforcement Learning from Human Feedback (RLHF). These algorithms let users align language models to be safer, more harmless, and more helpful. Users can run end-to-end model alignment across a wide range of model sizes and use the available parallelism techniques to keep alignment performant and resource efficient.

The NeMo-Aligner toolkit is built on the NeMo Toolkit, which allows training to scale to thousands of GPUs using tensor, data, and pipeline parallelism for all components of alignment. All of our checkpoints are cross-compatible with the NeMo ecosystem, allowing for inference deployment and further customization.
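
As a rough illustration of how the three parallelism dimensions combine, the sketch below computes the data-parallel size implied by a GPU count and a choice of tensor- and pipeline-parallel sizes. It assumes the common Megatron-style convention that the total GPU count is the product of the three sizes. The function name and the example numbers are hypothetical and are not part of the NeMo-Aligner API.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the data-parallel size implied by the other two dimensions.

    Assumption: world_size == tensor_parallel * pipeline_parallel * data_parallel.
    """
    if min(world_size, tensor_parallel, pipeline_parallel) < 1:
        raise ValueError("all sizes must be positive integers")
    # GPUs that together hold one full copy of the model.
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_parallel * pipeline_parallel = {model_parallel}"
        )
    return world_size // model_parallel


if __name__ == "__main__":
    # Hypothetical job: 1024 GPUs, 8-way tensor parallel, 4-way pipeline parallel.
    print(data_parallel_size(1024, 8, 4))  # 32 model replicas
```

Under this assumption, raising the tensor- or pipeline-parallel size lets a larger model fit, at the cost of fewer data-parallel replicas for the same number of GPUs.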

The toolkit is currently in its early stages, and we are committed to improving it so that developers can more easily pick and choose among alignment algorithms to build safe, helpful, and reliable models.

Key features

  • SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF. Learn more in our SteerLM and HelpSteer papers. Try it instantly for free on NVIDIA AI Playground.
  • Supervised Fine Tuning
  • Reward Model Training
  • Reinforcement Learning from Human Feedback using the PPO Algorithm
  • Direct Preference Optimization (DPO), as described in the DPO paper
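
For orientation, the DPO objective for a single preference pair fits in a few lines. The following is a plain-Python sketch of the loss as defined in the DPO paper, not NeMo-Aligner's implementation. The function name, argument names, and the default value of `beta` are illustrative assumptions.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift from the reference
) -> float:
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # Implicit rewards: scaled log-ratio of policy to reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # When the policy equals the reference, the loss is log(2).
    print(dpo_loss(-3.0, -4.0, -3.0, -4.0))
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, and rises in the opposite case.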

Learn More

Latest Release

For the latest stable release, please see the releases page. All releases come with a pre-built container. Changes in each release are documented in the CHANGELOG.

Installing your own environment

Requirements

NeMo-Aligner has the same requirements as the NeMo Toolkit (see the NeMo Toolkit Requirements), with the addition of PyTriton.

Installation

Please follow the steps in the NeMo Toolkit Installation Guide, then run the following after installing NeMo:

pip install nemo-aligner

Or, if you prefer to install the latest commit, run the following from the root of a clone of the repository:

pip install .
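
After either install command, a quick way to confirm that the package is visible in the active environment is to query its installed version. The sketch below uses only the Python standard library. The helper name `installed_version` is hypothetical, and the distribution name is assumed to match the one used in the `pip install` command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "nemo-aligner") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("nemo-aligner is not installed in this environment")
    else:
        print(f"nemo-aligner {version} is installed")
```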

Docker Containers

To build your own container, refer to the NeMo Dockerfile and add RUN pip install nemo-aligner at the end.

Future work

  • Add Rejection Sampling support
  • Improve the stability of the PPO learning phase
  • Improve the performance of RLHF

Contributing

We welcome community contributions! Please refer to CONTRIBUTING.md for guidelines.

License

This toolkit is licensed under the Apache License, Version 2.0.
