Online AlignmenT (OAT) for LLMs.


OAT


Installation | Usage | Examples | Citation


Updates

  • 31/10/2025: We advocate for re-evaluating precision choices in RL training (Precision RL) — demonstrating that FP16 offers superior performance and stability compared to the de facto standard, BF16.
  • 02/10/2025: We add LoRA-RL support and validate its performance as comparable to full fine-tuning RL (super excited to be highlighted by John Schulman).
  • 21/03/2025: We incorporate Dr. GRPO, which fixes the optimization bias in GRPO.
  • 26/01/2025: We support reinforcement learning with verifiable rewards (RLVR) for math reasoning.
  • 20/10/2024: We open source Oat, an online LLM alignment framework developed during a research project on online LLM exploration (sample-efficient alignment).

Introduction

Oat 🌾 is a simple yet efficient framework for running online LLM alignment algorithms. Its key features include:

  • High Efficiency: Oat implements a distributed Actor-Learner-Oracle architecture, with each component being optimized using state-of-the-art tools:
    • Actor: Utilizes vLLM for accelerated online response sampling.
    • Learner: Leverages DeepSpeed ZeRO strategies to enhance memory efficiency.
    • Oracle: Serves model-based oracles with Mosec as a remote service, supporting dynamic batching, data parallelism, and pipeline parallelism.
  • Simplified Workflow: Oat simplifies the experimental pipeline of LLM alignment. With an Oracle served online, we can flexibly query it for preference data labeling as well as anytime model evaluation. All you need to do is launch experiments and monitor real-time learning curves (e.g., win rate) on wandb (see reproduced results) — no manual training, checkpointing, and loading for evaluation.
  • Oracle Simulation: Oat provides a diverse set of oracles to simulate preference/reward/verification feedback.
    • Verifiable rewards supported using rule-based functions.
    • Lightweight reward models run within the actor's process, enabling quick testing on as few as two GPUs.
    • Larger and more capable reward models can be served remotely, harnessing additional compute and memory resources.
    • LLM-as-a-judge is supported via querying OpenAI API for model-based pairwise ranking.
  • Ease of Use: Oat's modular structure allows researchers to easily inherit and modify existing classes, enabling rapid prototyping and experimentation with new algorithms.
  • Cutting-Edge Algorithms: Oat implements state-of-the-art online algorithms, fostering innovation and fair benchmarking.
    • PPO/Dr.GRPO (online RL) for math reasoning.
    • Online DPO/SimPO/IPO for online preference learning.
    • Online exploration (active alignment) algorithms, including SEA, APL and XPO.
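To illustrate what the Dr. GRPO fix changes at the advantage level, here is a sketch based on the paper's description (not Oat's internal code): GRPO divides each group's mean-centered rewards by the group's reward standard deviation, which biases optimization; Dr. GRPO removes that normalization.

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Original GRPO: mean-center, then divide by the group std (biased)."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def dr_grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Dr. GRPO: mean-center only -- the std division is removed."""
    return rewards - rewards.mean()

# Verifiable rewards for one prompt's group of sampled responses.
group = np.array([1.0, 0.0, 1.0, 1.0])
print(dr_grpo_advantages(group))  # [ 0.25 -0.75  0.25  0.25]
```

Mean-centered advantages still sum to zero within the group, but questions with low reward variance are no longer upweighted by the std division.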

Installation

In a Python environment with a supported version (we recommend 3.10), you can install oat via PyPI:

pip install vllm==0.8.4 && pip install -U oat-llm
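Before installing, it can help to confirm which interpreter is active (the 3.10 recommendation is from this README; the snippet only prints the version):

```shell
# print the active Python version (this README recommends 3.10)
python -c "import sys; print('.'.join(map(str, sys.version_info[:3])))"
```

After installation, `pip show oat-llm` lists the resolved version and install location.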

Alternatively, install in "editable" mode for local development:

git clone git@github.com:sail-sg/oat.git
cd oat
pip install vllm==0.8.4 && pip install -e .

Usage

Adopters

Research projects that are built on (or integrated with) Oat 🌾:

Citation

If you find this codebase useful for your research, please consider citing:

  • LLM online alignment framework:

    @misc{liu2024oat,
      title={OAT: A research-friendly framework for LLM online alignment},
      author={Liu, Zichen and Chen, Changyu and Wan, Xinyi and Du, Chao and Lee, Wee Sun and Lin, Min},
      year={2024},
      howpublished={\url{https://github.com/sail-sg/oat}},
    }
    
  • Online exploration method:

    @article{liu2024sea,
      title={Sample-Efficient Alignment for LLMs},
      author={Liu, Zichen and Chen, Changyu and Du, Chao and Lee, Wee Sun and Lin, Min},
      journal={arXiv preprint arXiv:2411.01493},
      year={2024}
    }
    

License

oat is distributed under the terms of the Apache-2.0 license.

Acknowledgement

We thank the following awesome projects that have contributed to the development of oat:

Disclaimer

This is not an official Sea Limited or Garena Online Private Limited product.
