Train transformer language models with reinforcement learning.

These details have been verified by PyPI

Maintainers

krasul lewtun lvwerra lysandre qgallouedec

Project description

TRL - Transformers Reinforcement Learning

A comprehensive library to post-train foundation models

🎉 What's New

🌍 Multi-environment agentic RL: GRPOTrainer now supports per-example environment selection and environment-owned rewards — mix multiple sandboxed task suites in one run and let each environment define its own scoring, with Harbor and OpenEnv.

🎯 KTO is now stable: KTOTrainer graduates to the stable API after a full alignment pass with DPOTrainer.

Overview

TRL is a library for post-training foundation models with techniques such as Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), and Direct Preference Optimization (DPO). Built on top of the 🤗 Transformers ecosystem, TRL supports a variety of model architectures and modalities and can be scaled up across various hardware setups.

Highlights

- Trainers: Various fine-tuning methods are easily accessible via trainers like SFTTrainer, GRPOTrainer, DPOTrainer, KTOTrainer, and more.
- Efficient and scalable:
  - Leverages 🤗 Accelerate to scale from a single GPU to multi-node clusters using methods like DDP and DeepSpeed.
  - Full integration with 🤗 PEFT enables training large models on modest hardware via quantization and LoRA/QLoRA.
  - Integrates 🦥 Unsloth to accelerate training with optimized kernels.
- Command Line Interface (CLI): A simple interface lets you fine-tune models without writing code.
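
To make the "modest hardware" point concrete, the arithmetic behind LoRA can be sketched as follows. The layer sizes and rank are illustrative assumptions, not TRL or PEFT defaults, and the helper name `lora_param_counts` is hypothetical: a LoRA adapter on a `d_out x d_in` weight matrix trains two low-rank factors with `rank * (d_in + d_out)` parameters instead of all `d_in * d_out` weights.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_in * d_out            # every weight of the dense matrix
    lora = rank * (d_in + d_out)   # factors A (rank x d_in) and B (d_out x rank)
    return full, lora


# Assumed example sizes: a 4096 x 4096 projection with a rank-16 adapter.
full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=16)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA adapter:     {lora:,} trainable parameters")
```

Under these assumed sizes the adapter trains well under 1% of the layer's weights, which is why adapter training fits on smaller GPUs.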

Installation

Python Package

Install the library using pip:

pip install trl

From source

If you want to use the latest features before an official release, you can install TRL from source:

pip install git+https://github.com/huggingface/trl.git

Repository

If you want to use the examples, you can clone the repository with the following command:

git clone https://github.com/huggingface/trl.git

Quick Start

For more flexibility and control over training, TRL provides dedicated trainer classes to post-train language models or PEFT adapters on a custom dataset. Each trainer in TRL is a light wrapper around the 🤗 Transformers trainer and natively supports distributed training methods like DDP, DeepSpeed ZeRO, and FSDP.

`SFTTrainer`

Here is a basic example of how to use the SFTTrainer:

from trl import SFTTrainer
from datasets import load_dataset

dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset,
)
trainer.train()

`GRPOTrainer`

GRPOTrainer implements the Group Relative Policy Optimization (GRPO) algorithm, which is more memory-efficient than PPO and was used to train DeepSeek AI's R1.

from datasets import load_dataset
from trl import GRPOTrainer
from trl.rewards import accuracy_reward

dataset = load_dataset("trl-lib/DeepMath-103K", split="train")

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=accuracy_reward,
    train_dataset=dataset,
)
trainer.train()

Note: For reasoning models, use the reasoning_accuracy_reward() function for better results.
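
The "group relative" part of GRPO refers to scoring each sampled completion against the other completions drawn for the same prompt, rather than against a learned value model. A simplified sketch of that normalization in plain Python follows. It is not TRL's implementation: the function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize the rewards of one prompt's completions by the group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored 1.0 (correct) or 0.0 (incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat the group average get a positive advantage and the rest get a negative one, so only the reward function and the sampled group are needed to produce a learning signal.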

`DPOTrainer`

DPOTrainer implements the popular Direct Preference Optimization (DPO) algorithm, which was used to post-train Llama 3 and many other models. Here is a basic example of how to use the DPOTrainer:

from datasets import load_dataset
from trl import DPOTrainer

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model="Qwen/Qwen3-0.6B",
    train_dataset=dataset,
)
trainer.train()
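
For intuition about what DPOTrainer optimizes, the per-pair DPO objective can be sketched in plain Python. This is an illustrative sketch, not TRL's implementation: the function name `dpo_pair_loss`, its arguments, and the `beta=0.1` value are assumptions, and the inputs stand for the summed log-probabilities of each completion under the policy and under a frozen reference model.

```python
import math


def dpo_pair_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for a single (chosen, rejected) pair."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the loss equals log(2).
baseline = dpo_pair_loss(-2.0, -2.0, -2.0, -2.0)
# Policy has shifted probability mass toward the chosen completion: the loss drops.
improved = dpo_pair_loss(-1.0, -3.0, -2.0, -2.0)
print(baseline, improved)
```

The loss only depends on how the policy's preference between the two completions differs from the reference model's, which is why no separate reward model appears in the example above.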

`KTOTrainer`

KTOTrainer implements the Kahneman-Tversky Optimization (KTO) algorithm, which aligns models from simple binary (desirable / undesirable) feedback rather than paired preferences. Here is a basic example of how to use the KTOTrainer:

from datasets import load_dataset
from trl import KTOTrainer

dataset = load_dataset("trl-lib/kto-mix-14k", split="train")

trainer = KTOTrainer(
    model="Qwen/Qwen3-0.6B",
    train_dataset=dataset,
)
trainer.train()
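
The difference between paired preferences and binary feedback is easiest to see in the data. The sketch below splits one paired record into two unpaired ones. The helper name `unpair` and the column names (`prompt`, `chosen`, `rejected`, `completion`, `label`) are assumptions made for illustration, so check the dataset card for the actual schema.

```python
def unpair(example: dict) -> list[dict]:
    """Split one paired preference record into two binary-feedback records."""
    return [
        {"prompt": example["prompt"], "completion": example["chosen"], "label": True},
        {"prompt": example["prompt"], "completion": example["rejected"], "label": False},
    ]


# A made-up paired record, used only to show the shape of the data.
paired = {
    "prompt": "What is 2 + 2?",
    "chosen": "4",
    "rejected": "5",
}
unpaired = unpair(paired)
for row in unpaired:
    print(row)
```

Binary feedback does not require the two records to share a prompt, so data such as thumbs-up and thumbs-down logs can be used without first constructing pairs.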

`RewardTrainer`

Here is a basic example of how to use the RewardTrainer:

from trl import RewardTrainer
from datasets import load_dataset

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = RewardTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    train_dataset=dataset,
)
trainer.train()

Command Line Interface (CLI)

You can use the TRL Command Line Interface (CLI) to quickly get started with post-training methods like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), or Kahneman-Tversky Optimization (KTO):

SFT:

trl sft --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name trl-lib/Capybara \
    --output_dir Qwen2.5-0.5B-SFT

DPO:

trl dpo --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
    --dataset_name argilla/Capybara-Preferences \
    --output_dir Qwen2.5-0.5B-DPO

KTO:

trl kto --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
    --dataset_name trl-lib/kto-mix-14k \
    --output_dir Qwen2.5-0.5B-KTO

Read more about the CLI in the relevant documentation section, or run a command with --help for more details.
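
If the CLI needs to be driven from a script, for example to launch several runs, the command line can be assembled in Python. A minimal sketch, assuming only the flags shown in the commands above; the helper name `build_trl_command` is hypothetical and the command is printed rather than executed.

```python
import shlex


def build_trl_command(subcommand: str, **options: str) -> list[str]:
    """Build an argv list such as ['trl', 'sft', '--output_dir', ...]."""
    argv = ["trl", subcommand]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


argv = build_trl_command(
    "sft",
    model_name_or_path="Qwen/Qwen2.5-0.5B",
    dataset_name="trl-lib/Capybara",
    output_dir="Qwen2.5-0.5B-SFT",
)
command_line = shlex.join(argv)
print(command_line)
# To launch it for real (requires TRL to be installed):
#   import subprocess; subprocess.run(argv, check=True)
```

Passing the list form to `subprocess.run` avoids shell quoting issues when model or dataset names contain unusual characters.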

Development

If you want to contribute to TRL or customize it to your needs, read the contribution guide and make a dev install:

git clone https://github.com/huggingface/trl.git
cd trl/
pip install -e ".[dev]"

Experimental

A minimal incubation area is available under trl.experimental for unstable / fast-evolving features. Anything there may change or be removed in any release without notice.

Example:

from trl.experimental.new_trainer import NewTrainer
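
Because anything under trl.experimental may disappear between releases, code that depends on it can guard the import instead of failing at startup. A minimal sketch using only the standard library; the helper name `load_optional` is hypothetical, and `new_trainer` / `NewTrainer` are the placeholder names from the example above, not a real module.

```python
import importlib


def load_optional(module_path: str, attr: str):
    """Return module_path.attr, or None if the module or attribute is missing."""
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        return None
    return getattr(module, attr, None)


NewTrainer = load_optional("trl.experimental.new_trainer", "NewTrainer")
if NewTrainer is None:
    print("Experimental trainer unavailable; fall back to a stable trainer.")
```

Pinning the TRL version in your requirements is the more robust option; the guarded import mainly helps code that must run across several TRL releases.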

Citation

@software{vonwerra2020trl,
  title   = {{TRL: Transformers Reinforcement Learning}},
  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url     = {https://github.com/huggingface/trl},
  year    = {2020}
}

License

This repository's source code is available under the Apache-2.0 License.

Release history

This version

1.9.2

Jul 28, 2026

1.9.1

Jul 26, 2026

1.9.0

Jul 21, 2026

1.8.0

Jul 9, 2026

1.7.1

Jul 4, 2026

1.7.0

Jun 25, 2026

1.6.0

Jun 11, 2026

1.5.1

May 27, 2026

1.5.0

May 25, 2026

1.4.0

May 8, 2026

1.3.0

Apr 26, 2026

1.2.0

Apr 17, 2026

1.1.0

Apr 12, 2026

1.0.0

Mar 30, 2026

1.0.0rc1 pre-release

Mar 20, 2026

0.29.1

Mar 20, 2026

0.29.0

Feb 25, 2026

0.28.0

Feb 10, 2026

0.27.2

Feb 3, 2026

0.27.1

Jan 24, 2026

0.27.0

Jan 16, 2026

0.26.2

Dec 18, 2025

0.26.1

Dec 12, 2025

0.26.0

Dec 9, 2025

0.25.1

Nov 12, 2025

0.25.0

Nov 5, 2025

0.24.0

Oct 16, 2025

0.23.1

Oct 2, 2025

0.23.0

Sep 10, 2025

0.22.2

Sep 3, 2025

0.22.1

Aug 29, 2025

0.22.0

Aug 29, 2025

0.21.0

Aug 5, 2025

0.20.0

Jul 29, 2025

0.19.1

Jul 8, 2025

0.19.0

Jun 20, 2025

0.18.2

Jun 15, 2025

0.18.1

May 29, 2025

0.18.0

May 28, 2025

0.17.0

Apr 24, 2025

0.16.1

Apr 4, 2025

0.16.0

Mar 22, 2025

0.15.2

Feb 25, 2025

0.15.1

Feb 18, 2025

0.15.0

Feb 13, 2025

0.14.0

Jan 29, 2025

0.13.0

Dec 16, 2024

0.12.2

Dec 6, 2024

0.12.1

Nov 14, 2024

0.12.0

Nov 1, 2024

0.11.4

Oct 15, 2024

0.11.3

Oct 10, 2024

0.11.2

Oct 7, 2024

0.11.1

Sep 24, 2024

0.11.0

Sep 19, 2024

0.10.1

Aug 29, 2024

0.9.6

Jul 8, 2024

0.9.4

Jun 6, 2024

0.9.3

Jun 5, 2024

0.9.2

Jun 5, 2024

0.8.6

Apr 22, 2024

0.8.5

Apr 18, 2024

0.8.4

Apr 17, 2024

0.8.3

Apr 12, 2024

0.8.2

Apr 11, 2024

0.8.1

Mar 20, 2024

0.8.0

Mar 19, 2024

0.7.11

Feb 16, 2024

0.7.10

Jan 19, 2024

0.7.9

Jan 9, 2024

0.7.8

Jan 9, 2024

0.7.7

Dec 26, 2023

0.7.6

Dec 22, 2023

0.7.5

Dec 22, 2023

0.7.4

Nov 8, 2023

0.7.3

Nov 8, 2023

0.7.2

Oct 12, 2023

0.7.1

Aug 30, 2023

0.7.0

Aug 30, 2023

0.6.0

Aug 24, 2023

0.5.0

Aug 2, 2023

0.4.7

Jul 13, 2023

0.4.6

Jun 23, 2023

0.4.5

Jun 23, 2023

0.4.4

Jun 8, 2023

0.4.3

Jun 8, 2023

0.4.2

Jun 7, 2023

0.4.1

Mar 17, 2023

0.4.0

Mar 9, 2023

0.3.1

Mar 2, 2023

0.3.0

Mar 1, 2023

0.2.1

Jan 25, 2023

0.2.0

Jan 25, 2023

0.1.0

May 15, 2022

0.0.3

Feb 28, 2021

0.0.2

Jul 17, 2020

0.0.1

Mar 30, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

trl-1.9.2.tar.gz (740.7 kB)

Uploaded Jul 28, 2026 Source

Built Distribution

trl-1.9.2-py3-none-any.whl (889.0 kB)

Uploaded Jul 28, 2026 Python 3

File details

Details for the file trl-1.9.2.tar.gz.

File metadata

Download URL: trl-1.9.2.tar.gz
Upload date: Jul 28, 2026
Size: 740.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/7.0.0 CPython/3.14.6

File hashes

Hashes for trl-1.9.2.tar.gz
Algorithm	Hash digest
SHA256	`8107d5c6d45478205aead0f211e6bbc2f673388421972de635fe40ae1d5f5e61`
MD5	`2fe4c8a62dcf14eb8a4961fb606c9760`
BLAKE2b-256	`ac30d1da1df32ebbf4f7723a8bf83fc15b043fbd0cc3648cb3d1c12be174ab9b`

File details

Details for the file trl-1.9.2-py3-none-any.whl.

File metadata

Download URL: trl-1.9.2-py3-none-any.whl
Upload date: Jul 28, 2026
Size: 889.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/7.0.0 CPython/3.14.6

File hashes

Hashes for trl-1.9.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`27847cd9b429af83365b311bb9d93e24bc8cdcc80a54973a6c6cb18a084bce75`
MD5	`79afc20e74cfaf37eaeceef996be0519`
BLAKE2b-256	`9b80edd509e740009b58d966b82b25183080c21f30812f2097263d4b6ef3bbb0`
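
The SHA256 digests listed above can be checked against a downloaded file before installing it. A minimal sketch using only the standard library; the helper names `sha256_of` and `verify_sha256` are hypothetical, and the demo hashes a small temporary file so that it runs without the real archive.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a throwaway file; its expected digest is computed from the same bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_ok = verify_sha256(tmp.name, hashlib.sha256(b"abc").hexdigest())
demo_bad = verify_sha256(tmp.name, "0" * 64)
os.unlink(tmp.name)
print(demo_ok, demo_bad)
# For the real archive: verify_sha256("trl-1.9.2.tar.gz", "<SHA256 from the table>")
```

pip can perform the same check at install time when a requirements file pins hashes and is installed with `--require-hashes`.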
