An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework

Used by Amazon Web Services

This project is a clean fork of the original veRL project, extended to support vision language models. We thank all the authors for providing such a high-performance RL training framework.

EasyR1 is efficient and scalable thanks to the design of HybridEngine and the latest release of vLLM's SPMD mode.

Features

Supported models
- Llama3/Qwen2/Qwen2.5/Qwen3 language models
- Qwen2/Qwen2.5-VL vision language models
- DeepSeek-R1 distill models
Supported algorithms
- GRPO
- DAPO
- Reinforce++
- ReMax
- RLOO
Supported datasets
- Any text or vision-text dataset in a specific format
Supported tricks
- Padding-free training
- Resuming from the latest/best checkpoint
- Wandb & SwanLab & Mlflow & Tensorboard tracking

Requirements

Software Requirements

Python 3.9+
transformers>=4.54.0
flash-attn>=2.4.3
vllm>=0.8.3
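
The minimum versions above can be checked from inside the target Python environment. The following is a minimal sketch, not part of EasyR1: the helper names are hypothetical, the distribution names are assumed to match the names in the list above, and the comparison looks only at the numeric release segments, ignoring suffixes such as `.dev0` or `+cu126`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the software requirements list.
MINIMUMS = {"transformers": "4.54.0", "flash-attn": "2.4.3", "vllm": "0.8.3"}


def release_tuple(text):
    """Turn '0.10.0+cu126' or '4.54.0.dev0' into a tuple such as (0, 10, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric segment (dev, rc, post, ...)
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare numeric release segments only, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Return {package: (installed_version_or_None, ok)} for each requirement."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "needs attention"
        print(f"{name}: {installed or 'not installed'} -> {status}")
```

Running the script prints one line per package. A package that is missing or older than the listed minimum is flagged rather than raising an error.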

We provide a Dockerfile to easily build environments.

We recommend using the pre-built docker image in EasyR1.

docker pull hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0
docker run -it --ipc=host --gpus=all hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0

If your environment does not support Docker, you can consider using Apptainer:

apptainer pull easyr1.sif docker://hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0
apptainer shell --nv --cleanenv --bind /mnt/your_dir:/mnt/your_dir easyr1.sif

Use USE_MODELSCOPE_HUB=1 to download models from the ModelScope hub.

Hardware Requirements

* estimated

Method	Bits	1.5B	3B	7B	32B	72B
GRPO Full Fine-Tuning	AMP	2*24GB	4*40GB	8*40GB	16*80GB	32*80GB
GRPO Full Fine-Tuning	BF16	1*24GB	1*40GB	4*40GB	8*80GB	16*80GB
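
For scripting, the estimates in the table can be held in a small lookup. This is only a sketch that transcribes the table; the function name and return format are hypothetical, and the numbers remain estimates.

```python
# Estimated GPU requirements for GRPO full fine-tuning, transcribed from the table.
# Each entry is (number_of_gpus, memory_per_gpu_in_GB).
ESTIMATED_GPUS = {
    "AMP": {"1.5B": (2, 24), "3B": (4, 40), "7B": (8, 40), "32B": (16, 80), "72B": (32, 80)},
    "BF16": {"1.5B": (1, 24), "3B": (1, 40), "7B": (4, 40), "32B": (8, 80), "72B": (16, 80)},
}


def estimated_requirement(model_size, precision="BF16"):
    """Look up the estimated GPU count and per-GPU memory for a model size."""
    count, memory_gb = ESTIMATED_GPUS[precision.upper()][model_size.upper()]
    return {
        "gpus": count,
        "memory_per_gpu_gb": memory_gb,
        "total_memory_gb": count * memory_gb,
    }


if __name__ == "__main__":
    print(estimated_requirement("7B", "AMP"))
    print(estimated_requirement("7B", "BF16"))
```

For the 7B column this gives 8 GPUs of 40 GB under AMP and 4 GPUs of 40 GB under BF16, so by these estimates BF16 training halves the total memory.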

[!NOTE] Use worker.actor.fsdp.torch_dtype=bf16 and worker.actor.optim.strategy=adamw_bf16 to enable bf16 training.
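
The two settings in the note read like dotted paths into a nested configuration. The sketch below assumes that interpretation. It shows only how such `key=value` overrides would map onto a nested structure and does not reproduce EasyR1's actual configuration loader. The function name is hypothetical.

```python
def expand_overrides(overrides):
    """Turn ['a.b.c=value', ...] into a nested dict {'a': {'b': {'c': 'value'}}}.

    Values are kept as strings; no type conversion is attempted.
    """
    config = {}
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


# The two overrides named in the note above.
BF16_OVERRIDES = [
    "worker.actor.fsdp.torch_dtype=bf16",
    "worker.actor.optim.strategy=adamw_bf16",
]

if __name__ == "__main__":
    print(expand_overrides(BF16_OVERRIDES))
```

Under this reading, both settings sit beneath `worker.actor`: one selects the FSDP parameter dtype and the other selects the optimizer strategy.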

We are working hard to reduce VRAM usage in RL training. LoRA support will be integrated in upcoming updates.

Tutorial: Run Qwen2.5-VL GRPO on Geometry3K Dataset in Just 3 Steps

Installation

git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .

GRPO Training

bash examples/qwen2_5_vl_7b_geo3k_grpo.sh

Merge Checkpoint in Hugging Face Format

python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor
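
The merge command above points at one specific step directory. A hedged sketch for picking the most recent step automatically follows. It assumes that checkpoints live in sibling directories named `global_step_<N>`, each with an `actor` subdirectory, which is inferred only from the example path. The helper names are hypothetical.

```python
import re
from pathlib import Path

STEP_PATTERN = re.compile(r"global_step_(\d+)$")


def latest_actor_dir(experiment_dir):
    """Return <experiment_dir>/global_step_<max N>/actor, or None if no step exists."""
    best_step, best_dir = -1, None
    for child in Path(experiment_dir).iterdir():
        match = STEP_PATTERN.match(child.name)
        if match and child.is_dir() and int(match.group(1)) > best_step:
            best_step, best_dir = int(match.group(1)), child
    return None if best_dir is None else best_dir / "actor"


def merge_command(experiment_dir):
    """Build the argument list for the merge script shown above."""
    actor_dir = latest_actor_dir(experiment_dir)
    if actor_dir is None:
        raise FileNotFoundError(f"no global_step_* directory under {experiment_dir}")
    return ["python3", "scripts/model_merger.py", "--local_dir", str(actor_dir)]


if __name__ == "__main__":
    try:
        print(" ".join(merge_command("checkpoints/easy_r1/exp_name")))
    except FileNotFoundError as error:
        print(error)
```

Steps are compared numerically, so `global_step_10` sorts after `global_step_2`. The returned list can be passed to `subprocess.run` from the repository root.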

[!TIP] If you encounter issues with connecting to Hugging Face, consider using export HF_ENDPOINT=https://hf-mirror.com.

If you want to use the SwanLab logger, consider using bash examples/qwen2_5_vl_7b_geo3k_swanlab.sh.

Custom Dataset

Please refer to the example datasets to prepare your own dataset.

Text dataset: https://huggingface.co/datasets/hiyouga/math12k
Image-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k
Multi-image-text dataset: https://huggingface.co/datasets/hiyouga/journeybench-multi-image-vqa
Text-image mixed dataset: https://huggingface.co/datasets/hiyouga/rl-mixed-dataset
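
Before training on a custom dataset, it can help to sanity-check a few records. The sketch below is illustrative only. The field names `problem`, `answer`, and `images` are assumptions and are not stated in this document, so they should be confirmed against the example datasets listed above. The function name is hypothetical.

```python
# Assumed column names; verify them against the example datasets before relying on this.
PROMPT_KEY = "problem"
ANSWER_KEY = "answer"
IMAGE_KEY = "images"


def describe_record(record, prompt_key=PROMPT_KEY, answer_key=ANSWER_KEY, image_key=IMAGE_KEY):
    """Classify one record by image count and reject records missing required fields."""
    missing = [key for key in (prompt_key, answer_key) if record.get(key) is None]
    if missing:
        raise ValueError(f"record is missing required fields: {missing}")
    images = record.get(image_key) or []
    if len(images) == 0:
        return "text"
    if len(images) == 1:
        return "image-text"
    return "multi-image-text"


if __name__ == "__main__":
    print(describe_record({"problem": "What is 2 + 3?", "answer": "5"}))
    print(describe_record({"problem": "Find x.", "answer": "4", "images": ["fig1.png"]}))
```

A dataset in which both `text` and `image-text` records appear would correspond to the text-image mixed case.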

How to Understand GRPO in EasyR1

To learn about the GRPO algorithm, you can refer to Hugging Face's blog.
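
As a rough illustration of the core idea, GRPO samples several responses per prompt and scores each one relative to the others in its group. The sketch below shows the commonly described normalization, where each reward has the group mean subtracted and is divided by the group standard deviation. It is not taken from the EasyR1 source. The function name, the use of the population standard deviation, and the `eps` value are all assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled responses within their group."""
    center = mean(rewards)
    spread = pstdev(rewards)  # population standard deviation of the group
    return [(reward - center) / (spread + eps) for reward in rewards]


if __name__ == "__main__":
    # Four responses to the same prompt: two judged correct, two incorrect.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Responses that score above the group average receive positive advantages and the rest receive negative ones. When every response in a group gets the same reward, all advantages are zero and that group contributes no learning signal.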

How to Run 70B+ Model in Multi-node Environment

Start the Ray head node.

ray start --head --port=6379 --dashboard-host=0.0.0.0

Start the Ray worker node and connect to the head node.

ray start --address=<head_node_ip>:6379

Check the Ray resource pool.

ray status

Run the training script on the Ray head node only.

bash examples/qwen2_5_vl_7b_geo3k_grpo.sh

See veRL's official documentation for more details about multi-node training and the Ray debugger.
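
As a programmatic alternative to reading `ray status`, the cluster's GPU count can be compared with the number a run needs before launching it. This is a minimal sketch with hypothetical helper names. It uses Ray's `ray.init(address="auto")` and `ray.cluster_resources()`, and it has to run on a node that has already joined the cluster.

```python
def gpu_shortfall(resources, required_gpus):
    """Return how many GPUs are missing from a Ray resource dict (0 means enough)."""
    available = int(resources.get("GPU", 0))
    return max(0, required_gpus - available)


def check_cluster(required_gpus):
    """Attach to the running Ray cluster and report its GPU shortfall."""
    import ray  # imported lazily so gpu_shortfall works without Ray installed

    ray.init(address="auto")  # connect to the cluster started with `ray start`
    try:
        return gpu_shortfall(ray.cluster_resources(), required_gpus)
    finally:
        ray.shutdown()


if __name__ == "__main__":
    # Example with a hand-written resource dict instead of a live cluster.
    print(gpu_shortfall({"GPU": 16.0, "CPU": 128.0}, required_gpus=32))
```

A nonzero result from `check_cluster` means some worker nodes have not joined yet, or that the cluster is smaller than the run needs.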

Other Baselines

We also reproduced the following two baselines of the R1-V project.

CLEVR-70k-Counting: Train the Qwen2.5-VL-3B-Instruct model on counting problems.
GeoQA-8k: Train the Qwen2.5-VL-3B-Instruct model on GeoQA problems.

Performance Baselines

See baselines.md.

Awesome Work using EasyR1

MMR1: Advancing the Frontiers of Multimodal Reasoning.
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models.
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement.
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse.
Temporal-R1: Envolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward.
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation.
GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents.
R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning.
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning.
MM-UPT: Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO.
RL-with-Cold-Start: Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start.
ViGoRL: Grounded Reinforcement Learning for Visual Reasoning.
Revisual-R1: Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning.
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward.
Vision-Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning.
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use.
Long-RL: Scaling RL to Long Sequences.

TODO

Support LoRA (high priority).
Support ulysses parallelism for VLMs (middle priority).
Support more VLM architectures.

[!NOTE] We will not provide scripts for supervised fine-tuning and inference in this project. If you have such requirements, we recommend using LLaMA-Factory.

Known bugs

These features are temporarily disabled. We plan to fix them one by one in future updates.

Vision language models are not compatible with ulysses parallelism yet.

Discussion Group

👋 Join our WeChat group.

FAQs

ValueError: Image features and image tokens do not match: tokens: 8192, features 9800

Increase the data.max_prompt_length or reduce the data.max_pixels.
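
One reading of this error is that the image needs more placeholder tokens than the truncated prompt can hold. The sketch below estimates the image token count from `data.max_pixels`. It assumes a Qwen2-VL-style processor that emits roughly one token per 28x28 pixel block. That ratio and the helper names are assumptions, so treat the result as a rough bound.

```python
# Assumption: 14-pixel patches merged 2x2, so one image token covers 28x28 pixels.
PIXELS_PER_IMAGE_TOKEN = 28 * 28


def estimated_image_tokens(max_pixels, images_per_prompt=1):
    """Upper estimate of image tokens when every image is resized to max_pixels."""
    return images_per_prompt * (max_pixels // PIXELS_PER_IMAGE_TOKEN)


def prompt_budget_ok(max_prompt_length, max_pixels, text_tokens=0, images_per_prompt=1):
    """Check whether image tokens plus text tokens fit in the prompt length."""
    needed = estimated_image_tokens(max_pixels, images_per_prompt) + text_tokens
    return needed <= max_prompt_length


if __name__ == "__main__":
    # 9800 image features correspond to about 9800 * 784 pixels under this assumption.
    print(prompt_budget_ok(max_prompt_length=8192, max_pixels=9800 * 784))
    print(prompt_budget_ok(max_prompt_length=8192, max_pixels=4194304, text_tokens=512))
```

Under this estimate, either a lower `data.max_pixels` or a higher `data.max_prompt_length` brings the two numbers back in line.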

RuntimeError: CUDA Error: out of memory at /workspace/csrc/cumem_allocator.cpp:62

Reduce the worker.rollout.gpu_memory_utilization and enable worker.actor.offload.offload_params.

RuntimeError: 0 active drivers ([]). There should only be one.

Uninstall deepspeed from the current python environment.

Citation

Core contributors: Yaowei Zheng, Junting Lu, Shenzhi Wang, Zhangchi Feng, Dongdong Kuang and Yuwen Xiong

We also thank Guangming Sheng and Chi Zhang for helpful discussions.

@misc{zheng2025easyr1,
  title        = {EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework},
  author       = {Yaowei Zheng and Junting Lu and Shenzhi Wang and Zhangchi Feng and Dongdong Kuang and Yuwen Xiong},
  howpublished = {\url{https://github.com/hiyouga/EasyR1}},
  year         = {2025}
}

We also recommend citing the original work.

@article{sheng2024hybridflow,
  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv:2409.19256}
}

Release history

0.3.2.dev2 (pre-release): Nov 29, 2025
0.3.2.dev1 (pre-release, this version): Nov 27, 2025

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

lazyllm_verl-0.3.2.dev1-py3-none-any.whl (129.9 kB view details)

Uploaded Nov 27, 2025 Python 3

File details

Details for the file lazyllm_verl-0.3.2.dev1-py3-none-any.whl.

File metadata

Download URL: lazyllm_verl-0.3.2.dev1-py3-none-any.whl
Upload date: Nov 27, 2025
Size: 129.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for lazyllm_verl-0.3.2.dev1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a044b0ab2dff206ca4c8d2507f37ad0f8dd7a1c643ba2a4241c889b294aa1668`
MD5	`ef4f65152b5ee03b726c87b987a55d0b`
BLAKE2b-256	`1cefc722b0f9c2bca4a70de689c80ba2d41805fb254cba5d2e084adaa2fb007d`
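
A downloaded wheel can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch with hypothetical helper names. The local file path is an assumption and should point to wherever the wheel was saved.

```python
import hashlib
import os

# SHA256 digest listed for lazyllm_verl-0.3.2.dev1-py3-none-any.whl.
EXPECTED_SHA256 = "a044b0ab2dff206ca4c8d2507f37ad0f8dd7a1c643ba2a4241c889b294aa1668"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    wheel = "lazyllm_verl-0.3.2.dev1-py3-none-any.whl"  # assumed local path
    if os.path.exists(wheel):
        print("hash ok" if matches_expected(wheel) else "hash mismatch")
    else:
        print(f"{wheel} not found")
```

A mismatch means the file differs from the one that was uploaded and should not be installed.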
