# EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework
This project is a clean fork of the original veRL project that adds support for vision-language models; we thank all the veRL authors for providing such a high-performance RL training framework.

EasyR1 is efficient and scalable thanks to the design of HybridEngine and the SPMD mode introduced in recent vLLM releases.
## Features

- Supported models
  - Llama3/Qwen2/Qwen2.5/Qwen3 language models
  - Qwen2/Qwen2.5-VL vision-language models
  - DeepSeek-R1 distill models
- Supported algorithms
  - GRPO
  - DAPO
  - Reinforce++
  - ReMax
  - RLOO
- Supported datasets
  - Any text or vision-text dataset in a specific format
- Supported tricks
  - Padding-free training
  - Resuming from the latest/best checkpoint
  - Wandb & SwanLab & Mlflow & Tensorboard tracking
## Requirements

### Software Requirements
- Python 3.9+
- transformers>=4.54.0
- flash-attn>=2.4.3
- vllm>=0.8.3
We provide a Dockerfile to build the environment easily. We recommend using the pre-built Docker image:
```shell
docker pull hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0
docker run -it --ipc=host --gpus=all hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0
```
If your environment does not support Docker, you can consider using Apptainer:
```shell
apptainer pull easyr1.sif docker://hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0
apptainer shell --nv --cleanenv --bind /mnt/your_dir:/mnt/your_dir easyr1.sif
```
Set `USE_MODELSCOPE_HUB=1` to download models from the ModelScope hub.
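For example, the variable can be exported before launching any of the training scripts (the script name here is the one from the tutorial below):

```shell
# Download models from the ModelScope hub instead of Hugging Face.
export USE_MODELSCOPE_HUB=1
# bash examples/qwen2_5_vl_7b_geo3k_grpo.sh   # then launch training as usual
```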
### Hardware Requirements
* estimated
| Method | Bits | 1.5B | 3B | 7B | 32B | 72B |
|---|---|---|---|---|---|---|
| GRPO Full Fine-Tuning | AMP | 2*24GB | 4*40GB | 8*40GB | 16*80GB | 32*80GB |
| GRPO Full Fine-Tuning | BF16 | 1*24GB | 1*40GB | 4*40GB | 8*80GB | 16*80GB |
> [!NOTE]
> Use `worker.actor.fsdp.torch_dtype=bf16` and `worker.actor.optim.strategy=adamw_bf16` to enable bf16 training.
>
> We are working hard to reduce VRAM usage in RL training; LoRA support will be integrated in upcoming updates.
## Tutorial: Run Qwen2.5-VL GRPO on the Geometry3K Dataset in Just 3 Steps
### Installation

```shell
git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .
```
### GRPO Training

```shell
bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
```
### Merge Checkpoint in Hugging Face Format

```shell
python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor
```
> [!TIP]
> If you encounter issues connecting to Hugging Face, consider setting `export HF_ENDPOINT=https://hf-mirror.com`.
>
> If you want to use the SwanLab logger, consider running `bash examples/qwen2_5_vl_7b_geo3k_swanlab.sh`.
## Custom Dataset
Please refer to the example datasets to prepare your own dataset.
- Text dataset: https://huggingface.co/datasets/hiyouga/math12k
- Image-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k
- Multi-image-text dataset: https://huggingface.co/datasets/hiyouga/journeybench-multi-image-vqa
- Text-image mixed dataset: https://huggingface.co/datasets/hiyouga/rl-mixed-dataset
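As a loose illustration of what a single record might look like, the sketch below writes one JSONL-style entry. The field names (`problem`, `answer`, `images`) are assumptions inferred from typical math/VQA RL datasets, not the authoritative schema; verify them against the example datasets linked above.

```shell
# Illustrative only: write one hypothetical record. Field names are
# assumptions -- check the linked example datasets for the real schema.
cat > sample_record.jsonl <<'EOF'
{"problem": "What is the measure of the marked angle?", "answer": "60", "images": ["images/0001.png"]}
EOF
```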
## How to Understand GRPO in EasyR1
- To learn about the GRPO algorithm, you can refer to Hugging Face's blog.
## How to Run a 70B+ Model in a Multi-node Environment
- Start the Ray head node.

  ```shell
  ray start --head --port=6379 --dashboard-host=0.0.0.0
  ```

- Start the Ray worker nodes and connect them to the head node.

  ```shell
  ray start --address=<head_node_ip>:6379
  ```

- Check the Ray resource pool.

  ```shell
  ray status
  ```

- Run the training script on the Ray head node only.

  ```shell
  bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
  ```
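The steps above can be wrapped in a small per-node launcher that picks the Ray role for each machine. This is only a sketch: `NODE_RANK` and `HEAD_IP` are illustrative variable names, not EasyR1 conventions, and the actual commands are commented out so nothing is launched here.

```shell
# Sketch of a per-node launcher; NODE_RANK and HEAD_IP are illustrative names.
NODE_RANK="${NODE_RANK:-0}"
HEAD_IP="${HEAD_IP:-10.0.0.1}"   # replace with your head node's IP
if [ "$NODE_RANK" -eq 0 ]; then
  RAY_CMD="ray start --head --port=6379 --dashboard-host=0.0.0.0"
else
  RAY_CMD="ray start --address=${HEAD_IP}:6379"
fi
# eval "$RAY_CMD"                                              # uncomment on a real cluster
# [ "$NODE_RANK" -eq 0 ] && bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
```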
See veRL's official docs for more details about multi-node training and the Ray debugger.
## Other Baselines

We also reproduced two baselines from the R1-V project.

- CLEVR-70k-Counting: train the Qwen2.5-VL-3B-Instruct model on the counting problem.
- GeoQA-8k: train the Qwen2.5-VL-3B-Instruct model on the GeoQA problem.
## Performance Baselines
See baselines.md.
## Awesome Work using EasyR1
- MMR1: Advancing the Frontiers of Multimodal Reasoning.
- Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models.
- Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement.
- MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse.
- Temporal-R1: Evolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward.
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation.
- GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents.
- R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning.
- VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning.
- MM-UPT: Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO.
- RL-with-Cold-Start: Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start.
- ViGoRL: Grounded Reinforcement Learning for Visual Reasoning.
- Revisual-R1: Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning.
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward.
- Vision-Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning.
- VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use.
- Long-RL: Scaling RL to Long Sequences.
## TODO

- Support LoRA (high priority).
- Support Ulysses parallelism for VLMs (medium priority).
- Support more VLM architectures.
> [!NOTE]
> We will not provide scripts for supervised fine-tuning and inference in this project. If you have such requirements, we recommend using LLaMA-Factory.
## Known Bugs

These features are temporarily disabled; we plan to fix them one by one in future updates.
- Vision language models are not compatible with ulysses parallelism yet.
## Discussion Group
👋 Join our WeChat group.
## FAQs
**ValueError: Image features and image tokens do not match: tokens: 8192, features 9800**

Increase `data.max_prompt_length` or reduce `data.max_pixels`.
**RuntimeError: CUDA Error: out of memory at /workspace/csrc/cumem_allocator.cpp:62**

Reduce `worker.rollout.gpu_memory_utilization` and enable `worker.actor.offload.offload_params`.
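Config keys like `worker.rollout.gpu_memory_utilization` are typically supplied as dotted-key command-line overrides appended to the training script. The override-passing mechanism and the values below are assumptions (veRL-style configs); tune them for your hardware:

```shell
# Illustrative overrides only; whether the example script forwards extra
# arguments as config overrides is an assumption -- check your script.
OVERRIDES="worker.rollout.gpu_memory_utilization=0.5 worker.actor.offload.offload_params=true"
# bash examples/qwen2_5_vl_7b_geo3k_grpo.sh $OVERRIDES   # uncomment to launch
```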
**RuntimeError: 0 active drivers ([]). There should only be one.**

Uninstall `deepspeed` from the current Python environment.
## Citation

Core contributors: Yaowei Zheng, Junting Lu, Shenzhi Wang, Zhangchi Feng, Dongdong Kuang and Yuwen Xiong

We also thank Guangming Sheng and Chi Zhang for helpful discussions.

```bibtex
@misc{zheng2025easyr1,
  title        = {EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework},
  author       = {Yaowei Zheng and Junting Lu and Shenzhi Wang and Zhangchi Feng and Dongdong Kuang and Yuwen Xiong},
  howpublished = {\url{https://github.com/hiyouga/EasyR1}},
  year         = {2025}
}
```

We recommend also citing the original work:

```bibtex
@article{sheng2024hybridflow,
  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv:2409.19256}
}
```