An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework

Used by Amazon Web Services

This project is a clean fork of the original veRL project, extended to support vision language models. We thank all the authors for providing such a high-performance RL training framework.

EasyR1 is efficient and scalable due to the design of HybridEngine and the latest release of vLLM's SPMD mode.

Features

  • Supported models

    • Llama3/Qwen2/Qwen2.5/Qwen3 language models
    • Qwen2/Qwen2.5-VL vision language models
    • DeepSeek-R1 distill models
  • Supported algorithms

    • GRPO
    • DAPO
    • Reinforce++
    • ReMax
    • RLOO
  • Supported datasets

  • Supported tricks

    • Padding-free training
    • Resuming from the latest/best checkpoint
    • Wandb & SwanLab & Mlflow & Tensorboard tracking

Requirements

Software Requirements

  • Python 3.9+
  • transformers>=4.54.0
  • flash-attn>=2.4.3
  • vllm>=0.8.3
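
To check an existing environment against the minimum versions listed above, a small script along these lines can help. It is a minimal sketch, not part of EasyR1: the helper names are hypothetical, and the version parser is deliberately simple (it ignores local and pre-release suffixes such as `+cu126` or `.dev0`, so a pre-release counts as equal to its base version).

```python
from importlib import metadata

# Minimum versions copied from the requirements list above.
REQUIREMENTS = {
    "transformers": "4.54.0",
    "flash-attn": "2.4.3",
    "vllm": "0.8.3",
}


def parse_version(text):
    """Turn '0.10.0+cu126' or '4.54.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component (e.g. 'dev0')
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after zero-padding them to equal length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(requirements=REQUIREMENTS):
    """Return {package: (installed_version_or_None, ok)}."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        print(f"{name}: installed={installed} ok={ok}")
```

Inside the pre-built Docker image this check should be unnecessary; it is mainly useful for hand-built environments.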

We provide a Dockerfile to easily build environments.

We recommend using the pre-built docker image in EasyR1.

docker pull hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0
docker run -it --ipc=host --gpus=all hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0

If your environment does not support Docker, you can consider using Apptainer:

apptainer pull easyr1.sif docker://hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0
apptainer shell --nv --cleanenv --bind /mnt/your_dir:/mnt/your_dir easyr1.sif

Use USE_MODELSCOPE_HUB=1 to download models from the ModelScope hub.

Hardware Requirements

* estimated

Method                 Bits  1.5B    3B      7B      32B      72B
GRPO Full Fine-Tuning  AMP   2*24GB  4*40GB  8*40GB  16*80GB  32*80GB
GRPO Full Fine-Tuning  BF16  1*24GB  1*40GB  4*40GB  8*80GB   16*80GB

[!NOTE] Use worker.actor.fsdp.torch_dtype=bf16 and worker.actor.optim.strategy=adamw_bf16 to enable bf16 training.
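
The two keys in the note are dotted paths into a nested configuration. The sketch below only illustrates how such `key=value` overrides map onto nested settings; `apply_overrides` is a hypothetical helper, not EasyR1's config loader, and the starting values in `config` are placeholders rather than the project's defaults.

```python
def apply_overrides(config, overrides):
    """Apply 'a.b.c=value' strings to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate levels on demand.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


# Placeholder starting values; only the two overridden keys come from the note.
config = {
    "worker": {
        "actor": {
            "fsdp": {"torch_dtype": "fp32"},
            "optim": {"strategy": "adamw"},
        }
    }
}

apply_overrides(config, [
    "worker.actor.fsdp.torch_dtype=bf16",
    "worker.actor.optim.strategy=adamw_bf16",
])
print(config["worker"]["actor"])
```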

We are working hard to reduce VRAM usage in RL training; LoRA support will be integrated in upcoming updates.

Tutorial: Run Qwen2.5-VL GRPO on Geometry3K Dataset in Just 3 Steps

Installation

git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .

GRPO Training

bash examples/qwen2_5_vl_7b_geo3k_grpo.sh

Merge Checkpoint in Hugging Face Format

python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor

[!TIP] If you encounter issues with connecting to Hugging Face, consider using export HF_ENDPOINT=https://hf-mirror.com.

If you want to use the SwanLab logger, consider using bash examples/qwen2_5_vl_7b_geo3k_swanlab.sh.

Custom Dataset

Please refer to the example datasets to prepare your own dataset.

How to Understand GRPO in EasyR1

image
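
The core idea of GRPO is that each prompt is sampled several times and every response is scored relative to the other responses in its group, so no separate value model is needed. The following is a minimal sketch of that group-relative advantage as it is commonly described for GRPO. It is not EasyR1's implementation: the function name, the `eps` value, and the choice of population standard deviation are assumptions, and real implementations may differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled responses within the group.

    Each advantage is (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical responses to the same prompt, scored 1.0 (correct) or 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that beat the group average get a positive advantage and the rest get a negative one. If every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal.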

How to Run 70B+ Model in Multi-node Environment

  1. Start the Ray head node.
ray start --head --port=6379 --dashboard-host=0.0.0.0
  2. Start the Ray worker node and connect to the head node.
ray start --address=<head_node_ip>:6379
  3. Check the Ray resource pool.
ray status
  4. Run the training script on the Ray head node only.
bash examples/qwen2_5_vl_7b_geo3k_grpo.sh

See veRL's official documentation for more details about multi-node training and the Ray debugger.
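
The resource check in step 3 can also be done programmatically before launching a long job. This is a minimal sketch that assumes Ray is installed and that it runs on a node already attached to the cluster; `has_enough_gpus` and `check_cluster` are hypothetical helper names, and the required GPU count is whatever your training configuration needs.

```python
def has_enough_gpus(resources, required_gpus):
    """Return True if a Ray resource dict reports at least `required_gpus` GPUs."""
    return resources.get("GPU", 0) >= required_gpus


def check_cluster(required_gpus):
    """Attach to the running Ray cluster and compare its GPU total to the need."""
    import ray  # imported lazily so the helper above works without Ray installed

    ray.init(address="auto")  # connect to the cluster started with `ray start`
    try:
        return has_enough_gpus(ray.cluster_resources(), required_gpus)
    finally:
        ray.shutdown()
```

For example, calling `check_cluster(16)` on the head node would report whether all worker nodes of a two-node, 8-GPU-per-node setup have joined.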

Other Baselines

We also reproduced the following two baselines of the R1-V project.

  • CLEVR-70k-Counting: Train the Qwen2.5-VL-3B-Instruct model on counting problems.
  • GeoQA-8k: Train the Qwen2.5-VL-3B-Instruct model on GeoQA problems.

Performance Baselines

See baselines.md.

Awesome Work using EasyR1

  • MMR1: Advancing the Frontiers of Multimodal Reasoning. [code]
  • Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models. [code] [arxiv]
  • Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement. [code] [arxiv]
  • MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse. [code] [arxiv]
  • Temporal-R1: Envolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward. [code]
  • NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation. [code] [arxiv]
  • GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents. [code] [arxiv]
  • R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning. [code]
  • VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning. [code] [arxiv]
  • MM-UPT: Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO. [code] [arxiv]
  • RL-with-Cold-Start: Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start. [code] [arxiv]
  • ViGoRL: Grounded Reinforcement Learning for Visual Reasoning. [code] [arxiv]
  • Revisual-R1: Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning. [code] [arxiv]
  • SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward. [code] [arxiv]
  • Vision-Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning. [code] [arxiv]
  • VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use. [code] [arxiv]
  • Long-RL: Scaling RL to Long Sequences. [code] [arxiv]

TODO

  • Support LoRA (high priority).
  • Support ulysses parallelism for VLMs (medium priority).
  • Support more VLM architectures.

[!NOTE] We will not provide scripts for supervised fine-tuning and inference in this project. If you have such requirements, we recommend using LLaMA-Factory.

Known bugs

These features are temporarily disabled; we plan to fix them one by one in future updates.

  • Vision language models are not compatible with ulysses parallelism yet.

Discussion Group

👋 Join our WeChat group.

FAQs

ValueError: Image features and image tokens do not match: tokens: 8192, features 9800

Increase the data.max_prompt_length or reduce the data.max_pixels.
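
This error typically means the image placeholder tokens were truncated because the prompt exceeded its length limit. The sketch below gives a rough upper bound on how many tokens one image can occupy. It assumes the Qwen2-VL family's vision setup (14-pixel patches merged 2x2, so one image token per 28x28 pixel block) and is only an estimate; the helper names are hypothetical and not part of EasyR1.

```python
PATCH_SIZE = 14  # assumed ViT patch size for the Qwen2-VL family
MERGE_SIZE = 2   # assumed spatial merge factor
PIXELS_PER_TOKEN = (PATCH_SIZE * MERGE_SIZE) ** 2  # 28 * 28 = 784


def max_image_tokens(max_pixels):
    """Upper bound on image tokens for one image resized to at most max_pixels."""
    return max_pixels // PIXELS_PER_TOKEN


def fits_in_prompt(max_pixels, max_prompt_length, text_tokens=0, images=1):
    """Check whether image tokens plus text tokens fit within the prompt limit."""
    needed = images * max_image_tokens(max_pixels) + text_tokens
    return needed <= max_prompt_length


# The error above: 9800 image features against an 8192-token prompt limit.
print(fits_in_prompt(9800 * PIXELS_PER_TOKEN, 8192))
# A smaller pixel budget leaves room for the text part of the prompt.
print(fits_in_prompt(4194304, 8192, text_tokens=512))
```

Either raising data.max_prompt_length above the estimated total or lowering data.max_pixels until the estimate fits should resolve the mismatch.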

RuntimeError: CUDA Error: out of memory at /workspace/csrc/cumem_allocator.cpp:62

Reduce the worker.rollout.gpu_memory_utilization and enable worker.actor.offload.offload_params.

RuntimeError: 0 active drivers ([]). There should only be one.

Uninstall deepspeed from the current python environment.

Citation

Core contributors: Yaowei Zheng, Junting Lu, Shenzhi Wang, Zhangchi Feng, Dongdong Kuang and Yuwen Xiong

We also thank Guangming Sheng and Chi Zhang for helpful discussions.

@misc{zheng2025easyr1,
  title        = {EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework},
  author       = {Yaowei Zheng and Junting Lu and Shenzhi Wang and Zhangchi Feng and Dongdong Kuang and Yuwen Xiong},
  howpublished = {\url{https://github.com/hiyouga/EasyR1}},
  year         = {2025}
}

We also recommend citing the original work.

@article{sheng2024hybridflow,
  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv: 2409.19256}
}
