⚡ FlashRL ⚡
Fast RL training with Quantized Rollouts (Blog)
What is FlashRL? • Quick Start • Experiments • Citation
What is FlashRL?
FlashRL patches the inference package (vLLM) to enable: 1) accurate rollout logprob computation for RL training; and 2) online quantization to generate rollouts in INT8 & FP8.
⚡ Quick Start
1. Installation
pip install flash-llm-rl # must be installed on every node for multi-node training
(Optional) to verify the flash-rl install:
TODO
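Until the project documents its own check, one generic way to confirm that the distribution is present is to ask pip. This is a minimal sketch, not FlashRL's official verification step: `check_pip_package` is a hypothetical helper name, and it only shows that `flash-llm-rl` is installed in the current Python environment, not that the vLLM patch is active.

```shell
# Hypothetical helper (not part of flash-llm-rl): report whether a pip
# distribution is installed in the current Python environment.
check_pip_package() {
  if python3 -m pip show "$1" >/dev/null 2>&1; then
    echo "installed"
  else
    echo "missing"
  fi
}

# Usage (run on every node in a multi-node setup):
#   check_pip_package flash-llm-rl
```

Running the check on each node catches a worker that was left out of the install.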
2. RL Logprob Patch Only
flashrl setup --fn bf16 -o $PATH_TO_CONFIG_YAML_OUTPUT
export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT
# alternatively, for submitting multi-node jobs via `ray submit`
# add `FLASHRL_CONFIG: $PATH_TO_CONFIG_YAML_OUTPUT` to runtime env
# as in TODO:PUT_AN_EXAMPLE
bash ...
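The runtime-env comment above can be sketched as follows. This assumes the job is submitted with Ray's `ray job submit --runtime-env-json`, whose `env_vars` field sets environment variables for the job's workers; it is an illustration, not the project's own example. `flashrl_runtime_env_json` is a hypothetical helper name.

```shell
# Hypothetical helper: emit a Ray runtime_env JSON document that sets
# FLASHRL_CONFIG through the env_vars field. The path is inserted verbatim,
# so paths containing quotes or backslashes are not escaped.
flashrl_runtime_env_json() {
  printf '{"env_vars": {"FLASHRL_CONFIG": "%s"}}\n' "$1"
}

# Usage (the training command after `--` is elided, as in the steps above):
#   ray job submit \
#     --runtime-env-json "$(flashrl_runtime_env_json "$PATH_TO_CONFIG_YAML_OUTPUT")" \
#     -- bash ...
```

The config path must resolve on every node, because each worker reads `FLASHRL_CONFIG` locally.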
3. RL Rollout Quantization -- Simple Setup
Use our pre-set quantization profiles for simple setup.
# for Qwen2.5-0.5B-instruct
export FLASHRL_CONFIG=LiyuanLucasLiu/Qwen2.5-0.5B-Instruct-quantized.w8a8-RedHatAI/flashrl_config.yaml
# run Qwen2.5-0.5B experiments
cd verl && bash TODO:UPLOAD_SCRIPT
# for Qwen2.5-32B-instruct
export FLASHRL_CONFIG=LiyuanLucasLiu/Qwen2.5-32B-quantized.w8a8/flashrl_config.yaml
# run Qwen2.5-32B experiments
cd verl && bash TODO:UPLOAD_SCRIPT
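In a launch script, the two pre-set profiles above can be selected by a short key. This is only a convenience sketch: `flashrl_preset_config` and the keys `0.5b` and `32b` are hypothetical names, and the values are the two config strings listed above.

```shell
# Hypothetical helper: map a short model key to one of the pre-set
# FLASHRL_CONFIG values listed above. Unknown keys fail loudly so a typo
# does not silently leave FLASHRL_CONFIG empty.
flashrl_preset_config() {
  case "$1" in
    0.5b) echo "LiyuanLucasLiu/Qwen2.5-0.5B-Instruct-quantized.w8a8-RedHatAI/flashrl_config.yaml" ;;
    32b)  echo "LiyuanLucasLiu/Qwen2.5-32B-quantized.w8a8/flashrl_config.yaml" ;;
    *)    echo "no pre-set profile for: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   FLASHRL_CONFIG=$(flashrl_preset_config 32b) && export FLASHRL_CONFIG
```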
4. More Advanced
4.1 Profiling
flashrl profile -m $PATH_TO_MODEL -qm $PATH_TO_QUANTIZED_MODEL -o $PATH_TO_PROFILE_PT_OUTPUT --fn int8/fp8
4.2 Setup
flashrl setup --fn int8/fp8/bf16 -m $PATH_TO_MODEL -p $PATH_TO_PROFILE_PT_OUTPUT -o $PATH_TO_CONFIG_YAML_OUTPUT
4.3 RL Training
# for Qwen2.5-0.5B-instruct
export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT
# run Qwen2.5-0.5B experiments
cd verl && bash ...
# for Qwen2.5-32B
export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT
# or, alternatively, for submitting multi-node jobs via `ray submit`
# add `FLASHRL_CONFIG: $PATH_TO_CONFIG_YAML_OUTPUT` to runtime env
# as in TODO:PUT_AN_EXAMPLE
# run Qwen2.5-32B experiments
cd verl && bash ...
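The Profiling, Setup, and RL Training steps above chain together: the profile output feeds `flashrl setup`, and the resulting YAML is what `FLASHRL_CONFIG` points to. The sketch below prints that sequence for one precision so it can be reviewed before running. `flashrl_plan` is a hypothetical wrapper; the flags are copied from the commands above and nothing is executed.

```shell
# Hypothetical wrapper: print the profile -> setup -> export sequence for one
# quantized precision. Arguments: precision, model path, quantized model path,
# profile output path, config YAML output path.
flashrl_plan() {
  fn=$1 model=$2 qmodel=$3 profile=$4 config=$5
  # The profiling command above lists only int8 and fp8, so reject the rest.
  case "$fn" in
    int8|fp8) ;;
    *) echo "unsupported --fn for profiling: $fn" >&2; return 1 ;;
  esac
  printf 'flashrl profile -m %s -qm %s -o %s --fn %s\n' "$model" "$qmodel" "$profile" "$fn"
  printf 'flashrl setup --fn %s -m %s -p %s -o %s\n' "$fn" "$model" "$profile" "$config"
  printf 'export FLASHRL_CONFIG=%s\n' "$config"
}

# Usage:
#   flashrl_plan int8 "$PATH_TO_MODEL" "$PATH_TO_QUANTIZED_MODEL" \
#     "$PATH_TO_PROFILE_PT_OUTPUT" "$PATH_TO_CONFIG_YAML_OUTPUT"
```

Printing instead of executing keeps the sketch safe to paste. Paths containing spaces would need quoting before the printed lines are run.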
Example: Accelerating DAPO-Qwen2.5-32B with INT8
🚧 Roadmap & Future Improvements
We're working on several improvements to Flash-RL:
- Support of Other RL Toolkits: Currently, Flash-RL only supports VeRL; we are working on rolling out support for other packages like OpenRLHF.
- Support of Other LLM Inference Toolkits: Currently, Flash-RL only supports vLLM; we are working on rolling out support for other toolkits like SGLang.
- Further Throughput Optimization: We are working on implementing efficient GPU kernels to accelerate online quantization.
📚 Citation
If you find our work useful, please cite us:
@misc{yao2025offpolicy,
title = {Your Efficient RL Framework Secretly Brings You Off-Policy RL Training},
url = {https://fengyao.notion.site/off-policy-rl},
author = {Yao, Feng and Liu, Liyuan and Zhang, Dinghuai and Dong, Chengyu and Shang, Jingbo and Gao, Jianfeng},
journal = {Feng Yao's Notion},
year = {2025},
month = aug,
}
@misc{yao2025flashrl,
title = {Flash-RL: Fast RL training with Quantized Rollouts},
url = {https://fengyao.notion.site/flash-rl},
author = {Yao, Feng and Liu, Liyuan and Zhang, Dinghuai and Dong, Chengyu and Shang, Jingbo and Gao, Jianfeng},
journal = {Feng Yao's Notion},
year = {2025},
month = aug,
}
Questions?
If you have any questions about the code or the blog, feel free to reach out to Liyuan Liu.