⚡ FlashRL ⚡

Fast RL training with Quantized Rollouts (Blog)

What is FlashRL?

FlashRL patches the inference package (vLLM) to enable: 1) accurate rollout logprob computation for RL training; and 2) online quantization to generate rollouts in INT8 & FP8.
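To make the INT8 part of this concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It only illustrates the general idea of mapping floating-point values to 8-bit codes plus a scale; it is not FlashRL's or vLLM's actual kernel, and the helper names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (illustrative only).

    Returns the integer codes and the scale used to produce them.
    """
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floating-point values from INT8 codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.03, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# The round-trip error per value is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

The small round-trip error is the reason quantized rollouts can differ slightly from the training policy, which is why accurate rollout logprobs matter alongside quantization.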

⚡ Quick Start

1. Installation

pip install flash-llm-rl # must be installed on every node in multi-node training

(Optional) to verify the flash-rl install:

TODO
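A generic way to confirm that the package is present on a node, independent of any FlashRL-specific check, is to query the installed distribution metadata with the Python standard library. This sketch only reports whether `flash-llm-rl` is installed and at which version; it does not verify that the vLLM patch is active. The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("flash-llm-rl")
print(f"flash-llm-rl: {found or 'not installed'}")
```

In multi-node training, running this on each node is a quick way to catch a node where the install step was skipped.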

2. RL Logprob Patch Only

flashrl setup --fn bf16 -o $PATH_TO_CONFIG_YAML_OUTPUT

export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT
# alternatively, for submitting multi-node jobs via `ray submit`
# add `FLASHRL_CONFIG: $PATH_TO_CONFIG_YAML_OUTPUT` to runtime env
# as in TODO:PUT_AN_EXAMPLE
bash ... 
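For the multi-node case mentioned in the comments above, a possible shape of the runtime environment is sketched below. It assumes the job is launched through Ray's job submission, whose runtime environment accepts an `env_vars` mapping; the config path is a placeholder and the helper name `flashrl_runtime_env` is hypothetical.

```python
import json


def flashrl_runtime_env(config_path):
    """Build a Ray runtime environment that exports FLASHRL_CONFIG to workers."""
    return {"env_vars": {"FLASHRL_CONFIG": config_path}}


# Placeholder path: substitute the YAML written by `flashrl setup`.
runtime_env = flashrl_runtime_env("/path/to/flashrl_config.yaml")
print(json.dumps(runtime_env))
```

The printed JSON can then be supplied as the runtime environment, for example through the `--runtime-env-json` option of `ray job submit`, if that is how the job is launched.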

3. RL Rollout Quantization -- Simple Setup

Use our pre-set quantization profiles for simple setup.

# for Qwen2.5-0.5B-instruct
export FLASHRL_CONFIG=LiyuanLucasLiu/Qwen2.5-0.5B-Instruct-quantized.w8a8-RedHatAI/flashrl_config.yaml

# run Qwen2.5-0.5B experiments 
cd verl && bash TODO:UPLOAD_SCRIPT

# for Qwen2.5-32B-instruct
export FLASHRL_CONFIG=LiyuanLucasLiu/Qwen2.5-32B-quantized.w8a8/flashrl_config.yaml

# run Qwen2.5-32B experiments 
cd verl && bash TODO:UPLOAD_SCRIPT
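The preset values above have the shape `<owner>/<repo>/<file>`, which looks like a model-hub reference rather than a local path. Assuming that reading (the text here does not spell out how the value is resolved), the following sketch splits such a value into a repository id and a file name. The helper name `split_preset` is hypothetical and is not part of FlashRL.

```python
def split_preset(value):
    """Split '<owner>/<repo>/<file>' into ('<owner>/<repo>', '<file>').

    Assumption: the preset is a hub-style reference, not a local path.
    """
    owner, repo, filename = value.split("/", 2)
    return f"{owner}/{repo}", filename


preset = "LiyuanLucasLiu/Qwen2.5-0.5B-Instruct-quantized.w8a8-RedHatAI/flashrl_config.yaml"
repo_id, filename = split_preset(preset)
print(repo_id)
print(filename)
```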

4. More Advanced

4.1 Profiling

# --fn takes one value: int8 or fp8
flashrl profile -m $PATH_TO_MODEL -qm $PATH_TO_QUANTIZED_MODEL -o $PATH_TO_PROFILE_PT_OUTPUT --fn int8/fp8

4.2 Setup

# --fn takes one value: int8, fp8, or bf16
flashrl setup --fn int8/fp8/bf16 -m $PATH_TO_MODEL -p $PATH_TO_PROFILE_PT_OUTPUT -o $PATH_TO_CONFIG_YAML_OUTPUT

4.3 RL Training

# for Qwen2.5-0.5B-instruct
export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT

# run Qwen2.5-0.5B experiments 
cd verl && bash ...

# for Qwen2.5-32B
export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT
# or, alternatively, for submitting multi-node jobs via `ray submit`
# add `FLASHRL_CONFIG: $PATH_TO_CONFIG_YAML_OUTPUT` to runtime env
# as in TODO:PUT_AN_EXAMPLE

# run Qwen2.5-32B experiments 
cd verl && bash ...
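The profiling and setup steps above feed into each other: the profile output of the first command is the `-p` input of the second, and the YAML written by the second is what `FLASHRL_CONFIG` points to. The sketch below assembles both command lines from the flags shown in this section so the hand-off is explicit. It only builds and prints the commands; the paths are placeholders and the helper names `profile_cmd` and `setup_cmd` are hypothetical.

```python
import shlex

PROFILE_FORMATS = {"int8", "fp8"}
SETUP_FORMATS = {"int8", "fp8", "bf16"}


def profile_cmd(model, quantized_model, profile_out, fn):
    """Build the `flashrl profile` command line as an argv list."""
    if fn not in PROFILE_FORMATS:
        raise ValueError(f"--fn must be one of {sorted(PROFILE_FORMATS)}")
    return ["flashrl", "profile", "-m", model, "-qm", quantized_model,
            "-o", profile_out, "--fn", fn]


def setup_cmd(model, profile_out, config_out, fn):
    """Build the `flashrl setup` command line as an argv list."""
    if fn not in SETUP_FORMATS:
        raise ValueError(f"--fn must be one of {sorted(SETUP_FORMATS)}")
    return ["flashrl", "setup", "--fn", fn, "-m", model,
            "-p", profile_out, "-o", config_out]


# Placeholder paths: replace with real model and output locations.
profile = profile_cmd("/models/base", "/models/base-w8a8", "/tmp/profile.pt", "int8")
setup = setup_cmd("/models/base", "/tmp/profile.pt", "/tmp/flashrl_config.yaml", "int8")

print(shlex.join(profile))
print(shlex.join(setup))
print("export FLASHRL_CONFIG=/tmp/flashrl_config.yaml")
```

Keeping the commands as argv lists makes it straightforward to hand them to `subprocess.run` later without shell-quoting issues.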

Example: Accelerating DAPO-Qwen2.5-32B with INT8

🚧 Roadmap & Future Improvements

We're working on several improvements to Flash-RL:

  • Support for Other RL Toolkits: Flash-RL currently supports only VeRL; we are working on rolling out support for other packages such as OpenRLHF.
  • Support for Other LLM Inference Toolkits: Flash-RL currently supports only vLLM; we are working on rolling out support for other toolkits such as SGLang.
  • Further Throughput Optimization: We are working on efficient GPU kernels to accelerate online quantization.

📚 Citation

If you find our work useful, please cite us:

@misc{yao2025offpolicy,
  title = {Your Efficient RL Framework Secretly Brings You Off-Policy RL Training},
  url = {https://fengyao.notion.site/off-policy-rl},
  author = {Yao, Feng and Liu, Liyuan and Zhang, Dinghuai and Dong, Chengyu and Shang, Jingbo and Gao, Jianfeng},
  journal = {Feng Yao's Notion},
  year = {2025},
  month = aug,
}
@misc{yao2025flashrl,
  title = {Flash-RL: Fast RL training with Quantized Rollouts},
  url = {https://fengyao.notion.site/flash-rl},
  author = {Yao, Feng and Liu, Liyuan and Zhang, Dinghuai and Dong, Chengyu and Shang, Jingbo and Gao, Jianfeng},
  journal = {Feng Yao's Notion},
  year = {2025},
  month = aug,
}

Questions?

If you have any questions about the code or the blog, feel free to reach out to Liyuan Liu.
