flash llm rl

⚡ FlashRL ⚡

Fast RL training with Quantized Rollouts (Blog)

What is FlashRL?

FlashRL patches the inference package (vLLM) to enable: 1) accurate rollout logprob computation for RL training; and 2) online quantization to generate rollouts in INT8 & FP8.

⚡ Quick Start

1. Installation

pip install flash-llm-rl # needs to be installed on all nodes in multi-node training

2. RL Logprob Patch Only

flashrl setup --fn bf16 -o $PATH_TO_CONFIG_YAML_OUTPUT

export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT
# alternatively, for submitting multi-node jobs via `ray submit`,
# add `FLASHRL_CONFIG: $PATH_TO_CONFIG_YAML_OUTPUT` to the runtime env
# as in TODO:PUT_AN_EXAMPLE
bash ...
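
The steps above pass the configuration through the FLASHRL_CONFIG environment variable, so a launch script may want to stop early when that variable is missing. The sketch below is one possible guard; `require_flashrl_config` is a hypothetical helper name, not part of the flashrl CLI, and it only checks that the variable is non-empty.

```shell
# Hypothetical helper (not part of the flashrl CLI): stop early when
# FLASHRL_CONFIG is unset or empty, before launching a training script.
require_flashrl_config() {
  if [ -z "${FLASHRL_CONFIG:-}" ]; then
    echo "FLASHRL_CONFIG is not set; run 'flashrl setup' and export it first" >&2
    return 1
  fi
  echo "using FLASHRL_CONFIG=${FLASHRL_CONFIG}"
}
```

A launch script could call `require_flashrl_config || exit 1` before starting training.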

3. RL Rollout Quantization -- Simple Setup

Use our pre-set quantization profiles for simple setup.

# for Qwen2.5-0.5B-instruct
export FLASHRL_CONFIG=LiyuanLucasLiu/Qwen2-0.5B-Instruct-quantized.w8a8-RedHatAI/flashrl_config.yaml

# run Qwen2.5-0.5B experiments 
cd verl && bash TODO:UPLOAD_SCRIPT

# for Qwen2.5-32B-instruct
export FLASHRL_CONFIG=LiyuanLucasLiu/Qwen2.5-32B-quantized.w8a8/flashrl_config.yaml

# run Qwen2.5-32B experiments 
cd verl && bash TODO:UPLOAD_SCRIPT
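
The two preset profiles above can be selected from a single launch script. The sketch below maps a short model label to the corresponding preset path from this section; `flashrl_preset` and the labels `0.5b` and `32b` are hypothetical names chosen for illustration only.

```shell
# Hypothetical helper: print the preset FLASHRL_CONFIG value for a model label.
# The two paths are the presets listed above; the labels are illustrative.
flashrl_preset() {
  case "$1" in
    0.5b) echo "LiyuanLucasLiu/Qwen2-0.5B-Instruct-quantized.w8a8-RedHatAI/flashrl_config.yaml" ;;
    32b)  echo "LiyuanLucasLiu/Qwen2.5-32B-quantized.w8a8/flashrl_config.yaml" ;;
    *)    echo "no preset profile for '$1'" >&2; return 1 ;;
  esac
}

# Example: select the 0.5B preset and export it for the training run.
FLASHRL_CONFIG="$(flashrl_preset 0.5b)" && export FLASHRL_CONFIG
```

An unknown label makes the function return a non-zero status, so the export is skipped rather than set to an empty value.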

4. More Advanced

4.1 Profiling

flashrl profile -m $PATH_TO_MODEL -qm $PATH_TO_QUANTIZED_MODEL -o $PATH_TO_PROFILE_PT_OUTPUT --fn int8/fp8

4.2 Setup

flashrl setup --fn int8/fp8/bf16 -m $PATH_TO_MODEL -p $PATH_TO_PROFILE_PT_OUTPUT -o $PATH_TO_CONFIG_YAML_OUTPUT

4.3 RL Training

# for Qwen2.5-0.5B-instruct
export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT

# run Qwen2.5-0.5B experiments 
cd verl && bash ...

# for Qwen2.5-32B
export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT
# or, alternatively, for submitting multi-node jobs via `ray submit`
# add `FLASHRL_CONFIG: $PATH_TO_CONFIG_YAML_OUTPUT` to runtime env
# as in TODO:PUT_AN_EXAMPLE

# run Qwen2.5-32B experiments 
cd verl && bash ...
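
The profiling and setup commands above can be chained so that FLASHRL_CONFIG is exported only when both succeed. The sketch below assumes that `int8/fp8` in those commands means a single value is passed to `--fn`; `flashrl_prepare`, `run`, and the `DRY_RUN` switch are hypothetical names added for illustration. With `DRY_RUN=1` the commands are printed instead of executed.

```shell
# Hypothetical wrapper around the two commands above. With DRY_RUN=1 it only
# prints the commands, so the sequence can be inspected without running them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: flashrl_prepare <int8|fp8> <model> <quantized_model> <profile_pt> <config_yaml>
# Assumption: --fn accepts a single value such as int8 or fp8.
flashrl_prepare() {
  fn="$1"; model="$2"; qmodel="$3"; profile="$4"; config="$5"
  run flashrl profile -m "$model" -qm "$qmodel" -o "$profile" --fn "$fn" &&
    run flashrl setup --fn "$fn" -m "$model" -p "$profile" -o "$config" &&
    export FLASHRL_CONFIG="$config"
}
```

Chaining with `&&` means a failed profiling step stops the sequence before setup runs, so a stale or partial config is never exported.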

Example: Accelerating DAPO-Qwen2.5-32B with INT8

🚧 Roadmap & Future Improvements

We're working on several improvements to Flash-RL:

  • Support for Other RL Toolkits: Flash-RL currently supports only VeRL; we are working on rolling out support for other packages such as OpenRLHF
  • Support for Other LLM Inference Toolkits: Flash-RL currently supports only vLLM; we are working on rolling out support for other toolkits such as SGLang
  • Further Throughput Optimization: We are working on implementing efficient GPU kernels to accelerate online quantization

📚 Citation

If you find our work useful, please cite us:

@misc{yao2025offpolicy,
  title = {Your Efficient RL Framework Secretly Brings You Off-Policy RL Training},
  url = {https://fengyao.notion.site/off-policy-rl},
  author = {Yao, Feng and Liu, Liyuan and Zhang, Dinghuai and Dong, Chengyu and Gao, Jianfeng},
  journal = {Feng Yao's Notion},
  year = {2025},
  month = aug,
}
@misc{yao2025flashrl,
  title = {Flash-RL: Fast RL training with Quantized Rollouts},
  url = {https://fengyao.notion.site/flash-rl},
  author = {Yao, Feng and Liu, Liyuan and Zhang, Dinghuai and Dong, Chengyu and Gao, Jianfeng},
  journal = {Feng Yao's Notion},
  year = {2025},
  month = aug,
}

Questions?

If you have any questions about the code or the blog, feel free to reach out to Liyuan Liu.

Download files

Source Distribution

flash_llm_rl-0.5.0.tar.gz (17.1 kB)

Uploaded Source

Built Distribution

flash_llm_rl-0.5.0-py3-none-any.whl (19.2 kB)

Uploaded Python 3

File details

Details for the file flash_llm_rl-0.5.0.tar.gz.

File metadata

  • Download URL: flash_llm_rl-0.5.0.tar.gz
  • Upload date:
  • Size: 17.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for flash_llm_rl-0.5.0.tar.gz
Algorithm    Hash digest
SHA256       be56e9b3f546b2ea1c571007b4f7c0d36c388c2ec5f4eab851890b2737f277c4
MD5          5d3c006eee4dd7df6370712050970f7b
BLAKE2b-256  fcab6cd029d5d4c5733418172e0647597fe4dd973a00e16317496bd1dfe0e5b4

File details

Details for the file flash_llm_rl-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: flash_llm_rl-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 19.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for flash_llm_rl-0.5.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       2552e363b6caafb8a2b47f9609b13f622406de7a8a355f7fa26811ba641309cd
MD5          dc58fa9d5bbb0e4c0b680f6c2a15a93a
BLAKE2b-256  cb6a97141c818554b70fec5d85775a86a247d2472e8d14f375248c329f19ddd3
