⚡ FlashRL ⚡

Fast RL training with Quantized Rollouts (Blog)

What is FlashRL?

FlashRL patches the inference package (vLLM) to enable: 1) accurate rollout logprob computation for RL training; and 2) online quantization, so that rollouts can be generated in INT8 or FP8.

⚡ Quick Start

1. Installation

pip install flash-llm-rl # must be installed on every node in multi-node training

(Optional) to verify the flash-rl install:

TODO
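One way to sanity-check the install is sketched below. It is an assumption, not the project's official verification procedure: it only asks pip whether the `flash-llm-rl` distribution is present and whether the `flashrl` command is on `PATH`. It does not confirm that the vLLM patch is active.

```shell
# Assumption: a plain presence check, not an official FlashRL verification step.
# `pip show` exits non-zero when the distribution is not installed.
if pip show flash-llm-rl >/dev/null 2>&1; then
  STATUS="installed"
  pip show flash-llm-rl | grep -E '^(Name|Version):'
else
  STATUS="missing"
fi
echo "flash-llm-rl: $STATUS"

# The commands in this guide call `flashrl`, so check that it resolves on PATH.
if command -v flashrl >/dev/null 2>&1; then
  CLI="found"
else
  CLI="not found"
fi
echo "flashrl CLI: $CLI"
```

In multi-node training, the same check would need to be run on every node.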

2. RL Logprob Patch Only

flashrl setup --fn bf16 -o $PATH_TO_CONFIG_YAML_OUTPUT

export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT
# alternatively, when submitting multi-node jobs via `ray job submit`,
# add `FLASHRL_CONFIG: $PATH_TO_CONFIG_YAML_OUTPUT` to the runtime env
# as in TODO:PUT_AN_EXAMPLE
bash ... 
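For the multi-node case mentioned in the comments above, a runtime-env file might look like the sketch below. It relies on Ray's `env_vars` runtime-env field and on `ray job submit --runtime-env`; the file name `runtime_env.yaml`, the default config path, and the training command are placeholders chosen for illustration, not names taken from FlashRL.

```shell
# Placeholder default (assumption); point this at the YAML written by `flashrl setup -o ...`.
PATH_TO_CONFIG_YAML_OUTPUT="${PATH_TO_CONFIG_YAML_OUTPUT:-/tmp/flashrl_config.yaml}"

# Write a Ray runtime-env file that forwards FLASHRL_CONFIG to the job's workers.
cat > runtime_env.yaml <<EOF
env_vars:
  FLASHRL_CONFIG: "${PATH_TO_CONFIG_YAML_OUTPUT}"
EOF

cat runtime_env.yaml

# Then attach it when submitting the job (the training command is left as a placeholder):
# ray job submit --runtime-env runtime_env.yaml -- bash ...
```

Setting the variable through the runtime env, rather than `export` on the head node only, is what lets worker processes on other nodes see it.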

3. RL Rollout Quantization -- Simple Setup

Use our pre-set quantization profiles for simple setup.

# for Qwen2.5-0.5B-instruct
export FLASHRL_CONFIG=LiyuanLucasLiu/Qwen2.5-0.5B-Instruct-quantized.w8a8-RedHatAI/flashrl_config.yaml

# run Qwen2.5-0.5B experiments 
cd verl && bash TODO:UPLOAD_SCRIPT

# for Qwen2.5-32B-instruct
export FLASHRL_CONFIG=LiyuanLucasLiu/Qwen2.5-32B-quantized.w8a8/flashrl_config.yaml

# run Qwen2.5-32B experiments 
cd verl && bash TODO:UPLOAD_SCRIPT

4. More Advanced

4.1 Profiling

# --fn takes a single value here: int8 or fp8
flashrl profile -m $PATH_TO_MODEL -qm $PATH_TO_QUANTIZED_MODEL -o $PATH_TO_PROFILE_PT_OUTPUT --fn int8/fp8

4.2 Setup

# --fn takes a single value here: int8, fp8, or bf16
flashrl setup --fn int8/fp8/bf16 -m $PATH_TO_MODEL -p $PATH_TO_PROFILE_PT_OUTPUT -o $PATH_TO_CONFIG_YAML_OUTPUT

4.3 RL Training

# for Qwen2.5-0.5B-instruct
export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT

# run Qwen2.5-0.5B experiments 
cd verl && bash ...

# for Qwen2.5-32B
export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT
# or, alternatively, when submitting multi-node jobs via `ray job submit`,
# add `FLASHRL_CONFIG: $PATH_TO_CONFIG_YAML_OUTPUT` to runtime env
# as in TODO:PUT_AN_EXAMPLE

# run Qwen2.5-32B experiments 
cd verl && bash ...
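The profile, setup, and export steps above can be chained in one script. The sketch below is illustrative only: the default paths, the `FN` and `DRY_RUN` variables, and the `run` helper are assumptions introduced here, while the `flashrl` flags are copied from the commands above. With `DRY_RUN=1` (the default) it prints the commands without executing them.

```shell
set -eu

# Placeholder defaults (assumptions); override them via the environment.
PATH_TO_MODEL="${PATH_TO_MODEL:-/path/to/model}"
PATH_TO_QUANTIZED_MODEL="${PATH_TO_QUANTIZED_MODEL:-/path/to/quantized_model}"
PATH_TO_PROFILE_PT_OUTPUT="${PATH_TO_PROFILE_PT_OUTPUT:-/tmp/profile.pt}"
PATH_TO_CONFIG_YAML_OUTPUT="${PATH_TO_CONFIG_YAML_OUTPUT:-/tmp/flashrl_config.yaml}"
FN="${FN:-int8}"          # one of: int8, fp8
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them

# Hypothetical helper: print a command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Step 1: profile the original model against its quantized counterpart.
run flashrl profile -m "$PATH_TO_MODEL" -qm "$PATH_TO_QUANTIZED_MODEL" \
  -o "$PATH_TO_PROFILE_PT_OUTPUT" --fn "$FN"

# Step 2: turn the profile into a config YAML.
run flashrl setup --fn "$FN" -m "$PATH_TO_MODEL" \
  -p "$PATH_TO_PROFILE_PT_OUTPUT" -o "$PATH_TO_CONFIG_YAML_OUTPUT"

# Step 3: expose the config to the training job launched from this shell.
export FLASHRL_CONFIG="$PATH_TO_CONFIG_YAML_OUTPUT"
echo "FLASHRL_CONFIG=$FLASHRL_CONFIG"
```

Running with `DRY_RUN=0 FN=fp8` would execute the same two commands for FP8, provided `flashrl` is installed and the paths are real.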

Example: Accelerating DAPO-Qwen2.5-32B with INT8

🚧 Roadmap & Future Improvements

We're working on several improvements to Flash-RL:

  • Support for Other RL Toolkits: Flash-RL currently supports only VeRL; we are working on rolling out support for other packages such as OpenRLHF.
  • Support for Other LLM Inference Toolkits: Flash-RL currently supports only vLLM; we are working on rolling out support for other toolkits such as SGLang.
  • Further Throughput Optimization: We are working on implementing efficient GPU kernels to accelerate online quantization.

📚 Citation

If you find our work useful, please cite us:

@misc{yao2025offpolicy,
  title = {Your Efficient RL Framework Secretly Brings You Off-Policy RL Training},
  url = {https://fengyao.notion.site/off-policy-rl},
  author = {Yao, Feng and Liu, Liyuan and Zhang, Dinghuai and Dong, Chengyu and Shang, Jingbo and Gao, Jianfeng},
  journal = {Feng Yao's Notion},
  year = {2025},
  month = aug,
}
@misc{yao2025flashrl,
  title = {Flash-RL: Fast RL training with Quantized Rollouts},
  url = {https://fengyao.notion.site/flash-rl},
  author = {Liu, Liyuan and Yao, Feng and Zhang, Dinghuai and Dong, Chengyu and Shang, Jingbo and Gao, Jianfeng},
  journal = {Feng Yao's Notion},
  year = {2025},
  month = aug,
}

Questions?

If you have any questions related to the code or the blog, feel free to reach out to Liyuan Liu.
