flash llm rl

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
Natural Language
- English

Project description

⚡ FlashRL ⚡

Fast RL training with Quantized Rollouts (Blog)

What is FlashRL?

FlashRL patches the inference package (vLLM) to enable: 1) accurate rollout logprob computation for RL training; and 2) online quantization to generate rollouts in INT8 & FP8.

⚡ Quick Start

1. Installation

pip install flash-llm-rl # must be installed on all nodes in multi-node training

2. RL Logprob Patch Only

flashrl setup --fn bf16 -o $PATH_TO_CONFIG_YAML_OUTPUT

export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT
# alternatively, for submitting multi-node jobs via `ray submit`
# add `FLASHRL_CONFIG: $PATH_TO_CONFIG_YAML_OUTPUT` to runtime env
# as in TODO:PUT_AN_EXAMPLE
bash ...
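
For multi-node jobs, the runtime-env route mentioned in the comments above might look like the following sketch. It assumes Ray's job-submission CLI (`ray job submit`) and the `env_vars` field of a Ray runtime environment; the file name `runtime_env.yaml`, the default config path, and the helper `write_runtime_env` are illustrative and not part of FlashRL.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative default; replace with the YAML produced by `flashrl setup`.
PATH_TO_CONFIG_YAML_OUTPUT="${PATH_TO_CONFIG_YAML_OUTPUT:-/tmp/flashrl_config.yaml}"

# Hypothetical helper: write a Ray runtime environment file that
# forwards FLASHRL_CONFIG to the job's processes.
write_runtime_env() {
  local out_file="$1"
  cat > "$out_file" <<EOF
env_vars:
  FLASHRL_CONFIG: "${PATH_TO_CONFIG_YAML_OUTPUT}"
EOF
}

work_dir="$(mktemp -d)"
write_runtime_env "$work_dir/runtime_env.yaml"
cat "$work_dir/runtime_env.yaml"

# Submission is left commented out because it needs a running Ray cluster:
# ray job submit --runtime-env "$work_dir/runtime_env.yaml" -- bash ...
```

Keeping the variable in the runtime environment, rather than exporting it only in the submitting shell, is what lets remote nodes see the same value.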

3. RL Rollout Quantization -- Simple Setup

Use our pre-set quantization profiles for simple setup.

# for Qwen2.5-0.5B-instruct
export FLASHRL_CONFIG=LiyuanLucasLiu/Qwen2-0.5B-Instruct-quantized.w8a8-RedHatAI/flashrl_config.yaml

# run Qwen2.5-0.5B experiments 
cd verl && bash TODO:UPLOAD_SCRIPT

# for Qwen2.5-32B-instruct
export FLASHRL_CONFIG=LiyuanLucasLiu/Qwen2.5-32B-quantized.w8a8/flashrl_config.yaml

# run Qwen2.5-32B experiments 
cd verl && bash TODO:UPLOAD_SCRIPT
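
Because the whole setup hinges on one environment variable, a launch script may want to fail fast when it is missing. The guard below is a minimal sketch; `require_flashrl_config` is a hypothetical helper, and whether FlashRL itself validates the variable is not stated here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if FLASHRL_CONFIG is set and non-empty.
require_flashrl_config() {
  if [ -z "${FLASHRL_CONFIG:-}" ]; then
    echo "FLASHRL_CONFIG is not set; export a profile before launching." >&2
    return 1
  fi
  echo "Using FLASHRL_CONFIG=${FLASHRL_CONFIG}"
}

# Pre-set profile for Qwen2.5-0.5B-instruct, as listed above.
export FLASHRL_CONFIG=LiyuanLucasLiu/Qwen2-0.5B-Instruct-quantized.w8a8-RedHatAI/flashrl_config.yaml
require_flashrl_config
```

Calling the guard immediately before the training command keeps a forgotten `export` from silently producing an unpatched run.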

3. More Advanced

3.1 Profiling

flashrl profile -m $PATH_TO_MODEL -qm $PATH_TO_QUANTIZED_MODEL -o $PATH_TO_PROFILE_PT_OUTPUT --fn int8 # or: --fn fp8

3.2 Setup

flashrl setup --fn int8 -m $PATH_TO_MODEL -p $PATH_TO_PROFILE_PT_OUTPUT -o $PATH_TO_CONFIG_YAML_OUTPUT # or: --fn fp8 / --fn bf16

3.3 RL Training

# for Qwen2.5-0.5B-instruct
export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT

# run Qwen2.5-0.5B experiments 
cd verl && bash ...

# for Qwen2.5-32B
export FLASHRL_CONFIG=$PATH_TO_CONFIG_YAML_OUTPUT
# or, alternatively, for submitting multi-node jobs via `ray submit`
# add `FLASHRL_CONFIG: $PATH_TO_CONFIG_YAML_OUTPUT` to runtime env
# as in TODO:PUT_AN_EXAMPLE

# run Qwen2.5-32B experiments 
cd verl && bash ...
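
The three advanced steps can be chained in one script. The sketch below uses only the flags shown above; the default paths, the `QUANT_FN` and `DRY_RUN` variables, and the `run` helper are illustrative assumptions. With `DRY_RUN=1` (the default here) it only prints the commands, so it can be inspected without FlashRL installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder paths; every default below is illustrative.
PATH_TO_MODEL="${PATH_TO_MODEL:-/models/base}"
PATH_TO_QUANTIZED_MODEL="${PATH_TO_QUANTIZED_MODEL:-/models/base-quantized}"
PATH_TO_PROFILE_PT_OUTPUT="${PATH_TO_PROFILE_PT_OUTPUT:-/tmp/profile.pt}"
PATH_TO_CONFIG_YAML_OUTPUT="${PATH_TO_CONFIG_YAML_OUTPUT:-/tmp/flashrl_config.yaml}"
QUANT_FN="${QUANT_FN:-int8}"   # hypothetical variable: int8 or fp8
DRY_RUN="${DRY_RUN:-1}"        # hypothetical switch: 1 prints commands only

# Print the command, and execute it only when DRY_RUN is 0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# 3.1 Profiling
run flashrl profile -m "$PATH_TO_MODEL" -qm "$PATH_TO_QUANTIZED_MODEL" \
  -o "$PATH_TO_PROFILE_PT_OUTPUT" --fn "$QUANT_FN"

# 3.2 Setup
run flashrl setup --fn "$QUANT_FN" -m "$PATH_TO_MODEL" \
  -p "$PATH_TO_PROFILE_PT_OUTPUT" -o "$PATH_TO_CONFIG_YAML_OUTPUT"

# 3.3 Point the RL training run at the generated config.
export FLASHRL_CONFIG="$PATH_TO_CONFIG_YAML_OUTPUT"
echo "FLASHRL_CONFIG=$FLASHRL_CONFIG"
```

Setting `DRY_RUN=0` executes the same commands for real; the training launch itself is left out because the script name is not given above.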

Example: Accelerating DAPO-Qwen2.5-32B with INT8

🚧 Roadmap & Future Improvements

We're working on several improvements to Flash-RL:

- Support of Other RL Toolkits: Currently, Flash-RL only supports VeRL; we are working on rolling out support for other packages such as OpenRLHF.
- Support of Other LLM Inference Toolkits: Currently, Flash-RL only supports vLLM; we are working on rolling out support for other toolkits such as SGLang.
- Further Throughput Optimization: We are working on implementing efficient GPU kernels to accelerate online quantization.

📚 Citation

If you find our work useful, please cite us:

@misc{yao2025offpolicy,
  title = {Your Efficient RL Framework Secretly Brings You Off-Policy RL Training},
  url = {https://fengyao.notion.site/off-policy-rl},
  author = {Yao, Feng and Liu, Liyuan and Zhang, Dinghuai and Dong, Chengyu and Gao, Jianfeng},
  journal = {Feng Yao's Notion},
  year = {2025},
  month = aug,
}
@misc{yao2025flashrl,
  title = {Flash-RL: Fast RL training with Quantized Rollouts},
  url = {https://fengyao.notion.site/flash-rl},
  author = {Yao, Feng and Liu, Liyuan and Zhang, Dinghuai and Dong, Chengyu and Gao, Jianfeng},
  journal = {Feng Yao's Notion},
  year = {2025},
  month = aug,
}

Questions?

If you have any questions related to the code or the blog, feel free to reach out to Liyuan Liu.

Release history

- 1.0.3 (Aug 29, 2025)
- 1.0.1 (Aug 15, 2025)
- 1.0.0 (Aug 11, 2025)
- 0.5.2 (Aug 10, 2025)
- 0.5.1 (Aug 8, 2025)
- 0.5.0 (Aug 5, 2025), this version

Download files

Download the file for your platform.

Source Distribution

flash_llm_rl-0.5.0.tar.gz (17.1 kB)

Uploaded Aug 5, 2025 Source

Built Distribution

flash_llm_rl-0.5.0-py3-none-any.whl (19.2 kB)

Uploaded Aug 5, 2025 Python 3

File details

Details for the file flash_llm_rl-0.5.0.tar.gz.

File metadata

Download URL: flash_llm_rl-0.5.0.tar.gz
Upload date: Aug 5, 2025
Size: 17.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for flash_llm_rl-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`be56e9b3f546b2ea1c571007b4f7c0d36c388c2ec5f4eab851890b2737f277c4`
MD5	`5d3c006eee4dd7df6370712050970f7b`
BLAKE2b-256	`fcab6cd029d5d4c5733418172e0647597fe4dd973a00e16317496bd1dfe0e5b4`
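
A downloaded file can be checked against the SHA256 digest listed above. The helper below is a minimal sketch that assumes GNU coreutils `sha256sum` is available; `verify_sha256` is a hypothetical name, not a FlashRL or pip command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: compare a file's SHA256 digest with an expected hex string.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}

# Usage, after downloading the sdist into the current directory:
# verify_sha256 flash_llm_rl-0.5.0.tar.gz be56e9b3f546b2ea1c571007b4f7c0d36c388c2ec5f4eab851890b2737f277c4
```

The same function works for the wheel by passing its file name and the SHA256 value from its own hash table.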

File details

Details for the file flash_llm_rl-0.5.0-py3-none-any.whl.

File metadata

Download URL: flash_llm_rl-0.5.0-py3-none-any.whl
Upload date: Aug 5, 2025
Size: 19.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for flash_llm_rl-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2552e363b6caafb8a2b47f9609b13f622406de7a8a355f7fa26811ba641309cd`
MD5	`dc58fa9d5bbb0e4c0b680f6c2a15a93a`
BLAKE2b-256	`cb6a97141c818554b70fec5d85775a86a247d2472e8d14f375248c329f19ddd3`
