skyrl-train

SkyRL-Train: A modular, performant RL framework for post-training LLMs

Overview

With a focus on modularity, skyrl-train makes it easy to prototype new training algorithms, environments, and execution plans—without compromising usability or speed.

skyrl-train is for users who want to modify anything:

Quickly develop new environments without modifying or understanding the training code.
Modify the training execution plan, including model placement, colocation or disaggregation of training and generation, and async RL.
Implement custom trajectory generation specific to your use-case, such as custom sampling methods, tree search, etc.
… make any other flexible modifications to the RL workflow!

Key Features

The skyrl-train package supports:

PPO and GRPO
Training backends: FSDP, FSDP2, Megatron, and DeepSpeed
Inference backends: vLLM, SGLang, and any custom OpenAI API-compatible endpoint that exposes a method to perform weight sync
Ulysses sequence parallelism for long-context training
Colocated or disaggregated training and generation (including on heterogeneous hardware)
Synchronous RL or async one-off pipelining
Simple batched rollouts or asynchronous rollouts for multi-turn conversations
Weight sync via NCCL, gloo, or checkpoint-and-load
Integration with skyrl-gym to run any environment in the gymnasium
Sequence packing and Flash Attention 2
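
For readers new to GRPO, the core idea can be illustrated without any framework code. The sketch below is not taken from skyrl-train. It assumes the common formulation in which each sampled response's reward is normalized by the mean and standard deviation of its group (all responses sampled for the same prompt). The function name, the `eps` term, and the choice of population standard deviation are illustrative assumptions; implementations differ on these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one prompt's sampled rewards against their own group.

    Illustrative only: not skyrl-train's implementation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four responses to one prompt, scored 1.0 (correct) or 0.0 (incorrect)
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Responses that score above their group's mean receive a positive advantage and the rest a negative one, so no separate value model is needed for the baseline.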

Documentation

Find skyrl-train documentation at: skyrl.readthedocs.io/en/latest/

Quick Start

A quick start guide for installation and your first training run is provided below.

Requirements

The only requirements are:

CUDA version 12.8
uv

If you're running on an existing Ray cluster, make sure to use Ray 2.48.0 and Python 3.12. If not, proceed with the installation instructions below.
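
If you want to confirm these versions before installing, a small script along the following lines may help. It is a sketch, not part of skyrl-train: the function names are hypothetical, and it only checks the Python version, the installed Ray version (if any), and whether `uv` is on your PATH. It does not check the CUDA version.

```python
import shutil
import sys
from importlib import metadata

REQUIRED_PYTHON = (3, 12)  # from the requirement above
REQUIRED_RAY = "2.48.0"    # from the requirement above


def installed_ray_version():
    """Return the installed Ray version string, or None if Ray is not installed."""
    try:
        return metadata.version("ray")
    except metadata.PackageNotFoundError:
        return None


def find_problems(python_version, uv_path, ray_version):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        problems.append(f"Python {python_version[0]}.{python_version[1]} found, expected 3.12")
    if uv_path is None:
        problems.append("uv was not found on PATH")
    # Ray is only checked when it is already installed (the existing-cluster case)
    if ray_version is not None and ray_version != REQUIRED_RAY:
        problems.append(f"Ray {ray_version} found, expected {REQUIRED_RAY}")
    return problems


if __name__ == "__main__":
    issues = find_problems(sys.version_info, shutil.which("uv"), installed_ray_version())
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("All checked prerequisites look fine.")
```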

First, clone the repository:

git clone --recurse-submodules https://github.com/NovaSky-AI/SkyRL
cd SkyRL/skyrl-train

Then, create a new virtual environment and install the dependencies:

# creates a venv at .venv/
uv sync --extra vllm 
source .venv/bin/activate

Then, prepare the dataset:

uv run -- python examples/gsm8k/gsm8k_dataset.py

Finally, before training, make sure to configure Ray to use uv:

export RAY_RUNTIME_ENV_HOOK=ray._private.runtime_env.uv_runtime_env_hook.hook
# or add to your .bashrc
# echo 'export RAY_RUNTIME_ENV_HOOK=ray._private.runtime_env.uv_runtime_env_hook.hook' >> ~/.bashrc

You should now be able to run our example script (assumes at least 4 GPUs):

export WANDB_API_KEY=<your wandb api key>
bash examples/gsm8k/run_gsm8k.sh

For detailed installation instructions, as well as more examples, please refer to our documentation.

Training on a new task or environment

To implement a new task or environment using the SkyRL-Gym interface, please see our Walkthrough Docs.

If you don't want to use the SkyRL-Gym interface, or you already have a task or agentic pipeline implementation and just want to train with it on top of SkyRL, we recommend creating a simple custom Generator, which requires implementing a single method, generate(). One example of a custom Generator is SkyRLGymGenerator, which executes environments written against the SkyRL-Gym interface. We are working on more example integrations of agent harnesses. Please reach out if you'd like yours to be one of them!
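
To give a feel for the shape of such a class, here is a self-contained sketch of a generator that wraps an existing pipeline behind a single generate() method. It does not import skyrl-train: the `Trajectory` dataclass, the `PipelineGenerator` name, the constructor arguments, and the assumption that generate() is async and takes a list of prompt strings are all hypothetical. Consult the documentation and SkyRLGymGenerator for the actual base class and the input and output types it expects.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List


@dataclass
class Trajectory:
    """Hypothetical container for one rollout; not skyrl-train's output type."""
    prompt: str
    response: str
    reward: float


class PipelineGenerator:
    """Wraps an existing async pipeline behind one generate() method (illustrative)."""

    def __init__(
        self,
        run_pipeline: Callable[[str], Awaitable[str]],
        score: Callable[[str, str], float],
    ):
        self._run_pipeline = run_pipeline  # your existing task or agent pipeline
        self._score = score                # your reward function

    async def generate(self, prompts: List[str]) -> List[Trajectory]:
        # Run all prompts concurrently, then attach a reward to each response
        responses = await asyncio.gather(*(self._run_pipeline(p) for p in prompts))
        return [Trajectory(p, r, self._score(p, r)) for p, r in zip(prompts, responses)]


if __name__ == "__main__":
    async def fake_pipeline(prompt: str) -> str:
        return prompt.upper()  # stand-in for a real model or agent call

    generator = PipelineGenerator(fake_pipeline, score=lambda p, r: float(len(r)))
    print(asyncio.run(generator.generate(["2 + 2 = ?", "hello"])))
```

The point of the design is that the training loop only needs to call generate(); everything about how trajectories are produced stays inside your class.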

Reproducing SkyRL-SQL

We also test SkyRL by reproducing our prior release, SkyRL-SQL, which enabled efficient multi-turn RL for Text2SQL. A wandb report of the run is available, and our documentation includes a detailed walkthrough of the reproduction.

Acknowledgement

This work was done at the Berkeley Sky Computing Lab in collaboration with Anyscale, with generous compute support from Anyscale, Databricks, NVIDIA, Lambda Labs, and AMD.

We adopt many lessons and code from several great projects such as veRL, OpenRLHF, Search-R1, OpenReasonerZero, and NeMo-RL. We appreciate each of these teams and their contributions to open-source research!

Citation

If you find the work in skyrl-train helpful, please consider citing:

@misc{griggs2025skrylv01,
      title={Evolving SkyRL into a Highly-Modular RL Framework},
      author={Tyler Griggs and Sumanth Hegde and Eric Tang and Shu Liu and Shiyi Cao and Dacheng Li and Charlie Ruan and Philipp Moritz and Kourosh Hakhamaneshi and Richard Liaw and Akshay Malik and Matei Zaharia and Joseph E. Gonzalez and Ion Stoica},
      year={2025},
      note={Notion Blog}
}

Release history

Version	Release date
0.3.1	Jan 2, 2026
0.3.0	Dec 3, 2025
0.2.0 (this version)	Oct 13, 2025
0.1.0	Aug 19, 2025

Download files

Source Distribution

skyrl_train-0.2.0.tar.gz (178.4 kB)

Uploaded Oct 13, 2025 Source

Built Distribution

skyrl_train-0.2.0-py3-none-any.whl (205.5 kB)

Uploaded Oct 13, 2025 Python 3

File details

Details for the file skyrl_train-0.2.0.tar.gz.

File metadata

Download URL: skyrl_train-0.2.0.tar.gz
Upload date: Oct 13, 2025
Size: 178.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for skyrl_train-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`73fc875a080443c7a62a2a1469cf3038ec737c5640b768c36905d3e6c018585f`
MD5	`c1207542c8081c3548ce33def3f7cefe`
BLAKE2b-256	`a76b8fc83dd22db8d5b83799900bee6aa346e4912b3efc10699c7648e934a1b4`

File details

Details for the file skyrl_train-0.2.0-py3-none-any.whl.

File metadata

Download URL: skyrl_train-0.2.0-py3-none-any.whl
Upload date: Oct 13, 2025
Size: 205.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for skyrl_train-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d20317886a3744e6c31f1cd69ed8888f4947931b590fb0c6b7562b124f049294`
MD5	`c4617c2b8e8dbf1fa79dde59757c48e6`
BLAKE2b-256	`e8436940bf49ef0215eb04cef7828bf6f0cef12adfdbb3dd2c31c66e80dc919a`
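
If you download either file manually, you can compare it against the SHA256 digests listed above. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and the digests are copied from the tables above.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above
EXPECTED_SHA256 = {
    "skyrl_train-0.2.0.tar.gz": "73fc875a080443c7a62a2a1469cf3038ec737c5640b768c36905d3e6c018585f",
    "skyrl_train-0.2.0-py3-none-any.whl": "d20317886a3744e6c31f1cd69ed8888f4947931b590fb0c6b7562b124f049294",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path):
    """Return True if the file's digest equals the digest listed for its file name."""
    expected = EXPECTED_SHA256[Path(path).name]  # KeyError for unknown file names
    return sha256_of(path) == expected


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, "OK" if matches_published_hash(name) else "MISMATCH")
```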
