miles-rl

Enterprise-grade reinforcement learning for large-scale model training.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

sdong15

These details have not been verified by PyPI

Project links

Documentation

Project description

Enterprise-Grade Reinforcement Learning for Large-Scale Model Training

High-Performance Rollout • Low Precision Training • Production Stability

Latest Updates | Quick Start | Key Features | Documentation

Latest Updates

[2026/02] 💡 Miles Detailed Arguments: We've added a detailed guide to the command-line arguments used to configure Miles for RL training and inference. These arguments give precise control over cluster resources, training backends (Megatron/FSDP), inference optimization via SGLang, and RL algorithmic hyperparameters. [Link]
[2026/01] 💎 INT4 Quantization-Aware Training (QAT): Inspired by the Kimi K2-Thinking report, Miles now features a full-stack INT4 W4A16 QAT pipeline. This allows 1TB-scale models to fit into single-machine VRAM (e.g., NVIDIA H200), doubling rollout efficiency by eliminating cross-node bottlenecks while maintaining BF16-equivalent accuracy. [Blog]
[2026/01] 💎 Unified VLM/LLM Multi-Turn Training: We provide an implementation of the VLM multi-turn sampling paradigm. Developers only need to write a custom rollout function to start multi-turn RL for a VLM, just as they would for an LLM. [Blog]
[2026/01] 🤖 Multi-Agent Co-Evolution: Miles now supports MrlX, a novel asynchronous co-evolutionary framework for Multi-Agent RL. Achieve superior performance in complex tasks like Doctor-Patient simulations and DeepResearch pipelines by enabling specialized agents to evolve together symbiotically. [Link]
[2025/12] 🔄 Rollout Routing Replay (R3): In collaboration with SGLang, we've launched R3 to solve MoE RL instability. R3 records inference routing decisions and replays them during training, effectively eliminating the "training-inference mismatch" and preventing training collapse in large MoE models like Qwen3 and DeepSeek-V3. [Paper] [Docs]
[2025/11] 🔥 Unified FP8 Release: Solves the stability issues in MoE RL by ensuring training and inference use the exact same FP8 quantization logic. [Blog]
[2025/11] ⚡ Speculative Decoding in RL: Integrated speculative rollout with online SFT for draft models, achieving massive throughput gains. [Blog]
[2025/11] 🎉 Miles Project Launch: A joint effort by InfiXAI, Ant Group, SGLang RL Team, and the Miles community. [Announcement]

What is Miles?

Miles is a high-performance, enterprise-ready reinforcement learning (RL) framework optimized for post-training of large-scale models. Built as a fork of slime, Miles bridges the gap between research-grade RL and production-grade reliability by integrating SGLang for high-throughput rollout and Megatron-LM for scalable training.

"A journey of a thousand miles begins with a single rollout." — Miles focuses on the low-level system optimizations that make large-scale RL stable, efficient, and reproducible.

Key Features

🌪️ Advanced MoE & Low-Precision Training

Unified FP8 Pipeline: The first framework to implement end-to-end FP8 sampling and training. By unifying precision across rollout and training, Miles eliminates the quantization-induced discrepancy that causes RL collapse in large MoE models.
Rollout Routing Replay (R3): Records expert routing decisions during SGLang inference and replays them during training to ensure bit-wise expert alignment.
INT4 QAT Support: Recommended for 1TB+ models; it significantly reduces the memory footprint so that such models can be deployed on a single machine (e.g., H200).
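
As a rough illustration of the routing-replay idea described above, the following pure-Python sketch records the top-k experts chosen from rollout-side router logits and reuses them on the training side. It is not Miles or SGLang code: the functions `top_k_route`, `gate_weights`, and `route` and all logit values are hypothetical, and recomputing the gate weights from the training-side logits is an assumption of this sketch.

```python
import math


def top_k_route(logits, k):
    """Indices of the k largest router logits (ties go to the lower index)."""
    return sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]


def gate_weights(logits, experts):
    """Softmax restricted to the selected experts' logits."""
    m = max(logits[e] for e in experts)
    exps = [math.exp(logits[e] - m) for e in experts]
    total = sum(exps)
    return [x / total for x in exps]


def route(logits, k, replay=None):
    """Choose experts for one token.

    If `replay` is given, reuse those expert indices instead of
    recomputing top-k from the (possibly slightly different) logits.
    """
    experts = list(replay) if replay is not None else top_k_route(logits, k)
    return experts, gate_weights(logits, experts)


# Rollout side: router logits for one token, top-2 routing (toy values).
rollout_logits = [2.00, 1.00, 0.99, 0.10]
recorded, _ = route(rollout_logits, k=2)

# Training side: tiny numerical differences flip the second-ranked expert.
train_logits = [2.00, 0.99, 1.00, 0.10]
free_experts, _ = route(train_logits, k=2)
replayed_experts, replayed_gates = route(train_logits, k=2, replay=recorded)

print("recorded at rollout:", recorded)
print("training without replay:", free_experts)
print("training with replay:", replayed_experts)
```

Without replay, the training pass sends this token to a different expert than the rollout did; with replay, both passes use the same experts even though the logits differ slightly.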

🛡️ Eliminating Train-Inference Mismatch

Bit-wise Identical Training and Inference Log Probs: System-level solution achieving deterministic forward/backward passes through kernel-level optimization (FlashAttention-3, DeepGEMM).
Algorithmic Correction (TIS/MIS): When mismatch is unavoidable, Miles provides Truncated Importance Sampling (TIS) and Masked Importance Sampling (MIS) to mitigate off-policy bias and prevent training divergence.
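
To make the two corrections above concrete, here is a minimal sketch of per-token importance weights computed from training-side and rollout-side log probabilities. The thresholds (`clip_max`, `low`, `high`), the function names, and the exact masking rule are illustrative assumptions, not Miles defaults or its actual API.

```python
import math


def importance_ratios(train_logprobs, rollout_logprobs):
    """Per-token ratio pi_train(token) / pi_rollout(token)."""
    return [math.exp(t - r) for t, r in zip(train_logprobs, rollout_logprobs)]


def tis_weights(ratios, clip_max=2.0):
    """Truncated importance sampling: cap each ratio at clip_max."""
    return [min(r, clip_max) for r in ratios]


def mis_weights(ratios, low=0.5, high=2.0):
    """Masked importance sampling: zero out tokens whose ratio leaves [low, high]."""
    return [r if low <= r <= high else 0.0 for r in ratios]


# Toy log probabilities for three tokens of one sampled response.
train_lp = [-1.0, -0.5, -3.0]
rollout_lp = [-1.0, -1.5, -1.0]

ratios = importance_ratios(train_lp, rollout_lp)
tis = tis_weights(ratios)
mis = mis_weights(ratios)

print("ratios:", ratios)
print("TIS weights:", tis)
print("MIS weights:", mis)
```

In this sketch the weights would multiply the per-token policy-gradient loss. Truncation keeps every token but bounds its influence, while masking drops tokens whose mismatch is too large.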

⚡ Extreme Performance & Efficiency

Speculative RL Training: Achieve 25%+ rollout speedup by using an Online SFT Draft Model. Unlike frozen draft models, Miles updates the draft policy during RL to prevent policy drift.
Zero-Copy Weight Sync: Optimized weight refit via CUDA IPC zero-copy mapping, async tensor gathering, and bucketed flattening. Sync time reduced by 50% compared to standard HTTP/RPC transfers.
Partial Rollout & Over-Sampling: Handles the "Long-Tail Effect" in multi-turn RL by over-sampling requests and recycling half-finished trajectories to maximize GPU utilization.
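
The over-sampling and recycling idea can be sketched with a toy scheduler: submit more requests than the batch needs, stop as soon as enough have finished, and carry the unfinished trajectories into the next step. The `Trajectory` class, `collect_batch`, and the fixed `target_len` standing in for an end-of-sequence event are all hypothetical; a real system would also need to account for recycled tokens having been sampled under older policy weights.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    prompt_id: int
    target_len: int  # toy stand-in for "the model emits EOS after this many tokens"
    tokens: list = field(default_factory=list)

    @property
    def done(self):
        return len(self.tokens) >= self.target_len


def collect_batch(requests, batch_size):
    """Decode one token per step for all unfinished requests until
    `batch_size` trajectories are complete.

    Returns (batch, carry): the finished batch and everything left over,
    including half-finished trajectories to resume in the next step.
    """
    finished = [t for t in requests if t.done]
    active = [t for t in requests if not t.done]
    while len(finished) < batch_size and active:
        for traj in active:
            traj.tokens.append(0)  # placeholder for a sampled token
        finished += [t for t in active if t.done]
        active = [t for t in active if not t.done]
    return finished[:batch_size], finished[batch_size:] + active


# Over-sample: 6 requests for a batch of 4; two of them are long-tail.
lengths = [2, 3, 50, 4, 2, 40]
requests = [Trajectory(i, n) for i, n in enumerate(lengths)]
batch, carry = collect_batch(requests, batch_size=4)

print("batch:", [t.prompt_id for t in batch])
print("carried over:", [(t.prompt_id, len(t.tokens)) for t in carry])
```

The two long requests do not hold up the batch: decoding stops after four steps, and their partial outputs are kept so a later call to `collect_batch` can resume them instead of starting over.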

Model Support & Training Diversity

🏗️ Supported Models

Miles supports a wide range of state-of-the-art architectures, with a special emphasis on DeepSeek, Qwen, Llama, and other mainstream models.

Family	Supported Models
DeepSeek	R1, V3, V3.2
Qwen	Qwen 2, 2.5, 3
Llama	Llama 3, 3.1, 3.3, 4
Gemma	Gemma 2, 3, 3N
GLM	GLM-4.5, GLM-4.6, GLM-4.7
MiniMax	M2, M2.1
Others	Mistral, Mixtral, Phi, gpt-oss and any model supported by SGLang and Megatron

🧩 Diverse Training Scenarios

Miles is designed to handle the complexity of modern RL workloads across various dimensions:

Multi-Turn Interaction: Optimized for complex, multi-round conversations and tool-use scenarios.
VLM & LLM Support: Unified framework for both Vision-Language and pure Text models.
Reasoning & Coding: Specific recipes and optimizations for Reasoning (Math/Logic) and Coding Agent tasks.
Multi-Agent Training: Support for advanced co-training and collaborative multi-agent reinforcement learning.

Quick Start

Installation

We recommend using our official Docker image for the best performance and compatibility:

# Pull the latest image
docker pull radixark/miles:latest

# Or install from source (repository: radixark/miles on GitHub)
git clone https://github.com/radixark/miles.git
cd miles
pip install -r requirements.txt
pip install -e .

Launch Training

Miles provides a unified entry point for complex RL tasks. Here is an example of FP8 GRPO training for Qwen3:

python train.py \
    --advantage-estimator grpo \
    --model-name qwen3-30b-a3b \
    --hf-checkpoint /path/to/qwen3-30b-a3b-hf \
    --rollout-batch-size 512 \
    --n-samples-per-prompt 8

For comprehensive guides on environment setup and custom reward functions, see the Quick Start Guide.
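
For readers unfamiliar with what `--advantage-estimator grpo` and `--n-samples-per-prompt 8` refer to, the sketch below shows the group-relative advantage commonly used in GRPO: each of the samples drawn for one prompt is scored against the mean and standard deviation of its own group. This is a generic illustration, not the Miles implementation; the function name, the `eps` value, and the choice of population standard deviation are assumptions.

```python
import statistics


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages for the samples of a single prompt.

    A_i = (r_i - mean(r)) / (std(r) + eps)
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations vary
    return [(r - mean) / (std + eps) for r in rewards]


# Eight sampled responses for one prompt, scored 1.0 (correct) or 0.0 (wrong).
rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
advantages = grpo_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f}  advantage={a:+.3f}")
```

Correct responses receive a positive advantage and incorrect ones a negative advantage. If every sample in a group gets the same reward, all advantages are zero and that prompt contributes no gradient signal.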

Roadmap

✅ Completed

Unified FP8 E2E Training & Rollout
INT4 Quantization-Aware Training (QAT): Single-machine 1TB models
Speculative RL with Online SFT
Multi-Agent RL (Co-evolutionary frameworks like MrlX)
Support DeepSeek V3.2 Models
VLM Multi-Turn Training
Aligning SGLang with Megatron in Dense Models
Rollout Routing Replay (R3)

🏗️ In Progress & Planned

Zero mismatch for MoE RL
Aligning SGLang with Megatron in MoE Models
Diffusion RL Support
Omni RL Support
Diffusion LLM RL Support
Elastic Resource Scheduling: Dynamic scaling of rollout vs. training workers

Acknowledgements

Miles stands on the shoulders of giants in the LLM infrastructure ecosystem:

slime: The core modular architecture and inspiration.
SGLang: The high-performance inference engine.
Megatron-LM: Robust large-scale training components.

Special thanks to InfiXAI Team, Ant Group AQ Team, SGLang RL Team, and the Miles Team. We also thank DataCrunch for compute sponsorship and NVIDIA for technical support on Transformer Engine (TE).

Release history

0.0.2 (this version), May 28, 2026
0.0.1, May 27, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

miles_rl-0.0.2.tar.gz (6.3 MB)

Uploaded May 28, 2026 Source

Built Distribution

miles_rl-0.0.2-py3-none-any.whl (8.2 MB)

Uploaded May 28, 2026 Python 3

File details

Details for the file miles_rl-0.0.2.tar.gz.

File metadata

Download URL: miles_rl-0.0.2.tar.gz
Upload date: May 28, 2026
Size: 6.3 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for miles_rl-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`6cf59e4ae7e5e8c2985c1db55e7e0346a7541caa4f3657957019870ae2d1db27`
MD5	`8d48a9f488781fd0906d3b567892b694`
BLAKE2b-256	`18afe2cf632191f58094fbd5fb1b6c729d362f05839ad8edce660cc097aa2ee4`

See more details on using hashes here.
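
As a minimal sketch of how a downloaded file could be checked against the SHA256 digest listed above, the following uses only Python's standard `hashlib`. The local path `miles_rl-0.0.2.tar.gz` (current directory) and the helper names `sha256_of` and `verify` are assumptions for illustration.

```python
import hashlib
import os

# SHA256 digest for miles_rl-0.0.2.tar.gz, copied from the table above.
EXPECTED_SHA256 = "6cf59e4ae7e5e8c2985c1db55e7e0346a7541caa4f3657957019870ae2d1db27"


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    path = "miles_rl-0.0.2.tar.gz"  # assumed download location
    if os.path.exists(path):
        print("hash OK" if verify(path) else "hash MISMATCH")
    else:
        print(f"{path} not found; download it first")
```

The same check can be enforced at install time with pip's hash-checking mode, using a requirements file that pins the package with `--hash=sha256:<digest>` and installing with `--require-hashes`.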

Provenance

The following attestation bundles were made for miles_rl-0.0.2.tar.gz:

Publisher: publish-pypi.yml on radixark/miles

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: miles_rl-0.0.2.tar.gz
- Subject digest: 6cf59e4ae7e5e8c2985c1db55e7e0346a7541caa4f3657957019870ae2d1db27
- Sigstore transparency entry: 1652663452
- Sigstore integration time: May 28, 2026
Source repository:
- Permalink: radixark/miles@1dae6b5c81fb855cfc753b15e99ad5f421262b97
- Branch / Tag: refs/heads/shi/pypi-metadata-and-0.0.2
- Owner: https://github.com/radixark
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@1dae6b5c81fb855cfc753b15e99ad5f421262b97
- Trigger Event: workflow_dispatch

File details

Details for the file miles_rl-0.0.2-py3-none-any.whl.

File metadata

Download URL: miles_rl-0.0.2-py3-none-any.whl
Upload date: May 28, 2026
Size: 8.2 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for miles_rl-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`452c5c0d8b7fc3729f225e63194f16fa7bbe587b6936386c6501793fce7cf820`
MD5	`d1a9d187b7de99d344db4dbc099d16f2`
BLAKE2b-256	`4fc2df3eab82841a0a885c9127daa54647dbf67f03626bcf07de00a4a865a66d`

See more details on using hashes here.

Provenance

The following attestation bundles were made for miles_rl-0.0.2-py3-none-any.whl:

Publisher: publish-pypi.yml on radixark/miles

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: miles_rl-0.0.2-py3-none-any.whl
- Subject digest: 452c5c0d8b7fc3729f225e63194f16fa7bbe587b6936386c6501793fce7cf820
- Sigstore transparency entry: 1652663483
- Sigstore integration time: May 28, 2026
Source repository:
- Permalink: radixark/miles@1dae6b5c81fb855cfc753b15e99ad5f421262b97
- Branch / Tag: refs/heads/shi/pypi-metadata-and-0.0.2
- Owner: https://github.com/radixark
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@1dae6b5c81fb855cfc753b15e99ad5f421262b97
- Trigger Event: workflow_dispatch
