oxrl

A lightweight post-training framework for LLMs

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

oxRL

Post-train any model in under 10 lines of code.

A lightweight post-training framework for LLMs, VLMs, and VLAs, built to maximize developer speed. It scales to billions of parameters with DeepSpeed, vLLM, and Ray.

🚀 New in v1.1: Reasoning & Multimodal RL

We've significantly expanded oxRL's capabilities to support the latest trending architectures and training recipes:

- Verifiable Reasoning (Open-R1): Native support for reasoning models with <thought> and <answer> tag enforcement and rule-based correctness rewards.
- Simple Preference Optimization (SimPO): State-of-the-art reference-free alignment that reduces VRAM by 40% and improves logical reasoning.
- Multimodal RL: Support for Vision-Language (VLM) and Audio-Language models. Seamless base64-to-tensor pipeline for on-policy rollouts.
- GPQA & ScienceQA: Integrated high-difficulty reasoning and multimodal datasets.
- Memory-Efficient LoRA: Built-in PEFT integration allows post-training 14B+ models on restricted hardware.
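To make the idea of tag enforcement plus a rule-based correctness reward concrete, here is a minimal sketch. It is not oxRL's actual reward function: the names `extract_answer` and `sketch_reasoning_reward`, the exact-match comparison, and the 0.5 format weight are all assumptions chosen for illustration.

```python
import re
from typing import Optional

# Each tag must appear exactly once (an assumption of this sketch).
_TAGS = ("<thought>", "</thought>", "<answer>", "</answer>")
_LAYOUT = re.compile(
    r"\s*<thought>.*?</thought>\s*<answer>(.*?)</answer>\s*", re.DOTALL
)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text inside <answer> if the completion is well-formed, else None."""
    if any(completion.count(tag) != 1 for tag in _TAGS):
        return None
    match = _LAYOUT.fullmatch(completion)
    return match.group(1).strip() if match else None


def sketch_reasoning_reward(
    completion: str, reference: str, format_weight: float = 0.5
) -> float:
    """Format bonus plus 1.0 for an exact-match answer (weights are illustrative)."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0  # malformed output earns nothing
    correct = 1.0 if answer == reference.strip() else 0.0
    return format_weight + correct
```

A well-formed but wrong completion still earns the format bonus, which gives the policy a signal to learn the tag layout before it learns to answer correctly.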

Usage (Python API)

Post-train any model in under 10 lines of code. oxRL auto-detects your hardware, auto-prepares datasets, and scales to multi-GPU automatically.

from oxrl import Trainer

# 1. Initialize with any HuggingFace model
trainer = Trainer(model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B")

# 2. Start reasoning post-training (Open-R1 recipe)
trainer.train(task="reasoning")

Supported Models

The following models have been verified and onboarded using our automated pipeline. You can find ready-to-use scripts in the examples/onboarded_models/ directory.

Model	Size	Task	Strategy	Status
DeepSeek-R1-Distill-Llama-8B	8.0B	Reasoning	LoRA	✅ Verified
DeepSeek-R1-Distill-Qwen-7B	7.0B	Reasoning	LoRA	✅ Verified
Qwen2.5-Coder-7B-Instruct	7.6B	Coding	LoRA	✅ Verified
Qwen2-Audio-7B-Instruct	7.0B	Audio	LoRA	✅ Verified
Qwen2-VL-7B-Instruct	7.0B	Vision	LoRA	✅ Verified
Gemma-3-1b-it	1.0B	Multimodal	Full-tuning	✅ Verified
Mistral-7B-Instruct-v0.3	7.0B	Instruct	LoRA	✅ Verified
Qwen2.5-7B-Instruct	7.0B	Math	LoRA	✅ Verified
SmolLM2-1.7B-Instruct	1.7B	Instruct	Full-tuning	✅ Verified

System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         oxRL Framework                          │
├─────────────────────┬───────────────────┬───────────────────────┤
│   Training Engines  │  Rollout Engines  │    Config + Data      │
│   (Ray + DeepSpeed) │  (Ray + vLLM)     │    (Pydantic + HF)    │
├─────────────────────┼───────────────────┼───────────────────────┤
│                     │                   │                       │
│  algs/grpo.py       │ rollouts/         │ configs/load.py       │
│    SGRPO loss       │   vllm_engine.py  │ configs/*.yaml        │
│    LoRA / PEFT      │   replay_buffer.py│                       │
│  algs/PPO/ppo.py    │                   │ datasets/             │
│  algs/SFT/sft.py    │                   │   prompt_only.py      │
│                     │                   │   (Multimodal Ready)  │
├─────────────────────┼───────────────────┼───────────────────────┤
│  swarm/             │  utils/logging.py │ rewards/compute_score │
│    orchestrator.py  │  utils/setup.py   │ (Reasoning / Code)    │
└─────────────────────┴───────────────────┴───────────────────────┘

RL Training Workflow

- Scout Agent: Discovers model metadata and ensures chat_template compatibility.
- Multimodal Pipeline: Converts base64 images/audio into PIL/NumPy for vLLM rollouts.
- LoRA Lifecycle: Train with adapters, save with gathered ZeRO-3 weights, and auto-strip PEFT prefixes for immediate vLLM compatibility.
- Verifiable Rewards: Programmatic verification of CoT tags and mathematical correctness.
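The prefix-stripping part of the LoRA lifecycle can be sketched as a plain key-renaming pass over a state dict. This is an illustration only, not oxRL's code: it assumes the adapter deltas have already been merged into the base weights, and it assumes PEFT-style key names with a `base_model.model.` prefix and a `.base_layer.` infix, which can vary between PEFT versions. The function name `strip_peft_prefixes` is hypothetical.

```python
from typing import Dict, TypeVar

T = TypeVar("T")

PEFT_PREFIX = "base_model.model."  # assumed wrapper prefix added by PEFT
BASE_LAYER = ".base_layer."        # assumed infix on LoRA-wrapped layers


def strip_peft_prefixes(state_dict: Dict[str, T]) -> Dict[str, T]:
    """Return a copy of state_dict with PEFT naming removed.

    Adapter-only entries (keys containing ".lora_") are dropped, on the
    assumption that their contribution was merged into the base weights.
    """
    cleaned: Dict[str, T] = {}
    for name, tensor in state_dict.items():
        if ".lora_" in name:
            continue  # adapter matrices are not part of the plain checkpoint
        if name.startswith(PEFT_PREFIX):
            name = name[len(PEFT_PREFIX):]
        cleaned[name.replace(BASE_LAYER, ".")] = tensor
    return cleaned
```

After this pass the keys match the layout of the original HuggingFace checkpoint, which is what an inference engine that loads standard checkpoints expects.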

Quick Start

Installation

pip install oxrl

Post-train a Reasoning Model

# config.yaml
model:
  name: "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
lora:
  enabled: true
reward:
  reward_func: "reasoning_reward_func"
data:
  dataset: "openr1_math"

python main_rl.py --config-file config.yaml

Algorithms

Algorithm	File	Description
SGRPO	`algs/grpo.py`	Stable GRPO — Clipped surrogate loss with LoRA support and reference-free variants.
SimPO	`algs/simpo.py`	Simple Preference Optimization — Reference-free and length-normalized alignment.
CISPO	`algs/grpo.py`	Clipped importance-sampling policy optimization.
PPO	`algs/PPO/ppo.py`	Proximal Policy Optimization with GAE and value clipping.
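For readers new to these objectives, the sketch below shows the textbook forms of a group-normalized advantage, a PPO-style clipped surrogate, and the SimPO pairwise loss, using scalar inputs and only the standard library. It is not taken from `algs/`: oxRL's SGRPO and CISPO variants may differ in detail, and the function names, the population standard deviation, and the default values of `clip`, `beta`, and `gamma` are assumptions.

```python
import math
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt's group of sampled completions."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(
    logp_new: float, logp_old: float, advantage: float, clip: float = 0.2
) -> float:
    """PPO-style clipped surrogate loss for one token (to be minimized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
    # Take the more pessimistic of the two objectives, then negate for a loss.
    return -min(ratio * advantage, clipped * advantage)


def simpo_loss(
    logp_chosen: float,
    len_chosen: int,
    logp_rejected: float,
    len_rejected: int,
    beta: float = 2.0,
    gamma: float = 1.0,
) -> float:
    """SimPO loss for one preference pair.

    logp_* are summed token log-probabilities under the policy; dividing by
    length gives the length-normalized implicit reward. No reference model.
    """
    margin = beta * (logp_chosen / len_chosen - logp_rejected / len_rejected) - gamma
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

Because SimPO needs only policy log-probabilities, no second (reference) model has to be held in memory, which is where its memory savings come from.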

Project Structure

oxRL/
├── main_rl.py              RL training loop (Ray + DeepSpeed)
├── swarm/                  Autonomous model onboarding (Scout, Bugfixer)
├── preprocessing/          Reasoning (OpenR1), Multimodal (Vision/Audio) preprocessors
├── rollouts/               vLLM inference with structured prompt support
├── rewards/                Verifiable reasoning and coding rewards

Design Principles

Debuggability over Pipelining. oxRL avoids complex async pipelining to ensure that failure states are 100% reproducible and logs are clear.

LoRA-first for 7B+. We default to LoRA for larger models to enable high-quality research on consumer-grade and restricted high-end hardware.

Verification-driven RL. We prioritize datasets where the reward is verifiable (Math, Code, Format) to drive logical discovery.

Contributing

Contributions are welcome. Please follow the existing architectural patterns and style.

FAQ

Check out the FAQ for details on LoRA merging and Multimodal input formatting.

Release history

Version	Release date
1.7.2	Mar 4, 2026
1.7.1	Mar 4, 2026
1.7.0	Mar 4, 2026
1.5.0	Feb 27, 2026
1.2.0 (this version)	Feb 25, 2026
0.8.1	Feb 24, 2026

Download files

Download the file for your platform.

Source Distribution

oxrl-1.2.0.tar.gz (16.0 kB)

Uploaded Feb 25, 2026 Source

Built Distribution

oxrl-1.2.0-py3-none-any.whl (14.2 kB)

Uploaded Feb 25, 2026 Python 3

File details

Details for the file oxrl-1.2.0.tar.gz.

File metadata

Download URL: oxrl-1.2.0.tar.gz
Upload date: Feb 25, 2026
Size: 16.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for oxrl-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`fc8281792202e78522edd33e91577b08711bec395919c83a5de4118089d567c1`
MD5	`e904c99467e063acefb397bd519d3f9e`
BLAKE2b-256	`004db4b6ef29ef4826f8513092d6e5b0abe3c13eda8aa41ce55d3b271fc0840f`

File details

Details for the file oxrl-1.2.0-py3-none-any.whl.

File metadata

Download URL: oxrl-1.2.0-py3-none-any.whl
Upload date: Feb 25, 2026
Size: 14.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for oxrl-1.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0a1fc754bc1be257f8d62a3903e38ca65a9ef94184be64411c403e62419418e3`
MD5	`7e6ccf697440a88321daebfc47530cf1`
BLAKE2b-256	`7aca5f148ffbfaf0c422adafa9afb43c8aebe790667102effe61f0f9389ee0c5`

oxrl 1.2.0
