Turn any repository into an RL environment for training and evaluation.
Repo2RLEnv
Repo2RLEnv synthesizes verifiable data from existing repositories using a variety of methods, exports it to a uniform spec, and lets you train models, evaluate agents, and publish straight to the Hugging Face Hub. It covers the whole workflow (synthesis, training, evaluation, export), with training as the main focus. The uniform spec is Harbor's, so the datasets you produce also drop straight into any Harbor-compatible runtime.
╭──────────────╮     ╭──────────────╮     ╭──────────────╮     ╭──────────────────╮
│     any      │ ──▶ │  synthesize  │ ──▶ │ uniform spec │ ──▶ │  train · eval ·  │
│     repo     │     │ (pipelines)  │     │   (Harbor)   │     │  push to HF Hub  │
╰──────────────╯     ╰──────────────╯     ╰──────────────╯     ╰──────────────────╯
                     └──────────────────────── Repo2RLEnv ────────────────────────┘
Quickstart
# Install (pick one)
uv add repo2rlenv # add to a uv-managed project
uvx repo2rlenv --help # one-shot, no install
pip install repo2rlenv # classic
# Generate
repo2rlenv generate https://github.com/django/django \
--pipeline pr_mining --limit 100 --out ./datasets/django
# Validate
repo2rlenv validate ./datasets/django
# Push to the Hub
repo2rlenv push ./datasets/django --hub-repo-id myorg/django-r2e
# Evaluate any agent
repo2rlenv eval ./datasets/django --agent claude-code
# Train any model
repo2rlenv train ./datasets/django --trainer trl --model Qwen/Qwen2.5-Coder-7B
One CLI. Five verbs. No glue code.
Pipelines
The heart of Repo2RLEnv. Different ways to manufacture verifiable tasks from the same repo — pick one, pick all, write your own.
| Pipeline | What it does |
|---|---|
| `pr_mining` | Walks merged PRs, replays each one in an isolated sandbox, and captures the test that the PR makes pass; that test becomes the reward |
| `mutation` | Mutates the codebase, keeps mutations that break ≥1 existing test, and lets an LLM author the resulting issue |
| `issue_gen` | LLM proposes plausible issues from existing code, then mutates the repo to make them real and solvable |
| `<your_plugin>` | Drop in a third-party pipeline via Python entry points |
Every pipeline flows through the same 4-layer quality gate — environment determinism, oracle consistency, LLM-judge semantic alignment, false-negative filtering — before a task is admitted to the dataset. Junk in, junk out is the default in this space; Repo2RLEnv's QA is what stops it.
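One way to picture the gate is as an ordered list of checks where a task is rejected at the first layer it fails. The sketch below is illustrative only: the `Candidate` fields, the interpretation of each layer, and the 0.7 judge threshold are assumptions, not Repo2RLEnv's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    """Hypothetical QA measurements for one candidate task."""
    rerun_outcomes: List[bool]       # verifier result on the reference fix, per repeated run
    fails_before_fix: bool           # verifier fails on the unfixed repository
    judge_score: float               # LLM-judge alignment score in [0, 1]
    rejects_valid_alternative: bool  # verifier rejected a known-good alternative fix


# Each layer is (name, predicate). The order mirrors the four layers named above.
GATE: List[Tuple[str, Callable[[Candidate], bool]]] = [
    ("environment determinism",
     lambda c: len(c.rerun_outcomes) > 0 and len(set(c.rerun_outcomes)) == 1),
    ("oracle consistency",
     lambda c: c.fails_before_fix and all(c.rerun_outcomes)),
    ("LLM-judge semantic alignment",
     lambda c: c.judge_score >= 0.7),  # arbitrary threshold for the sketch
    ("false-negative filtering",
     lambda c: not c.rejects_valid_alternative),
]


def first_failed_layer(c: Candidate) -> Optional[str]:
    """Return the name of the first failing layer, or None if the task is admitted."""
    for name, check in GATE:
        if not check(c):
            return name
    return None


if __name__ == "__main__":
    flaky = Candidate([True, False, True], True, 0.9, False)
    print(first_failed_layer(flaky))
```

Stopping at the first failure keeps the cheap checks (repeated runs) ahead of the expensive ones (LLM judging), which is a common ordering for gates like this.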
What you get out
A single dataset format that:
- Is verifiable — every task carries an executable test that produces a real reward signal
- Is reproducible — pinned image digests, deterministic verifiers, content-addressed
- Trains anywhere — TRL, SkyRL, Prime-RL, Tinker, Miles, Slime, harbor.rl
- Evaluates anywhere — Claude Code, OpenHands, Codex CLI, Gemini CLI, Mini-SWE-Agent, your own agent
- Is language-agnostic — Dockerfile + shell verifier; not Python-only
- Publishes natively to Hugging Face Hub, public or private
- Supports private repos end-to-end (auth, build secrets declared, verifier-time secrets forbidden)
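To make the "executable test produces a reward" idea concrete, here is a minimal sketch that runs a verifier command and maps its exit status to a binary reward. It assumes exit code 0 means the task is solved and treats a timeout as failure; the `run_verifier` helper is hypothetical, and the way Harbor verifiers actually report rewards may differ.

```python
import subprocess
import sys
from typing import List


def run_verifier(cmd: List[str], timeout_s: float = 60.0) -> float:
    """Run a verifier command and return 1.0 on exit code 0, else 0.0.

    Assumption for this sketch: the verifier signals success only through
    its exit status, and a timeout counts as a failed task.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0


if __name__ == "__main__":
    # Stand-ins for a real shell verifier: one command that passes, one that fails.
    passing = [sys.executable, "-c", "raise SystemExit(0)"]
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(run_verifier(passing), run_verifier(failing))
```

Because the reward depends only on a command's exit status, the same loop works for a verifier written in any language, which is the point of the Dockerfile plus shell verifier design.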
Under the hood
Repo2RLEnv emits datasets in the Harbor task format — a battle-tested, language-agnostic spec with an existing ecosystem of sandboxes, agent harnesses, and training-framework integrations. By targeting Harbor we inherit its full stack: Local Docker / Modal / Daytona / E2B / Runloop sandboxes, every major coding-agent harness, parallel execution, registry CLI, and downstream hooks for OpenReward (which adds Miles, Slime, etc.). A small [metadata.repo2rlenv] extension carries provenance, image digests, and pipeline lineage.
We don't reinvent the spec — we generate the data that goes into it.
Why Repo2RLEnv
The synthesis-of-coding-tasks landscape compared:
| Capability | Repo2RLEnv | SWE-bench | SWE-Bench++ | SWE-smith |
|---|---|---|---|---|
| Point at any repo | ✅ | ✗ (12 curated) | ✅ | ✅ |
| Real PR mining | ✅ | ✅ | ✅ | ✗ |
| Synthetic mutation | ✅ | ✗ | ✗ | ✅ |
| 4-layer QA gate | ✅ | manual | ✅ | partial |
| Polyglot from day one | ✅ | Python | Python | Python-dominant |
| Plugs into existing trainers | ✅ | ✗ | ✗ | SWE-agent only |
| HF Hub native | ✅ | partial | ✗ | ✗ |
The wedge: PR mining + synthetic mutation under one quality-gated pipeline, language-agnostic, dropping straight into the trainers and harnesses people already use.
Status
Pre-alpha. The pr_mining pipeline, the Local Docker sandbox, and HF Hub push work end-to-end on Python repos.
Credits
Repo2RLEnv stands on the shoulders of:
- Harbor — the task format and runtime ecosystem we adopt
- OpenReward — ORS protocol layer above Harbor; extra trainer integrations
- SWE-bench / SWE-bench Verified — original PR-as-task formulation
- SWE-Bench++ — four-stage QA pipeline we re-implement
- SWE-smith — mutation-based synthesis
- verifiers (Prime Intellect), OpenEnv (Meta + HF) — adjacent standardization efforts
License
Apache 2.0 — see LICENSE.