LoopBench — benchmark suite, metrics, submission pipeline, leaderboards
LoopBench
MLPerf for loops.
LoopBench is the public scoreboard for Loop Engineering — fixed tasks, fixed seeds, observed LES, and a submission pipeline anyone can audit.
You bring an LSS loop spec. LoopBench runs it through LoopGym, computes LES_obs across eight categories, validates your results JSON, and ranks you on the leaderboard. No hand-waved demos.
```bash
loopbench run --task LB-CR-1 --spec your-loop.yaml --seeds 0,1,2,3,4 -o results.json
loopbench validate results.json
```
The contract
```mermaid
flowchart LR
    YOU[Your LSS spec]
    LB[LoopBench<br/>tasks · scoring · schema]
    LG[LoopGym<br/>execution]
    OUT[results.json → leaderboard]
    YOU --> LB
    LB -->|env_id, seeds| LG
    LG -->|trajectories| LB
    LB --> OUT
```
| Layer | Owns | Repo |
|---|---|---|
| Spec | LSS schema, LES formulas | Loop Core Engineering |
| Data | Trajectories (optional holdout) | LoopNet |
| Runtime | `env.run_episode()` | LoopGym |
| Measurement | Tasks, LES_obs, submissions | LoopBench |
LoopBench defines and scores. LoopGym runs. Never the other way around.
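The division of labor above can be sketched in a few lines of Python. This is a minimal sketch, not the actual LoopBench or LoopGym API: `ToyEnv`, the `Trajectory` fields, and `score_task` are hypothetical stand-ins. Only the shape of the contract comes from the diagram: LoopBench picks the env ID and seeds, the runtime calls `env.run_episode()` and hands back trajectories, and LoopBench does the scoring.

```python
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    # Hypothetical trajectory record; real LoopGym trajectories will differ.
    env_id: str
    seed: int
    quality: float  # assumed per-episode quality in [0, 1)


class ToyEnv:
    """Hypothetical stand-in for a LoopGym environment (runtime layer)."""

    def __init__(self, env_id: str):
        self.env_id = env_id

    def run_episode(self, seed: int) -> Trajectory:
        # Seeded RNG, so the same seed always reproduces the same episode.
        rng = random.Random(seed)
        return Trajectory(self.env_id, seed, rng.random())


def score_task(env_id: str, seeds: list) -> float:
    """Measurement layer: choose env_id and seeds, let the runtime run, then score.

    The mean quality used here is a placeholder, not the LES_obs formula.
    """
    env = ToyEnv(env_id)  # runtime: executes episodes, never scores them
    trajectories = [env.run_episode(s) for s in seeds]
    return sum(t.quality for t in trajectories) / len(trajectories)
```

Because every episode is seeded, calling `score_task("loopbench/code-repair-v1", [0, 1, 2, 3, 4])` twice yields the same number, which is the property fixed seeds are meant to guarantee.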
⚡ Run your first score
```bash
pip install git+https://github.com/KanakMalpani/LoopGym.git
pip install git+https://github.com/KanakMalpani/LoopBench.git
loopbench list
loopbench run \
  --task LB-CR-1 \
  --spec submissions/examples/spec-fast-loop.yaml \
  --seeds 0,1,2,3,4 \
  -o results.json
loopbench validate results.json
loopbench rank leaderboard/entries.json
```
Local dev (sibling clones):
```bash
git clone https://github.com/KanakMalpani/LoopGym.git
git clone https://github.com/KanakMalpani/LoopBench.git
cd LoopBench && pip install -e ../LoopGym -e ".[dev]"
```
On Windows, use `py -3.12` if needed. For PyPI, see PUBLISHING.md.
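How `loopbench rank` orders entries internally is not documented here. As a rough sketch only, ranking a list of entries by LES_obs could look like the following; the `name` and `les_observed` keys and the name-based tie-break are assumptions, not the actual `leaderboard/entries.json` format.

```python
import json


def rank_entries(entries: list) -> list:
    """Sort leaderboard entries by LES_obs, highest first.

    Assumes each entry has hypothetical "name" and "les_observed" keys.
    Ties are broken by name so the ordering is stable and reproducible.
    """
    ordered = sorted(entries, key=lambda e: (-e["les_observed"], e["name"]))
    # Attach 1-based ranks without mutating the caller's dicts.
    return [{**e, "rank": i} for i, e in enumerate(ordered, start=1)]


def rank_file(path: str) -> list:
    """Load a JSON list of entries from disk and rank it."""
    with open(path, encoding="utf-8") as fh:
        return rank_entries(json.load(fh))
```

A deterministic tie-break matters for a public leaderboard: two submissions with equal scores should not swap places between runs of the ranker.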
Tasks (v0.1 · ALS v2)
| ID | Name | Env | What it stress-tests |
|---|---|---|---|
| LB-CR-1 | Code repair | `loopbench/code-repair-v1` | Effectiveness, speed, robustness |
| LB-RS-1 | Research synthesis | `loopbench/research-synthesis-v1` | Effectiveness, cost |
| LB-MA-1 | Multi-agent debate | `loopbench/multi-agent-debate-v1` | Autonomy, scalability |
Each task ships a YAML file and a README under `tasks/`. Runs use five seeds by default, and each task is scored by Success@k and the LES_obs composite.
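The task table maps directly onto a lookup from task ID to environment ID. Below is a hypothetical sketch of how a runner might resolve a comma-separated `--task` value and fall back to the five default seeds; `parse_tasks` and `parse_seeds` are illustrative names, not functions exported by the `loopbench` package.

```python
from typing import Optional

# Task IDs and env IDs as listed in the v0.1 task table.
TASKS = {
    "LB-CR-1": "loopbench/code-repair-v1",
    "LB-RS-1": "loopbench/research-synthesis-v1",
    "LB-MA-1": "loopbench/multi-agent-debate-v1",
}

DEFAULT_SEEDS = [0, 1, 2, 3, 4]  # five seeds by default


def parse_tasks(flag: str) -> list:
    """Turn a --task value like "LB-CR-1,LB-RS-1" into env IDs.

    Hypothetical helper; raises ValueError on an unknown task ID.
    """
    ids = [part.strip() for part in flag.split(",") if part.strip()]
    unknown = [t for t in ids if t not in TASKS]
    if unknown:
        raise ValueError(f"unknown task(s): {', '.join(unknown)}")
    return [TASKS[t] for t in ids]


def parse_seeds(flag: Optional[str]) -> list:
    """Turn a --seeds value like "0,1,2,3,4" into ints; empty or None means the default."""
    if not flag:
        return list(DEFAULT_SEEDS)
    return [int(part) for part in flag.split(",")]
```

Failing fast on an unknown task ID keeps a typo from silently producing a results file that covers fewer tasks than intended.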
Metrics
| Metric | Meaning |
|---|---|
| Success@k | Fraction of instances reaching goal threshold g_target |
| LES_obs | Observed eight-category composite ∈ [0, 1] — see metrics/les-compute.md |
| Cost | Estimated USD per run from LSS cost limits |
| Robustness | Quality retention across seeds |
Display scale 0–100 is optional (les_display = les_observed × 100).
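The table gives one-line definitions only. The following sketch shows how the simpler metrics could be computed under stated assumptions: `success_at_k` treats "@k" as k attempts per instance, and `robustness` uses worst-seed quality divided by mean-seed quality as a stand-in for "quality retention across seeds". Neither is LoopBench's published formula (that lives in metrics/les-compute.md); only `les_display` follows this README exactly.

```python
from statistics import mean


def success_at_k(attempt_scores: list, g_target: float, k: int) -> float:
    """Fraction of instances where any of the first k attempts reaches g_target.

    Assumption: "@k" means k attempts per instance.
    """
    if not attempt_scores:
        return 0.0
    hits = sum(1 for scores in attempt_scores if any(s >= g_target for s in scores[:k]))
    return hits / len(attempt_scores)


def robustness(per_seed_quality: list) -> float:
    """Hypothetical proxy: worst seed relative to the mean seed.

    1.0 means every seed scored the same; lower means some seeds lose quality.
    """
    avg = mean(per_seed_quality)
    return min(per_seed_quality) / avg if avg > 0 else 0.0


def les_display(les_observed: float) -> float:
    """Optional 0-100 display scale: les_display = les_observed * 100."""
    if not 0.0 <= les_observed <= 1.0:
        raise ValueError("LES_obs must lie in [0, 1]")
    return les_observed * 100
```

For example, with `g_target=0.8` and `k=2`, the instances `[[0.2, 0.9], [0.1, 0.3], [0.8]]` give a Success@k of 2/3: the first and third reach the threshold within two attempts, the second never does.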
Submit to the leaderboard
- Run all tasks (or start with one):
  ```bash
  loopbench run --task LB-CR-1,LB-RS-1,LB-MA-1 --spec your-loop.yaml -o results.json
  ```
- Validate:
  ```bash
  loopbench validate results.json
  ```
- Open a PR adding your entry to `leaderboard/entries.json`.
v0.1 rankings accept SimEnv submissions only (no API keys, fully reproducible). A LiveEnv tier is planned for v0.2.
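`loopbench validate` checks results against `submit/schema.json`, which this README does not reproduce. The sketch below is a hypothetical pre-flight check that encodes only the rules stated here (LES_obs in [0, 1], SimEnv only for v0.1, the three v0.1 task IDs). The field names `spec`, `tasks`, `les_observed`, and `env_tier` are invented for illustration and are not the real schema.

```python
import json

# Hypothetical required top-level fields; the real contract is submit/schema.json.
REQUIRED = {"spec", "tasks", "les_observed", "env_tier"}
KNOWN_TASKS = {"LB-CR-1", "LB-RS-1", "LB-MA-1"}


def check_submission(results: dict) -> list:
    """Return a list of problems; an empty list means the entry looks acceptable."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED - set(results))]
    les = results.get("les_observed")
    if les is not None and not (isinstance(les, (int, float)) and 0.0 <= les <= 1.0):
        problems.append("les_observed must be a number in [0, 1]")
    # v0.1 rankings accept SimEnv submissions only.
    if results.get("env_tier", "SimEnv") != "SimEnv":
        problems.append("v0.1 accepts SimEnv submissions only")
    unknown = set(results.get("tasks", [])) - KNOWN_TASKS
    if unknown:
        problems.append(f"unknown tasks: {sorted(unknown)}")
    return problems


def check_file(path: str) -> list:
    """Load a results JSON file and run the checks above."""
    with open(path, encoding="utf-8") as fh:
        return check_submission(json.load(fh))
```

Collecting every problem, instead of stopping at the first, lets a submitter fix a results file in one pass before opening the PR.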
Repository layout
| Path | Purpose |
|---|---|
| `tasks/` | ALS v2 task definitions |
| `metrics/les-compute.md` | LES_obs formulas |
| `submit/schema.json` | Submission JSON schema |
| `loopbench/` | Runner, LES compute, conformance |
| `leaderboard/` | Public rankings (JSON v0.1) |
| `submissions/examples/` | Reference specs |
Citation
```bibtex
@software{loopbench2026,
  title={LoopBench: Benchmark Suite for Loop Engineering},
  author={Malpani, Kanak},
  year={2026},
  url={https://github.com/KanakMalpani/LoopBench}
}
```
MIT · v0.1 · Contributing · Security · Status
Download files
File details
Details for the file loopbench-0.1.0.tar.gz.
File metadata
- Download URL: loopbench-0.1.0.tar.gz
- Upload date:
- Size: 86.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a2ade97a06fcbaa31e008394169c20adc3ddd02b23ed240f1f7fdbb16db1337c` |
| MD5 | `6c1f93c880f513aceb3ad7694481b1f7` |
| BLAKE2b-256 | `3a5b8767bab10e4b8e741e1bcfc36019d2e9afab852dfc6a90a708c6c5256be2` |
File details
Details for the file loopbench-0.1.0-py3-none-any.whl.
File metadata
- Download URL: loopbench-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `49b1bf795cc661e03d5efc3bc1de6e626d33302115c81178c7ac3e58ad8bfc94` |
| MD5 | `195359dda33542bef0ed4f9c506a4cd4` |
| BLAKE2b-256 | `87a280cbc8e38ddfb4451b6c7a4bd9382318f49f30e541de412094ee0a79ec4e` |
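To check a downloaded file against the SHA256 digests listed for each distribution, here is a short standard-library sketch. It uses only `hashlib` and assumes nothing about LoopBench itself; the helper names are illustrative.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """True if the file matches the published SHA256 digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()
```

For the wheel, `verify("loopbench-0.1.0-py3-none-any.whl", "49b1bf795cc661e03d5efc3bc1de6e626d33302115c81178c7ac3e58ad8bfc94")` should return `True` for an untampered download. Reading in chunks keeps memory use flat regardless of file size.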