LoopBench — benchmark suite, metrics, submission pipeline, leaderboards

LoopBench
MLPerf for loops.

CI · MIT · Python 3.12+ · ALS v2 · 3 tasks


LoopBench is the public scoreboard for Loop Engineering — fixed tasks, fixed seeds, observed LES, and a submission pipeline anyone can audit.

You bring an LSS loop spec. LoopBench runs it through LoopGym, computes LES_obs across eight categories, validates your results JSON, and ranks you on the leaderboard. No hand-waved demos.

loopbench run --task LB-CR-1 --spec your-loop.yaml --seeds 0,1,2,3,4 -o results.json
loopbench validate results.json

Run your first score → · Leaderboard · Suite architecture

Figure: LoopBench CLI demo (install, list tasks, run, validate, rank)


The contract

flowchart LR
  YOU[Your LSS spec]
  LB[LoopBench<br/>tasks · scoring · schema]
  LG[LoopGym<br/>execution]
  OUT[results.json → leaderboard]

  YOU --> LB
  LB -->|env_id, seeds| LG
  LG -->|trajectories| LB
  LB --> OUT

| Layer | Owns | Repo |
| --- | --- | --- |
| Spec | LSS schema, LES formulas | Loop Core Engineering |
| Data | Trajectories (optional holdout) | LoopNet |
| Runtime | env.run_episode() | LoopGym |
| Measurement | Tasks, LES_obs, submissions | LoopBench |

LoopBench defines and scores. LoopGym runs. Never the other way around.
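
The division of labour above can be sketched in a few lines of Python. This is only an illustration of the layering, not LoopBench's or LoopGym's actual code: the README names `env.run_episode()` but not its signature, so the `(spec, seed)` parameters, the `Env` protocol, the `StubEnv` class, the `collect_trajectories` helper, and the trajectory dict shape are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Protocol


class Env(Protocol):
    """Hypothetical runtime interface; only the name run_episode() comes from the README."""

    def run_episode(self, spec: dict, seed: int) -> dict: ...


@dataclass
class StubEnv:
    """Stand-in for a LoopGym environment so the sketch runs without LoopGym installed."""

    env_id: str

    def run_episode(self, spec: dict, seed: int) -> dict:
        # A real environment would execute the loop; the stub only echoes its inputs.
        return {"env_id": self.env_id, "seed": seed, "steps": []}


def collect_trajectories(env: Env, spec: dict, seeds: list[int]) -> list[dict]:
    """Measurement layer: choose the seeds, delegate execution, gather trajectories."""
    return [env.run_episode(spec, seed) for seed in seeds]


trajectories = collect_trajectories(
    StubEnv("loopbench/code-repair-v1"), {"name": "demo"}, [0, 1, 2, 3, 4]
)
print(len(trajectories))
```

The measurement side never executes a loop itself; it only decides what to run and consumes what comes back, which is the direction of dependency the contract describes.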


⚡ Run your first score

pip install git+https://github.com/KanakMalpani/LoopGym.git
pip install git+https://github.com/KanakMalpani/LoopBench.git

loopbench list

loopbench run \
  --task LB-CR-1 \
  --spec submissions/examples/spec-fast-loop.yaml \
  --seeds 0,1,2,3,4 \
  -o results.json

loopbench validate results.json
loopbench rank leaderboard/entries.json
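
If you want to drive the same run from a script, a minimal sketch is to build the argument list in Python and hand it to `subprocess`. Only the flags shown above (`--task`, `--spec`, `--seeds`, `-o`) are used; the helper name `build_run_command` is hypothetical and not part of LoopBench.

```python
import shlex


def build_run_command(task: str, spec: str, seeds, output: str) -> list[str]:
    """Compose the argv for `loopbench run` using only the flags shown in the README."""
    seed_arg = ",".join(str(seed) for seed in seeds)
    return [
        "loopbench", "run",
        "--task", task,
        "--spec", spec,
        "--seeds", seed_arg,
        "-o", output,
    ]


cmd = build_run_command(
    "LB-CR-1",
    "submissions/examples/spec-fast-loop.yaml",
    range(5),
    "results.json",
)
# Print the shell-quoted form for logging or copy-paste.
print(shlex.join(cmd))
```

With LoopBench installed, `subprocess.run(cmd, check=True)` would execute the command and raise if it exits with a non-zero status.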

Local dev (sibling clones):

git clone https://github.com/KanakMalpani/LoopGym.git
git clone https://github.com/KanakMalpani/LoopBench.git
cd LoopBench && pip install -e ../LoopGym -e ".[dev]"

On Windows, invoke Python as py -3.12 if needed. For PyPI publishing, see PUBLISHING.md.


Tasks (v0.1 · ALS v2)

| ID | Name | Env | What it stress-tests |
| --- | --- | --- | --- |
| LB-CR-1 | Code repair | loopbench/code-repair-v1 | Effectiveness, speed, robustness |
| LB-RS-1 | Research synthesis | loopbench/research-synthesis-v1 | Effectiveness, cost |
| LB-MA-1 | Multi-agent debate | loopbench/multi-agent-debate-v1 | Autonomy, scalability |

Each task ships a YAML definition and a README under tasks/. Runs use five seeds by default and are scored on Success@k plus the LES_obs composite.


Metrics

| Metric | Meaning |
| --- | --- |
| Success@k | Fraction of instances reaching goal threshold g_target |
| LES_obs | Observed eight-category composite ∈ [0, 1]; see metrics/les-compute.md |
| Cost | Estimated USD per run from LSS cost limits |
| Robustness | Quality retention across seeds |

Display scale 0–100 is optional (les_display = les_observed × 100).
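
As a rough illustration of two of these definitions, the sketch below computes a success fraction against `g_target` and applies the display scaling. It is not the reference implementation in metrics/les-compute.md: the function names are hypothetical, "reaching" the threshold is read as `>=`, each instance is reduced to a single score, and the role of k is not modelled.

```python
def success_fraction(scores: list[float], g_target: float) -> float:
    """Fraction of instances whose score reaches g_target (read here as >=)."""
    if not scores:
        raise ValueError("at least one instance score is required")
    return sum(score >= g_target for score in scores) / len(scores)


def les_display(les_observed: float) -> float:
    """Map an observed LES in [0, 1] onto the optional 0-100 display scale."""
    if not 0.0 <= les_observed <= 1.0:
        raise ValueError("les_observed must lie in [0, 1]")
    return les_observed * 100


# Five illustrative per-seed scores; three of them reach the 0.75 threshold.
example_scores = [0.9, 0.4, 0.8, 0.7, 0.95]
print(success_fraction(example_scores, g_target=0.75))
print(les_display(0.5))
```

The example scores are made up for the sketch and do not come from any LoopBench run.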


Submit to the leaderboard

  1. Run all tasks (or start with one):
    loopbench run --task LB-CR-1,LB-RS-1,LB-MA-1 --spec your-loop.yaml -o results.json
  2. Validate: loopbench validate results.json
  3. Open a PR adding your entry to leaderboard/entries.json

v0.1 rankings accept SimEnv submissions only (no API keys, fully reproducible). A LiveEnv tier is planned for v0.2.
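
To make the ranking step concrete, here is a minimal sketch that orders entries by observed LES, highest first. The real entry format is defined by submit/schema.json and the real ordering by `loopbench rank`; the entry fields (`name`, `les_observed`), the sort key, and the helper `rank_entries` are assumptions for illustration only.

```python
def rank_entries(entries: list[dict]) -> list[dict]:
    """Return entries sorted by les_observed (descending) with a 1-based rank added."""
    ordered = sorted(entries, key=lambda entry: entry["les_observed"], reverse=True)
    return [{"rank": position, **entry} for position, entry in enumerate(ordered, start=1)]


# Hypothetical entries; the names and values are invented for this sketch.
entries = [
    {"name": "baseline-loop", "les_observed": 0.25},
    {"name": "fast-loop", "les_observed": 0.75},
    {"name": "careful-loop", "les_observed": 0.5},
]

for row in rank_entries(entries):
    print(row["rank"], row["name"], row["les_observed"])
```

Python's `sorted` is stable, so entries with equal scores keep their original relative order in this sketch.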


Repository layout

| Path | Purpose |
| --- | --- |
| tasks/ | ALS v2 task definitions |
| metrics/les-compute.md | LES_obs formulas |
| submit/schema.json | Submission JSON schema |
| loopbench/ | Runner, LES compute, conformance |
| leaderboard/ | Public rankings (JSON v0.1) |
| submissions/examples/ | Reference specs |

Citation

@software{loopbench2026,
  title={LoopBench: Benchmark Suite for Loop Engineering},
  author={Malpani, Kanak},
  year={2026},
  url={https://github.com/KanakMalpani/LoopBench}
}

MIT · v0.1 · Contributing · Security · Status
