LoopBench

The public scoreboard for loop engineering.

Fixed tasks. Fixed seeds. Observed LES. Submissions anyone can audit.

No hand-waved demos — bring an LSS spec, get a number, climb the leaderboard.




pip install loopbench loopgym
loopbench list

Run your first score · Leaderboard · Suite overview



What LoopBench measures

You submit a loop specification (LSS YAML). LoopBench:

  1. Runs it through LoopGym on fixed task instances
  2. Computes Success@k and LES_obs across eight categories
  3. Validates your results.json against a published schema
  4. Ranks you on the public leaderboard

loopbench run --task LB-CR-1 --spec your-loop.yaml --seeds 0,1,2,3,4 -o results.json
loopbench validate results.json
loopbench rank leaderboard/entries.json
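The three commands above can also be chained from a script. The following is a minimal sketch, not part of LoopBench: `build_pipeline` and `run_pipeline` are hypothetical helper names, the flags are copied from the commands shown on this page, and stopping at the first failure assumes the CLI returns a non-zero exit code on error, which this page does not state.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_pipeline(task: str, spec: str, seeds: Sequence[int],
                   results: str = "results.json",
                   entries: str = "leaderboard/entries.json") -> List[List[str]]:
    """Return the run, validate, and rank commands as argument lists."""
    seed_arg = ",".join(str(s) for s in seeds)
    return [
        ["loopbench", "run", "--task", task, "--spec", spec,
         "--seeds", seed_arg, "-o", results],
        ["loopbench", "validate", results],
        ["loopbench", "rank", entries],
    ]


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code
            # (assumed failure signal), so later steps are skipped.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: prints the commands without needing loopbench installed.
    run_pipeline(build_pipeline("LB-CR-1", "your-loop.yaml", range(5)))
```

Keeping the commands as argument lists avoids shell quoting problems if a spec path contains spaces.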

The measurement stack

flowchart LR
  YOU["Your LSS spec"]
  LB["LoopBench<br/>tasks · scoring · conformance"]
  LG["LoopGym<br/>SimEnv execution"]
  OUT["results.json → leaderboard"]

  YOU --> LB
  LB --> LG
  LG --> LB
  LB --> OUT

| Layer | Owns | Repo |
|---|---|---|
| Spec | LSS schema, LES formulas | Loop Core Engineering |
| Data | Trajectories (holdout v0.2) | LoopNet |
| Runtime | env.run_episode() | LoopGym |
| Observability | LTF traces, iteration metrics | loop-observability |
| Measurement | Tasks, LES_obs, anti-gaming | LoopBench |

LoopBench defines and scores. LoopGym runs. Never the other way around.

New to the stack? Start with the LoopNet end-to-end tutorial.


Tasks (v0.1)

| ID | Name | What it exposes |
|---|---|---|
| LB-CR-1 | Code repair | Can your loop fix broken code under verify pressure? |
| LB-RS-1 | Research synthesis | Quality vs. cost on structured briefs |
| LB-MA-1 | Multi-agent debate | Autonomy + coordination under evaluator scrutiny |
| LB-COMP-1 | Composed swarm rehearsal | Parallel branches + merge (MiroFish-style LSS) |

Five seeds per task. Details in tasks/.
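To score one spec against every v0.1 task, the run command can be generated per task ID. This is a sketch under assumptions: `sweep_commands` is a hypothetical helper, the per-task output paths are illustrative, and it assumes every task uses the seeds 0 to 4 shown in the run example on this page.

```python
from typing import Dict, List

# Task IDs copied from the v0.1 task table.
TASK_IDS = ["LB-CR-1", "LB-RS-1", "LB-MA-1", "LB-COMP-1"]
# Assumption: the same five seeds apply to every task.
SEEDS = [0, 1, 2, 3, 4]


def sweep_commands(spec: str, out_dir: str = "results") -> Dict[str, List[str]]:
    """Map each task ID to a `loopbench run` argument list for the given spec."""
    seed_arg = ",".join(map(str, SEEDS))
    return {
        task: ["loopbench", "run", "--task", task, "--spec", spec,
               "--seeds", seed_arg, "-o", f"{out_dir}/{task}.json"]
        for task in TASK_IDS
    }


if __name__ == "__main__":
    # Print one command per task; nothing is executed here.
    for task, cmd in sweep_commands("your-loop.yaml").items():
        print(task, "->", " ".join(cmd))
```

Writing one results file per task keeps each file small enough to pass to `loopbench validate` on its own.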


Validate and reproduce

After following REPRODUCE.md, post your 60-minute reproduction report to the reproduction challenge.

Want to beat the maintainer LES? Start with good-first issue #4.


Score in 2 minutes

pip install loopbench loopgym

loopbench list

loopbench run \
  --task LB-CR-1 \
  --spec submissions/examples/spec-fast-loop.yaml \
  --seeds 0,1,2,3,4 \
  -o results.json

loopbench validate results.json

Submit to the leaderboard: open a PR adding your entry to leaderboard/entries.json.

v0.1 accepts SimEnv submissions only (fully reproducible, no API keys). LiveEnv tier: v0.2.


Metrics explained

| Metric | Meaning |
|---|---|
| Success@k | Fraction of instances reaching goal threshold |
| LES_obs | Observed composite ∈ [0, 1] across eight categories |
| Cost | Estimated USD from LSS cost limits |
| Robustness | Quality retention across seeds |

Display scale 0–100 is optional (les × 100).
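Two of these definitions are simple enough to sketch in a few lines. This is an illustration, not LoopBench's scoring code: `success_rate` and `les_display` are hypothetical names, the example scores are made up, treating a score equal to the threshold as a success is an assumption, and the role of k is not specified on this page, so the sketch takes one final score per instance.

```python
from typing import Sequence


def success_rate(scores: Sequence[float], goal_threshold: float) -> float:
    """Fraction of instances whose score reaches the goal threshold."""
    if not scores:
        raise ValueError("need at least one instance score")
    # Assumption: a score exactly at the threshold counts as reaching it.
    reached = sum(1 for s in scores if s >= goal_threshold)
    return reached / len(scores)


def les_display(les: float) -> float:
    """Convert an LES value in [0, 1] to the optional 0-100 display scale."""
    if not 0.0 <= les <= 1.0:
        raise ValueError("LES is expected to lie in [0, 1]")
    return les * 100


if __name__ == "__main__":
    # Made-up scores for five instances; three of them reach 0.75.
    print(success_rate([0.9, 0.4, 0.8, 0.7, 0.95], goal_threshold=0.75))  # 0.6
    print(les_display(0.5))  # 50.0
```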


Who this is for

| You are… | LoopBench gives you… |
|---|---|
| Loop designer | A number you can improve release-over-release |
| Framework author | A neutral arena, not your own benchmark |
| Researcher | Reproducible tasks + published submission schema |
| Team lead | Comparable scores across designs and vendors |

Citation

@software{loopbench2026,
  title={LoopBench: Benchmark Suite for Loop Engineering},
  author={Malpani, Kanak},
  year={2026},
  url={https://pypi.org/project/loopbench/}
}

MIT · v0.1 · Contributing · Security · Status
