LoopBench — benchmark suite, metrics, submission pipeline, leaderboards


LoopBench

The public scoreboard for loop engineering.

Fixed tasks. Fixed seeds. Observed LES. Submissions anyone can audit.

No hand-waved demos — bring an LSS spec, get a number, climb the leaderboard.

pip install loopbench loopgym
loopbench list


What LoopBench measures

You submit a loop specification (LSS YAML). LoopBench:

Runs it through LoopGym on fixed task instances
Computes Success@k and LES_obs across eight categories
Validates your results.json against a published schema
Ranks you on the public leaderboard

loopbench run --task LB-CR-1 --spec your-loop.yaml --seeds 0,1,2,3,4 -o results.json
loopbench validate results.json
loopbench rank leaderboard/entries.json
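To drive these steps from a script, you can wrap the `run` and `validate` commands with Python's `subprocess`. This is a minimal sketch. It uses only the flags shown above and assumes `loopbench` is on your PATH. The helper names `build_run_cmd` and `run_and_validate` are hypothetical and not part of the loopbench package.

```python
import shlex
import subprocess
from typing import Sequence


def build_run_cmd(task: str, spec: str, seeds: Sequence[int], out: str) -> list[str]:
    """Build the `loopbench run` argv using only the documented flags."""
    return [
        "loopbench", "run",
        "--task", task,
        "--spec", spec,
        "--seeds", ",".join(str(s) for s in seeds),  # e.g. "0,1,2,3,4"
        "-o", out,
    ]


def run_and_validate(task: str, spec: str, seeds: Sequence[int], out: str) -> None:
    """Run one task, then validate its results file; raises on a non-zero exit."""
    subprocess.run(build_run_cmd(task, spec, seeds, out), check=True)
    subprocess.run(["loopbench", "validate", out], check=True)


if __name__ == "__main__":
    # Print the command instead of executing it, so this is safe to run anywhere.
    cmd = build_run_cmd("LB-CR-1", "your-loop.yaml", range(5), "results.json")
    print(shlex.join(cmd))
```

Building the argv as a list, not a shell string, avoids quoting problems with spec paths. `check=True` makes a failed run or validation raise `CalledProcessError`.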

The measurement stack

flowchart LR
  YOU["Your LSS spec"]
  LB["LoopBench<br/>tasks · scoring · conformance"]
  LG["LoopGym<br/>SimEnv execution"]
  OUT["results.json → leaderboard"]

  YOU --> LB
  LB --> LG
  LG --> LB
  LB --> OUT

Layer	Owns	Repo
Spec	LSS schema, LES formulas	Loop Core Engineering
Data	Trajectories (holdout v0.2)	LoopNet
Runtime	`env.run_episode()`	LoopGym
Observability	LTF traces, iteration metrics	loop-observability
Measurement	Tasks, LES_obs, anti-gaming	LoopBench

LoopBench defines and scores. LoopGym runs. Never the other way around.
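The split can be pictured in code: the runtime only executes episodes, while the scorer owns seed choice and scoring. This is an illustrative sketch, not LoopBench or LoopGym code. `Episode`, `Runtime`, `FakeRuntime`, and the `run_episode(spec, seed)` signature are assumptions standing in for the real `env.run_episode()`.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Protocol


@dataclass
class Episode:
    """Hypothetical episode record handed back by the runtime."""
    seed: int
    quality: float  # assumed normalized to [0, 1]


class Runtime(Protocol):
    """Stand-in for the LoopGym side: it executes episodes and never scores them."""

    def run_episode(self, spec: dict, seed: int) -> Episode: ...


def score(runtime: Runtime, spec: dict, seeds: list[int],
          metric: Callable[[list[Episode]], float]) -> float:
    """LoopBench side: fixes the seeds, asks the runtime to execute, then scores."""
    episodes = [runtime.run_episode(spec, seed) for seed in seeds]
    return metric(episodes)


class FakeRuntime:
    """Toy runtime: quality alternates between 0.5 and 0.6 by seed parity."""

    def run_episode(self, spec: dict, seed: int) -> Episode:
        return Episode(seed=seed, quality=0.5 + 0.1 * (seed % 2))


def mean_quality(episodes: list[Episode]) -> float:
    """Example metric: average episode quality."""
    return mean(e.quality for e in episodes)


if __name__ == "__main__":
    print(score(FakeRuntime(), {"name": "toy-loop"}, [0, 1, 2, 3, 4], mean_quality))
```

Because the scorer receives the runtime as a parameter, you can swap in a different runtime without touching the scoring code. The reverse dependency never appears.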

New to the stack? Start with the LoopNet end-to-end tutorial.

Tasks (v0.1)

ID	Name	What it exposes
`LB-CR-1`	Code repair	Can your loop fix broken code under verify pressure?
`LB-RS-1`	Research synthesis	Quality vs. cost on structured briefs
`LB-MA-1`	Multi-agent debate	Autonomy + coordination under evaluator scrutiny
`LB-COMP-1`	Composed swarm rehearsal	Parallel branches + merge (MiroFish-style LSS)

Five seeds per task. Full task definitions are in tasks/.
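For scripting, you can mirror the v0.1 task set as a small registry. This is a sketch only. The authoritative definitions live in tasks/, and the names `Task`, `TASKS`, and `seeds_arg` are hypothetical. Seeds 0 to 4 are assumed from the example commands rather than stated in the task docs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    name: str
    # Assumption: the five fixed seeds are 0-4, matching the example commands.
    seeds: tuple[int, ...] = (0, 1, 2, 3, 4)


# The four v0.1 tasks, keyed by ID (IDs and names from the table above).
TASKS = {
    t.task_id: t
    for t in (
        Task("LB-CR-1", "Code repair"),
        Task("LB-RS-1", "Research synthesis"),
        Task("LB-MA-1", "Multi-agent debate"),
        Task("LB-COMP-1", "Composed swarm rehearsal"),
    )
}


def seeds_arg(task_id: str) -> str:
    """Format a task's seeds for the --seeds flag, e.g. '0,1,2,3,4'."""
    return ",".join(map(str, TASKS[task_id].seeds))
```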

Validate and reproduce

Follow REPRODUCE.md, then post your 60-minute reproduction report to the reproduction challenge.

Think you can beat the maintainers' LES? Start with good-first issue #4.

Score in 2 minutes

pip install loopbench loopgym

loopbench list

loopbench run \
  --task LB-CR-1 \
  --spec submissions/examples/spec-fast-loop.yaml \
  --seeds 0,1,2,3,4 \
  -o results.json

loopbench validate results.json

Submit to the leaderboard: open a PR adding your entry to leaderboard/entries.json.
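Before you open the PR, you may want a local preview of where an entry would land. The sort order below is an assumption for illustration: LES_obs descending, then Success@k descending, then cost ascending. The field names `les_obs`, `success_at_k`, and `cost_usd` are also assumptions. `loopbench rank` remains the authoritative ranking.

```python
import json
from typing import Any, Dict, List


def rank_entries(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Higher LES_obs first, then higher Success@k, then lower cost (assumed tie-breaks)."""
    return sorted(
        entries,
        key=lambda e: (-e["les_obs"], -e["success_at_k"], e["cost_usd"]),
    )


def load_and_rank(path: str) -> List[Dict[str, Any]]:
    """Load a JSON list of entries (hypothetical layout of entries.json) and rank it."""
    with open(path, encoding="utf-8") as fh:
        return rank_entries(json.load(fh))
```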

v0.1 accepts SimEnv submissions only (fully reproducible, no API keys needed). The LiveEnv tier arrives in v0.2.
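A pre-flight check along these lines can catch obvious problems before `loopbench validate` does. This is a hypothetical sketch. The field names `env`, `les_obs`, and `success_at_k` are assumptions, and the published results.json schema remains the source of truth.

```python
from typing import Any, Dict, List

# Assumption: v0.1 only accepts the SimEnv tier.
ALLOWED_ENVS_V01 = {"SimEnv"}


def check_results(results: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    problems: List[str] = []
    env = results.get("env")
    if env not in ALLOWED_ENVS_V01:
        problems.append(f"env {env!r} not accepted in v0.1 (SimEnv only)")
    les = results.get("les_obs")
    if not isinstance(les, (int, float)) or not 0.0 <= les <= 1.0:
        problems.append("les_obs must be a number in [0, 1]")
    success = results.get("success_at_k")
    if not isinstance(success, (int, float)) or not 0.0 <= success <= 1.0:
        problems.append("success_at_k must be a fraction in [0, 1]")
    return problems
```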

Metrics explained

Metric	Meaning
Success@k	Fraction of instances that reach the goal threshold
LES_obs	Observed composite score in `[0, 1]`, aggregated across eight categories
Cost	Estimated USD from LSS cost limits
Robustness	Quality retention across seeds

The 0–100 display scale (les × 100) is optional.
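The metrics table can be read as the arithmetic below. This shows how the metrics might be computed under stated assumptions. It is not LoopBench's reference implementation. The authoritative LES formulas belong to the LSS spec (Loop Core Engineering). The sketch assumes equal category weights and reads robustness as worst-seed quality over mean quality. It does not model the `k` in Success@k.

```python
from statistics import mean
from typing import Dict, List, Optional


def success_at_k(qualities: List[float], threshold: float) -> float:
    """Fraction of instances whose quality reaches the goal threshold."""
    return sum(q >= threshold for q in qualities) / len(qualities)


def les_obs(category_scores: Dict[str, float],
            weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean of per-category scores (each in [0, 1]).

    With non-negative weights the result stays in [0, 1].
    Assumption: equal weights when none are given.
    """
    if not category_scores:
        raise ValueError("need at least one category score")
    if weights is None:
        weights = {name: 1.0 for name in category_scores}
    total = sum(weights[name] for name in category_scores)
    return sum(score * weights[name] for name, score in category_scores.items()) / total


def robustness(per_seed_quality: List[float]) -> float:
    """Quality retention across seeds, read here (assumption) as worst seed / mean seed."""
    avg = mean(per_seed_quality)
    return min(per_seed_quality) / avg if avg > 0 else 0.0


def display_score(les: float) -> float:
    """Optional 0-100 display scale: les x 100."""
    return les * 100
```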

Who this is for

You are…	LoopBench gives you…
Loop designer	A number you can improve release-over-release
Framework author	A neutral arena — not your own benchmark
Researcher	Reproducible tasks + published submission schema
Team lead	Comparable scores across designs and vendors

Citation

@software{loopbench2026,
  title={LoopBench: Benchmark Suite for Loop Engineering},
  author={Malpani, Kanak},
  year={2026},
  url={https://pypi.org/project/loopbench/}
}

MIT · v0.1 · Contributing · Security · Status

Project details

Release history

Version	Released
0.2.0	Jun 30, 2026
0.1.1 (this version)	Jun 24, 2026
0.1.0	Jun 13, 2026

Download files

File	Distribution	Size	Uploaded
loopbench-0.1.1.tar.gz	Source	88.7 kB	Jun 24, 2026
loopbench-0.1.1-py3-none-any.whl	Built (Python 3)	19.1 kB	Jun 24, 2026

Both files were uploaded via twine/6.1.0 CPython/3.13.12, without Trusted Publishing.

File hashes

File	Algorithm	Hash digest
loopbench-0.1.1.tar.gz	SHA256	`60a5ff993ac4767cc7792c46d21fc746d1dfdcbc5d11d9e46535abf7d2b39b3d`
loopbench-0.1.1.tar.gz	MD5	`b9c6de5f050c206642732c9eada021ef`
loopbench-0.1.1.tar.gz	BLAKE2b-256	`4b67175f380c877e24fde4ec462787aebbbd6e4879578f645fafa749df6bd25b`
loopbench-0.1.1-py3-none-any.whl	SHA256	`c35fab5d0127199b71088b559444e2a9c8d594d0490756460444ed1a7b4e4b6f`
loopbench-0.1.1-py3-none-any.whl	MD5	`aa96b55a395be39e3d4a83cf868af852`
loopbench-0.1.1-py3-none-any.whl	BLAKE2b-256	`6a3e3b99f28f5eadb77e9b31042353e96b1989dce6eb5d9823aead9fb4e76efe`
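To check a downloaded file against the published digests, hash it locally and compare. This is a minimal sketch using Python's standard `hashlib`. `sha256_of`, `verify_download`, and `PUBLISHED_SHA256` are illustrative names. The digests are copied from the file hashes listed above.

```python
import hashlib
from pathlib import Path

# Published SHA256 digests for loopbench 0.1.1 (copied from the listing above).
PUBLISHED_SHA256 = {
    "loopbench-0.1.1.tar.gz": "60a5ff993ac4767cc7792c46d21fc746d1dfdcbc5d11d9e46535abf7d2b39b3d",
    "loopbench-0.1.1-py3-none-any.whl": "c35fab5d0127199b71088b559444e2a9c8d594d0490756460444ed1a7b4e4b6f",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """True only if the file's SHA256 matches the published digest for its filename."""
    expected = PUBLISHED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

pip can enforce the same check during installation in hash-checking mode (`--require-hashes`).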
