loopbench

LoopBench — benchmark suite, metrics, submission pipeline, leaderboards

These details have not been verified by PyPI

Project description

LoopBench
MLPerf for loops.

Python 3.12+ · 3 tasks

LoopBench is the public scoreboard for Loop Engineering — fixed tasks, fixed seeds, observed LES, and a submission pipeline anyone can audit.

You bring an LSS loop spec. LoopBench runs it through LoopGym, computes LES_obs across eight categories, validates your results JSON, and ranks you on the leaderboard. No hand-waved demos.

loopbench run --task LB-CR-1 --spec your-loop.yaml --seeds 0,1,2,3,4 -o results.json
loopbench validate results.json

LoopBench CLI demo: install, list tasks, run, validate, rank

The contract

flowchart LR
  YOU[Your LSS spec]
  LB[LoopBench<br/>tasks · scoring · schema]
  LG[LoopGym<br/>execution]
  OUT[results.json → leaderboard]

  YOU --> LB
  LB -->|env_id, seeds| LG
  LG -->|trajectories| LB
  LB --> OUT

Layer	Owns	Repo
Spec	LSS schema, LES formulas	Loop Core Engineering
Data	Trajectories (optional holdout)	LoopNet
Runtime	`env.run_episode()`	LoopGym
Measurement	Tasks, LES_obs, submissions	LoopBench

LoopBench defines and scores. LoopGym runs. Never the other way around.
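The division of labor above can be sketched in a few lines of Python. This is an illustration only: the `Env` protocol, the `seed` argument to `run_episode`, the list-of-floats trajectory, and the `measure` function are assumptions made for the example, not the actual LoopGym or LoopBench API.

```python
from typing import Protocol


class Env(Protocol):
    """Runtime side (LoopGym's role): executes one episode."""

    def run_episode(self, seed: int) -> list[float]: ...


class StubEnv:
    """Hypothetical stand-in environment that returns a fake quality trace."""

    def run_episode(self, seed: int) -> list[float]:
        # Deterministic per seed, so repeated runs give the same trajectory.
        return [0.1 * (seed + 1), 0.2 * (seed + 1)]


def measure(env: Env, seeds: list[int]) -> dict[int, float]:
    """Measurement side (LoopBench's role): choose seeds, score trajectories."""
    scores = {}
    for seed in seeds:
        trajectory = env.run_episode(seed)  # the runtime only executes
        scores[seed] = max(trajectory)      # the scoring rule lives here (made up)
    return scores


print(measure(StubEnv(), [0, 1, 2]))
```

The point of the sketch is the direction of the calls: the measurement layer picks the seeds and owns the scoring rule, while the environment never scores itself.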

⚡ Run your first score

pip install git+https://github.com/KanakMalpani/LoopGym.git
pip install git+https://github.com/KanakMalpani/LoopBench.git

loopbench list

loopbench run \
  --task LB-CR-1 \
  --spec submissions/examples/spec-fast-loop.yaml \
  --seeds 0,1,2,3,4 \
  -o results.json

loopbench validate results.json
loopbench rank leaderboard/entries.json
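If you drive runs from a script, the `loopbench run` invocation above can be assembled as an argument list. The sketch below uses only the flags shown in the command above; the helper name `build_run_command` is hypothetical and not part of LoopBench.

```python
import shlex


def build_run_command(task: str, spec: str, seeds: list[int], out: str) -> list[str]:
    """Assemble the argv for `loopbench run` from the flags shown above."""
    return [
        "loopbench", "run",
        "--task", task,
        "--spec", spec,
        "--seeds", ",".join(str(s) for s in seeds),
        "-o", out,
    ]


cmd = build_run_command(
    task="LB-CR-1",
    spec="submissions/examples/spec-fast-loop.yaml",
    seeds=list(range(5)),
    out="results.json",
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

With LoopBench installed, the same list could be passed to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell-quoting problems with unusual spec paths.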

Local dev (sibling clones):

git clone https://github.com/KanakMalpani/LoopGym.git
git clone https://github.com/KanakMalpani/LoopBench.git
cd LoopBench && pip install -e ../LoopGym -e ".[dev]"

On Windows, use py -3.12 to select the interpreter if needed. For PyPI publishing, see PUBLISHING.md.

Tasks (v0.1 · ALS v2)

ID	Name	Env	What it stress-tests
`LB-CR-1`	Code repair	`loopbench/code-repair-v1`	Effectiveness, speed, robustness
`LB-RS-1`	Research synthesis	`loopbench/research-synthesis-v1`	Effectiveness, cost
`LB-MA-1`	Multi-agent debate	`loopbench/multi-agent-debate-v1`	Autonomy, scalability

Each task ships a YAML definition and a README under tasks/. Runs use five seeds by default and are scored by Success@k plus the LES_obs composite.

Metrics

Metric	Meaning
Success@k	Fraction of instances reaching goal threshold `g_target`
LES_obs	Observed eight-category composite ∈ `[0, 1]` — see `metrics/les-compute.md`
Cost	Estimated USD per run from LSS cost limits
Robustness	Quality retention across seeds

Display scale 0–100 is optional (les_display = les_observed × 100).
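As a worked example of the two simplest definitions above, here is a minimal sketch. It assumes that "reaching" the threshold means `quality >= g_target` and that each instance is summarized by one final quality value; it does not model the role of k, and the authoritative formulas are in `metrics/les-compute.md`. The function names are hypothetical.

```python
def success_rate(qualities: list[float], g_target: float) -> float:
    """Fraction of instances whose quality reaches g_target (assumed: >=)."""
    if not qualities:
        raise ValueError("need at least one instance")
    hits = sum(1 for q in qualities if q >= g_target)
    return hits / len(qualities)


def les_display(les_observed: float) -> float:
    """Optional 0-100 display scale: les_display = les_observed * 100."""
    if not 0.0 <= les_observed <= 1.0:
        raise ValueError("les_observed must lie in [0, 1]")
    return les_observed * 100


qualities = [0.9, 0.7, 0.95, 0.8, 0.6]  # made-up values, one per instance
print(success_rate(qualities, g_target=0.8))  # 3 of 5 instances reach 0.8
print(les_display(0.5))
```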

Submit to the leaderboard

1. Run all tasks (or start with one):
   loopbench run --task LB-CR-1,LB-RS-1,LB-MA-1 --spec your-loop.yaml -o results.json
2. Validate: loopbench validate results.json
3. Open a PR adding your entry to leaderboard/entries.json
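Before opening the PR, a quick local sanity check can catch obviously malformed scores. The sketch below is not a substitute for `loopbench validate`, which checks against `submit/schema.json`. It assumes that scores appear in the results file under a key named `les_observed` (the name is borrowed from the display formula) and it searches for that key at any depth so that it does not depend on the real layout; the sample document is made up.

```python
import json


def find_values(node, key):
    """Yield every value stored under `key` anywhere in nested JSON data."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def les_values_in_range(results) -> bool:
    """True if at least one les_observed exists and all lie in [0, 1]."""
    values = list(find_values(results, "les_observed"))
    return bool(values) and all(
        isinstance(v, (int, float)) and 0.0 <= v <= 1.0 for v in values
    )


# Hypothetical results layout, for illustration only.
sample = json.loads('{"tasks": [{"id": "LB-CR-1", "les_observed": 0.42}]}')
print(les_values_in_range(sample))
```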

v0.1 rankings accept SimEnv submissions only (no API keys, fully reproducible). A LiveEnv tier is planned for v0.2.

Repository layout

Path	Purpose
`tasks/`	ALS v2 task definitions
`metrics/les-compute.md`	LES_obs formulas
`submit/schema.json`	Submission JSON schema
`loopbench/`	Runner, LES compute, conformance
`leaderboard/`	Public rankings (JSON v0.1)
`submissions/examples/`	Reference specs

Citation

@software{loopbench2026,
  title={LoopBench: Benchmark Suite for Loop Engineering},
  author={Malpani, Kanak},
  year={2026},
  url={https://github.com/KanakMalpani/LoopBench}
}

MIT · v0.1 · Contributing · Security · Status

Project details

Release history

This version: 0.1.0 (Jun 13, 2026)

Download files

Source Distribution

loopbench-0.1.0.tar.gz (86.2 kB)

Uploaded Jun 13, 2026 Source

Built Distribution

loopbench-0.1.0-py3-none-any.whl (11.9 kB)

Uploaded Jun 13, 2026 Python 3

File details

Details for the file loopbench-0.1.0.tar.gz.

File metadata

Download URL: loopbench-0.1.0.tar.gz
Upload date: Jun 13, 2026
Size: 86.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for loopbench-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`a2ade97a06fcbaa31e008394169c20adc3ddd02b23ed240f1f7fdbb16db1337c`
MD5	`6c1f93c880f513aceb3ad7694481b1f7`
BLAKE2b-256	`3a5b8767bab10e4b8e741e1bcfc36019d2e9afab852dfc6a90a708c6c5256be2`
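A downloaded file can be checked against the digests above with Python's standard `hashlib`. The sketch below verifies the SHA256 of the source distribution; it assumes the file was saved in the current directory under its original name, and the helper names are hypothetical. The same approach works for the wheel using the digest from its own table.

```python
import hashlib
from pathlib import Path

# SHA256 of loopbench-0.1.0.tar.gz, copied from the table above.
EXPECTED_SHA256 = "a2ade97a06fcbaa31e008394169c20adc3ddd02b23ed240f1f7fdbb16db1337c"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected: str) -> bool:
    """Compare the file's digest with the published one, ignoring hex case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    sdist = Path("loopbench-0.1.0.tar.gz")  # assumed download location
    if sdist.exists():
        print("OK" if matches(sdist, EXPECTED_SHA256) else "MISMATCH")
```

pip can enforce the same check at install time through `--require-hashes` with a requirements file that pins `--hash=sha256:...` entries.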

File details

Details for the file loopbench-0.1.0-py3-none-any.whl.

File metadata

Download URL: loopbench-0.1.0-py3-none-any.whl
Upload date: Jun 13, 2026
Size: 11.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for loopbench-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`49b1bf795cc661e03d5efc3bc1de6e626d33302115c81178c7ac3e58ad8bfc94`
MD5	`195359dda33542bef0ed4f9c506a4cd4`
BLAKE2b-256	`87a280cbc8e38ddfb4451b6c7a4bd9382318f49f30e541de412094ee0a79ec4e`
