
OpenAI Gym equivalent for loops — create, run, benchmark, compare, evolve

Project description

LoopGym

Run any loop. Three ways. One API.

Compile LSS 1.1 YAML into executable environments — simulate for CI, call live models for production eval, or replay LoopNet trajectories without spending a token.


License: MIT · Python 3.12+ · LSS 1.1


pip install loopgym

Quickstart · API docs · PyPI · LoopBench · Observability


🚀 The idea in one picture

```mermaid
flowchart TB
  SPEC["Your LSS YAML"]
  MAKE["loopgym.make(env_id)"]

  SIM["SimEnv<br/><i>deterministic · free · CI-safe</i>"]
  LIVE["LiveEnv<br/><i>real models · production eval</i>"]
  REPLAY["ReplayEnv<br/><i>LoopNet trajectories · zero API cost</i>"]

  SPEC --> MAKE
  MAKE --> SIM
  MAKE --> LIVE
  MAKE --> REPLAY
```

LSS declares the loop. LoopGym runs it. LoopBench scores it. Clean separation — like Gym vs. benchmark suites in reinforcement learning.


Run cost vs fidelity — and everything else you get

Pick the backend that matches your stage. SimEnv and ReplayEnv cost $0; LiveEnv uses real model spend when you need production truth.

LoopGym backend cost comparison

| Benefit | SimEnv / ReplayEnv | LiveEnv |
|---|---|---|
| API spend | $0, run all night | Real model cost |
| Determinism | Fixed seeds · CI-safe | Stochastic (production) |
| LoopBench-ready | Submit scores without keys | Production eval |
| LoopNet replay | Replay 545 trajectories offline | N/A |
| Safety / HITL drills | PerturbedSim perturbations | Full stack |
| One API | `loopgym.make(env_id)`, same code path | Same |

The unlock: develop, test, benchmark, and regress before you burn tokens in prod.

| Backend | API keys | Best for |
|---|---|---|
| SimEnv | No | CI, LoopBench submissions, local dev |
| ReplayEnv | No | LoopNet trajectory analysis |
| PerturbedSim | No | RAG / HITL / safety perturbations |
| LiveEnv | Yes | Production eval with real LLMs |

⚡ Three backends, one line of code

```python
import loopgym as lg

env = lg.make("loopbench/code-repair-v1")
obs = env.reset(task_id="cr-001")

# your_agent is your own policy object: anything with a policy(obs) -> action method
while not env.done:
    action = your_agent.policy(obs)
    obs, reward, done, info = env.step(action)
```

| Backend | When to use | API keys? |
|---|---|---|
| SimEnv | CI, local dev, LoopBench submissions | No |
| LiveEnv | Production eval with real LLMs | `OPENAI_API_KEY` (pluggable) |
| ReplayEnv | Analyze historical runs from LoopNet | No |
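The loop above relies on a small contract: `reset()` returns an observation carrying `task_id` and `step`, `step()` returns `(obs, reward, done, info)`, and `env.done` flips once the episode ends. The sketch below implements that contract with a hypothetical `ToyEnv` and `EchoAgent` so the same loop runs without LoopGym installed; the reward rule and step limit are invented for illustration and say nothing about how LoopGym scores episodes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Observation:
    task_id: str
    step: int


class ToyEnv:
    """Hypothetical stand-in that mimics the reset/step contract shown above."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.done = False
        self._obs = Observation(task_id="", step=0)

    def reset(self, task_id: str) -> Observation:
        self.done = False
        self._obs = Observation(task_id=task_id, step=0)
        return self._obs

    def step(self, action: str) -> tuple[Observation, float, bool, dict[str, Any]]:
        if self.done:
            raise RuntimeError("episode finished; call reset() first")
        self._obs = Observation(self._obs.task_id, self._obs.step + 1)
        reward = 1.0 if action == "fix" else 0.0  # invented reward rule
        self.done = reward > 0 or self._obs.step >= self.max_steps
        return self._obs, reward, self.done, {"action": action}


class EchoAgent:
    """Hypothetical policy: inspects on the first step, then proposes a fix."""

    def policy(self, obs: Observation) -> str:
        return "fix" if obs.step >= 1 else "inspect"


env = ToyEnv()
your_agent = EchoAgent()
obs = env.reset(task_id="cr-001")
total = 0.0
while not env.done:
    action = your_agent.policy(obs)
    obs, reward, done, info = env.step(action)
    total += reward
```

Here the episode ends after two steps: an `inspect` with no reward, then a rewarded `fix`.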

🛠️ Try it in 60 seconds

```shell
pip install loopgym

python -c "
import loopgym as lg
env = lg.make('loopbench/code-repair-v1')
obs = env.reset(task_id='cr-001')
print('task:', obs.task_id, '| step:', obs.step)
"
```

Full quickstart:

```shell
git clone https://github.com/KanakMalpani/LoopGym.git && cd LoopGym
pip install -e ".[dev]"
python examples/quickstart.py
pytest tests/ -q
```

📈 Validate and reproduce

Ran a replay or SimEnv episode? Follow REPRODUCE.md and post your results in Discussion #10. To export trajectories, follow LoopNet's COMMUNITY-SUBMISSION guide.
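When reporting a reproduction, a compact way to show that two runs match is to hash the trajectory. The sketch assumes a seeded, simulated episode; `run_episode` and `trajectory_digest` are hypothetical helpers, not part of LoopGym or REPRODUCE.md.

```python
import hashlib
import json
import random


def run_episode(seed: int, steps: int = 5) -> list[dict[str, float]]:
    """Hypothetical simulated episode whose rewards come from a seeded RNG."""
    rng = random.Random(seed)  # private RNG: global random state is left untouched
    return [{"step": i, "reward": rng.random()} for i in range(steps)]


def trajectory_digest(trajectory: list[dict[str, float]]) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    payload = json.dumps(trajectory, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


same = trajectory_digest(run_episode(seed=0)) == trajectory_digest(run_episode(seed=0))
different = trajectory_digest(run_episode(seed=0)) != trajectory_digest(run_episode(seed=1))
```

Matching digests mean the serialized trajectories are identical. A mismatch can also come from differing Python or library versions, so it helps to report those alongside the digest.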


🗺️ Environments (v0.1.3)

| Env ID | Backend | Stress-tests / perturbations |
|---|---|---|
| `loopbench/code-repair-v1` | Sim | Verify-driven repair, iteration limits |
| `loopbench/research-synthesis-v1` | Sim | Multi-step synthesis + rubric |
| `loopbench/multi-agent-debate-v1` | Sim | Role-separated workers + evaluator |
| `loopbench/composed-swarm-v1` | Sim | Composed parallel rehearsal (scenario-swarm-rehearsal) · LB-COMP-1 |
| `loopbench/rag-retrieval-v1` | PerturbedSim | RAG retrieval with missing/stale source perturbations · LB-RAG-1 |
| `loopbench/hitl-gate-v1` | PerturbedSim | Human-in-the-loop approval gate simulation (rejections) · LB-HITL-1 |
| `loopbench/safety-constrained-v1` | PerturbedSim | Tool allowlist / denylist safety termination · LB-SAFE-1 |
| `replay/loopnet-v1` | Replay | Full trajectories from LoopNet v0.2 |
| `sim/mock-llm-v1` | Sim | Generic sandbox for custom LSS specs |

Bundled specs under envs/loopbench/ — validated against Loop Core Engineering in CI.
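The `safety-constrained-v1` row describes episodes that terminate when an agent calls a tool outside policy. Below is a minimal sketch of such a rule, assuming deny entries override allow entries and unlisted tools are refused; `ToolPolicy` and `run_tool_calls` are hypothetical and are not LoopGym's perturbation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical allowlist/denylist policy for tool calls."""
    allow: frozenset[str]
    deny: frozenset[str] = frozenset()

    def permits(self, tool: str) -> bool:
        # Deny wins over allow, and tools missing from the allowlist are refused.
        return tool not in self.deny and tool in self.allow


def run_tool_calls(policy: ToolPolicy, calls: list[str]) -> tuple[list[str], str]:
    """Execute calls in order and terminate at the first forbidden tool."""
    executed: list[str] = []
    for tool in calls:
        if not policy.permits(tool):
            return executed, f"terminated: {tool!r} not permitted"
        executed.append(tool)
    return executed, "completed"


policy = ToolPolicy(allow=frozenset({"read_file", "run_tests"}), deny=frozenset({"shell"}))
```

Refusing anything not on the allowlist is the conservative default here: a tool nobody thought to list still ends the episode.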


🎯 Who this is for

| You want to… | LoopGym gives you… |
|---|---|
| Benchmark your loop design | The same env IDs LoopBench uses |
| Test without burning API budget | SimEnv + ReplayEnv |
| Ship production eval pipelines | LiveEnv with pluggable backends |
| Replay production-like runs | ReplayEnv + LoopNet corpus |
| Trace iterations & LES | loopotel LTF export |

👁️ Observability

Trace loop iterations without raw chat logs (LTF 0.1):

```shell
pip install loopotel loopgym
python -c "
import loopgym as lg
from loopotel.integrations.loopgym import run_traced_episode
env = lg.make('loopbench/code-repair-v1')
result, trace = run_traced_episode(env, task_id='cr-001', seed=0, enabled=True)
print(result['success'], len(trace['spans']), 'spans')
"
```

Full stack walkthrough: LoopNet end-to-end tutorial.
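Tracing "without raw chat logs" means each span records metadata about an iteration rather than its content. The sketch below illustrates only that idea; `IterationSpan`, `trace_iteration`, and their fields are hypothetical and are not the LTF 0.1 schema or the loopotel API.

```python
import time
from collections.abc import Callable
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class IterationSpan:
    step: int
    duration_ms: float
    reward: float
    prompt_chars: int  # the prompt's size is kept; its text is not


def trace_iteration(step: int, prompt: str, run: Callable[[str], float]) -> IterationSpan:
    """Run one loop iteration and return a span holding metadata only."""
    start = time.perf_counter()
    reward = run(prompt)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return IterationSpan(step=step, duration_ms=elapsed_ms, reward=reward, prompt_chars=len(prompt))


span = trace_iteration(0, "fix the failing test", lambda prompt: 1.0)
record = asdict(span)  # plain dict, ready to serialize
```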


⚙️ Ecosystem

| Repo | Role |
|---|---|
| Loop Core Engineering | LSS / LES authority |
| LoopNet | Trajectory corpus |
| LoopGym | Runtime (this repo) |
| LoopBench | Public scoreboard |
| loop-observability | LTF traces (loopotel) |

Stack map: ECOSYSTEM.md


📝 Citation

```bibtex
@software{loopgym2026,
  title={LoopGym: OpenAI Gym for LSS-Defined Agent Loops},
  author={Malpani, Kanak},
  year={2026},
  url={https://pypi.org/project/loopgym/}
}
```

MIT · v0.1.3 · Contributing



Download files


Source Distribution

loopgym-0.1.4.tar.gz (81.7 kB)

Built Distribution


loopgym-0.1.4-py3-none-any.whl (46.3 kB)

File details

Details for the file loopgym-0.1.4.tar.gz.

File metadata

  • Download URL: loopgym-0.1.4.tar.gz
  • Upload date:
  • Size: 81.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for loopgym-0.1.4.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 38e0601820f432de6b94b616f7e8f74c958cb8e1f793a9246854acb9b22c5ad2 |
| MD5 | c10e350f9a44db4774b72da4548627d0 |
| BLAKE2b-256 | 3bba76a8e472675a27f2636b62f20150c4459c66fa20a55cbf4f1ceb16aa64cd |


File details

Details for the file loopgym-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: loopgym-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 46.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for loopgym-0.1.4-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 84481e9b8fda12a7fb290bc1df0f6966e6ca52357f7651fb67b54f5043220461 |
| MD5 | 17a3b91b42c8189a0bdc50d55a1013da |
| BLAKE2b-256 | 25d404fe361a50aa17787a6671551a32aefc3c9c6d36610b1495e50b106b799a |
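The SHA256 digests above can be checked locally before installing. A minimal sketch using only the standard library; the `verify` helper is hypothetical and assumes the downloaded file keeps its published name.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for loopgym 0.1.4.
EXPECTED_SHA256 = {
    "loopgym-0.1.4.tar.gz": "38e0601820f432de6b94b616f7e8f74c958cb8e1f793a9246854acb9b22c5ad2",
    "loopgym-0.1.4-py3-none-any.whl": "84481e9b8fda12a7fb290bc1df0f6966e6ca52357f7651fb67b54f5043220461",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file's name is known and its SHA256 matches the published digest."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

pip's hash-checking mode enforces the same check at install time: pin `loopgym==0.1.4 --hash=sha256:<digest>` in a requirements file and install with `--require-hashes`.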

