Self-improvement engine for AI agents — evolve any agent autonomously using the Darwin Gödel Machine algorithm.

darwinloop

darwinloop — self-improvement engine for AI agents

Point darwinloop at any Python agent, define what "good" looks like with benchmark tasks, and darwinloop will autonomously improve the code — iteration by iteration — using an LLM, without you writing a single patch.

Based on the Darwin Gödel Machine (Zhang et al., ICLR 2026).

Why darwinloop?

Measurable gains. The football-score-agent router went from 51% → 80% accuracy in 5 iterations — 16 more teams recognised, pronoun follow-ups fixed, competition-vs-team routing corrected. Zero manual patches.
Fully auditable. Every change is recorded as a unified diff. Every generation is preserved in an immutable JSON archive. Roll back anytime.
Works on any Python agent. uAgents, LangChain, LangGraph, raw Python — if it's a .py file, darwinloop can evolve it.
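To make the "recorded as a unified diff" point concrete, here is a minimal sketch that produces such a diff with Python's standard `difflib` module. The two router versions and the `make_unified_diff` helper are hypothetical illustrations; they are not taken from darwinloop, and the library's own diff format may differ.

```python
import difflib

# Hypothetical "before" and "after" versions of a router (illustration only).
old_source = '''def route(text):
    if "live" in text:
        return "live"
    return "team"
'''

new_source = '''def route(text):
    if "live" in text:
        return "live"
    if " vs " in text:
        return "competition"
    return "team"
'''


def make_unified_diff(before: str, after: str, name: str = "router.py") -> str:
    """Return a unified diff between two versions of a source file."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(lines)


diff_text = make_unified_diff(old_source, new_source)
print(diff_text)
```

Storing the diff next to each generation, rather than only the final file, is what lets a reviewer see exactly which lines a given iteration touched.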

Quickstart

pip install darwinloop

from darwinloop import DarwinLoop, BenchmarkTask

tasks = [
    BenchmarkTask(id="t1", name="live_scores",
                  input="live scores now", expected="live"),
    BenchmarkTask(id="t2", name="vs_competition",
                  input="Arsenal vs Chelsea result", expected="competition"),
]

dl = DarwinLoop(target="my_agent/router.py", tasks=tasks, model="asi1")
result = dl.run(iterations=5)

print(f"Score: {result.base_score:.2f} → {result.best_score:.2f} (+{result.score_delta:.2f})")
result.apply()            # write best version back to router.py
result.save_report()      # save darwinloop_report.md

Expected output:

darwinloop  — self-improvement engine for AI agents

Evaluating base agent (router.py)…
  ✓ agent_0000: score=0.51 (5/10 tasks passed)

── ITERATION 1/5 ──────────────────────────────────────────
  Selected parents: ['agent_0000']
  Proposal: Add pronoun (they/them/their) follow-up handling using ctx.last_team
    Evaluating… score=0.60
  agent_0000 → agent_0001  Score: 0.51 → 0.60 (+0.09)

[… 4 more iterations …]

Evolution complete!  Score: 0.51 → 0.80  (+0.29  best: agent_0004  gen 4)

How it works

Your agent code
      │
      ▼
┌─────────────┐
│  Benchmark  │  Run tasks in isolated sandbox → score (0.0–1.0)
└──────┬──────┘
       │ failures
       ▼
┌─────────────┐
│  Diagnose   │  LLM analyses code + failures → improvement proposal
└──────┬──────┘
       │ proposal
       ▼
┌─────────────┐
│   Improve   │  LLM uses editor tools (str_replace) to apply change
└──────┬──────┘
       │ new code
       ▼
┌─────────────┐
│  Re-score   │  Run benchmarks again on new code
└──────┬──────┘
       │ score > old?
      YES → keep it (add to archive)
       NO → discard it (archive still records it for open-ended exploration)
       │
       └── repeat N iterations
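The loop in the diagram can be sketched in a few lines of plain Python. This is a simplified illustration, not darwinloop's implementation: the `Entry`, `score`, and `evolve` names are hypothetical, the LLM diagnose and improve steps are replaced by a fixed list of candidate functions, and parent selection is reduced to "always the current best".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Agent = Callable[[str], str]          # maps a user query to an intent label
Task = Tuple[str, str]                # (input text, expected label)


@dataclass(frozen=True)               # frozen: entries cannot be modified later
class Entry:
    agent_id: str
    parent_id: Optional[str]
    score: float
    kept: bool


def score(agent: Agent, tasks: List[Task]) -> float:
    """Fraction of tasks whose output matches the expected label (0.0 to 1.0)."""
    passed = sum(1 for text, expected in tasks if agent(text) == expected)
    return passed / len(tasks)


def evolve(base: Agent, proposals: List[Agent],
           tasks: List[Task]) -> Tuple[Agent, List[Entry]]:
    """Keep a candidate only if it strictly beats the best score so far."""
    best, best_id, best_score = base, "agent_0000", score(base, tasks)
    archive = [Entry(best_id, None, best_score, True)]
    for i, candidate in enumerate(proposals, start=1):
        cand_id = f"agent_{i:04d}"
        cand_score = score(candidate, tasks)
        kept = cand_score > best_score
        # The archive records every candidate, including the rejected ones.
        archive.append(Entry(cand_id, best_id, cand_score, kept))
        if kept:
            best, best_id, best_score = candidate, cand_id, cand_score
    return best, archive


# Stand-ins for LLM-proposed versions of a router.
def base(text: str) -> str:
    return "live" if "live" in text else "team"

def v1(text: str) -> str:             # adds competition routing
    return "competition" if " vs " in text else base(text)

def v2(text: str) -> str:             # a regression: everything is "team"
    return "team"

def v3(text: str) -> str:             # adds fixture routing on top of v1
    return "fixtures" if "fixtures" in text else v1(text)


tasks = [
    ("live scores now", "live"),
    ("Arsenal vs Chelsea result", "competition"),
    ("Arsenal fixtures", "fixtures"),
    ("Chelsea news", "team"),
]

best_agent, archive = evolve(base, [v1, v2, v3], tasks)
for entry in archive:
    print(entry)
```

In this sketch `v2` scores below the current best, so it is recorded in the archive with `kept=False` and the next candidate is still compared against `v1`.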

LLM Support

Provider	Model	Env var or flag
ASI:One (default)	`asi1`	`ASI1_API_KEY`
Anthropic	`claude-3-5-sonnet-20241022`	`ANTHROPIC_API_KEY`
OpenAI	`gpt-4o`	`OPENAI_API_KEY`
Mock (free)	—	`--dry-run`

Get an ASI:One API key at asi1.ai. ASI:One is the LLM of the Fetch.ai ecosystem.
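A small pre-flight check can catch a missing key before a run starts. The sketch below copies the model-to-variable mapping from the table above; the `missing_api_key` helper is a hypothetical convenience and is not part of the darwinloop API.

```python
import os
from typing import Mapping, Optional

# Model name -> environment variable, copied from the LLM Support table.
API_KEY_ENV_VARS = {
    "asi1": "ASI1_API_KEY",
    "claude-3-5-sonnet-20241022": "ANTHROPIC_API_KEY",
    "gpt-4o": "OPENAI_API_KEY",
}


def missing_api_key(model: str,
                    env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the env var that is unset for `model`, or None if it is set."""
    var = API_KEY_ENV_VARS.get(model)
    if var is None:
        raise ValueError(f"unknown model: {model!r}")
    return None if env.get(var) else var


missing = missing_api_key("asi1")
if missing:
    print(f"Set {missing} before running, or use --dry-run")
```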

Benchmark Packs

Pre-built domain packs so you don't need to write tasks from scratch:

from darwinloop import DarwinLoop
from darwinloop.packs import RoutingPack, CommercePack, SupportPack

# Routing agent (intent classification)
dl = DarwinLoop(target="agent/router.py",
                pack=RoutingPack(intents=["live", "team", "competition", "fixtures"]))

# Commerce agent (product search, cart, checkout)
dl = DarwinLoop(target="agent/shop.py", pack=CommercePack())

# Customer support agent
dl = DarwinLoop(target="agent/support.py", pack=SupportPack())

CLI Reference

# Evolve a specific file
darwinloop evolve agent/router.py --iterations 5 --model asi1

# Dry run (free, no API key needed)
darwinloop evolve agent/ --dry-run --auto

# Use a built-in benchmark pack
darwinloop evolve agent/router.py --pack routing --iterations 5

# Load benchmarks from a file
darwinloop evolve agent/router.py --tasks benchmarks.py --iterations 10

# Auto-generate benchmarks from agent code
darwinloop scaffold agent/router.py --output benchmarks.py

# View a previous run report
darwinloop report darwinloop_output/

# Diff two generations
darwinloop diff darwinloop_output/ --from agent_0000 --to agent_0004

Real Example: Football Agent

The examples/football/ directory contains the real football-score-agent router and its benchmark tasks.

darwinloop evolve examples/football/football_router.py \
    --tasks examples/football/benchmarks.py \
    --iterations 5 --model asi1

DGM-discovered improvements in 5 iterations:

#	Improvement	Score impact
1	Pronoun follow-up (they/their/them → last team)	+0.09
2	+16 clubs (Juventus, Atletico, Napoli, Dortmund…)	+0.08
3	Competition-signal priority (`vs`, `result`, `score`)	+0.07
4	Fixture regex expansion (`next game`, `upcoming game`)	+0.05

Total: 0.51 → 0.80 (+0.29)
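The total can be checked directly from the per-improvement deltas in the table. The snippet uses `decimal.Decimal` so that the two-decimal scores add up exactly, which binary floats do not guarantee.

```python
from decimal import Decimal

# Score impacts copied from the table above, in order.
deltas = [Decimal("0.09"), Decimal("0.08"), Decimal("0.07"), Decimal("0.05")]
base_score = Decimal("0.51")

total_gain = sum(deltas)
final_score = base_score + total_gain

print(total_gain, final_score)  # prints: 0.29 0.80
```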

Safety

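darwinloop is designed so that every evolved change is validated before it runs, isolated while it runs, and reversible afterwards. The table lists each guarantee and how it is implemented.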

Guarantee	Implementation
AST validation before execution	`sandbox/validator.py` blocks `eval`, `exec`, `shell=True`
Subprocess isolation	All agent code runs in a child process, never in the darwinloop process
Hard timeouts	Sandbox default 30s, configurable via `sandbox_timeout`
No network in sandbox	Network imports trigger warnings; calls fail at runtime
Immutable archive	`AgentEntry` records are never modified after creation
Diff transparency	Every change recorded as unified diff
Revert anytime	All generations preserved; load archive and roll back
Dry run mode	`MockLLMClient` tests full pipeline at zero cost
Score regression protection	New code kept only if score strictly improves
Human checkpoints	In non-`--auto` mode, prompts before each iteration

See SECURITY.md for full details.
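To illustrate the first three rows of the table, here is a minimal sketch of AST validation followed by an isolated, time-limited subprocess run, using only the standard library. It is not the code in `sandbox/validator.py`: the `find_violations` and `run_isolated` names are hypothetical, the checks cover only direct `eval`/`exec` calls and a literal `shell=True`, and nothing here blocks network access.

```python
import ast
import subprocess
import sys
from typing import List

BLOCKED_CALLS = {"eval", "exec"}


def find_violations(source: str) -> List[str]:
    """Statically scan source for blocked calls without executing it."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Direct calls such as eval(...) or exec(...).
        if isinstance(node.func, ast.Name) and node.func.id in BLOCKED_CALLS:
            violations.append(f"line {node.lineno}: call to {node.func.id}()")
        # Any call that passes the literal keyword argument shell=True.
        for kw in node.keywords:
            if (kw.arg == "shell"
                    and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True):
                violations.append(f"line {node.lineno}: shell=True")
    return violations


def run_isolated(source: str,
                 timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Validate, then run the code in a child interpreter with a hard timeout."""
    violations = find_violations(source)
    if violations:
        raise ValueError("; ".join(violations))
    # -I starts Python in isolated mode (ignores user site and PYTHON* vars).
    # subprocess.run raises subprocess.TimeoutExpired if the limit is exceeded.
    return subprocess.run(
        [sys.executable, "-I", "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


result = run_isolated("print(1 + 1)")
print(result.stdout.strip())
```

The 30-second default mirrors the sandbox default in the table. Validating the AST before spawning the child process means rejected code is never executed at all, even in isolation.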

Contributing

See CONTRIBUTING.md. PRs welcome.

License

Apache 2.0 — see LICENSE.

Project details

Version 0.1.0, released Jun 29, 2026

Download files

Source Distribution

darwinloop-0.1.0.tar.gz (49.8 kB)

Uploaded Jun 29, 2026 Source

Built Distribution

darwinloop-0.1.0-py3-none-any.whl (48.5 kB)

Uploaded Jun 29, 2026 Python 3

File details

Details for the file darwinloop-0.1.0.tar.gz.

File metadata

Download URL: darwinloop-0.1.0.tar.gz
Upload date: Jun 29, 2026
Size: 49.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for darwinloop-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`22f5dde8e931c115f61bc96e3410d70dcdad38a46e0fc855d59aa30f5cd293d1`
MD5	`0c7aa44e3400ea652eb0762df0466a24`
BLAKE2b-256	`ab933486c952533de0008c9920b06c681eef434077da24afe9309193ad5c75c1`

File details

Details for the file darwinloop-0.1.0-py3-none-any.whl.

File metadata

Download URL: darwinloop-0.1.0-py3-none-any.whl
Upload date: Jun 29, 2026
Size: 48.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for darwinloop-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5d6dbc1b6bb27cb1659ed9a65b260a8c3d0324100bd8cfcbb357020fe2620446`
MD5	`24d3c11883b229d8f3c8e7de818dfe2e`
BLAKE2b-256	`3dcb622e1e7e47bf5b583c0beabbad5ad6a384d5555d628fea7bfa8208c74a8b`
