
Recursive Evolutionary Program Search — LLM-driven evolutionary code search


REPS

A self-improving evolutionary code search agent that reflects, diversifies, and steers.


REPS evolves programs with an LLM-driven loop that reflects between batches, balances explorer/exploiter workers, detects convergence, and steers compute by distance to a known target.

Result: Circle Packing n=26

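System                    | sum_radii | Iterations | Model
--------------------------+-----------+------------+--------------------------------------
Prior SOTA                | 2.634     | -          | -
OpenEvolve (shipped best) | 2.6342924 | 470        | gemini-2.0-flash + claude-3.7-sonnet
AlphaEvolve (paper)       | 2.6358628 | -          | Gemini 2.0 Pro
FICO Xpress Solver        | 2.6359155 | -          | -
REPS                      | 2.6359831 | 100        | claude-sonnet-4.6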

Verified against DeepMind's official validator.
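The n=26 benchmark asks for 26 non-overlapping circles inside the unit square that maximize the sum of their radii. As a rough illustration of what a validator has to check, here is a minimal sketch. It is not DeepMind's validator; the (x, y, r) tuple format, the function name validate_packing, and the tolerance are assumptions.

```python
import math

def validate_packing(circles, tol=1e-12):
    """Return the sum of radii if every circle (x, y, r) lies inside the unit
    square and no two circles overlap; raise ValueError otherwise."""
    for i, (x, y, r) in enumerate(circles):
        if r < 0:
            raise ValueError(f"circle {i} has a negative radius")
        # Containment: the whole disc must fit inside [0, 1] x [0, 1].
        if x - r < -tol or x + r > 1 + tol or y - r < -tol or y + r > 1 + tol:
            raise ValueError(f"circle {i} leaves the unit square")
    for i in range(len(circles)):
        xi, yi, ri = circles[i]
        for j in range(i + 1, len(circles)):
            xj, yj, rj = circles[j]
            # Non-overlap: centre distance must be at least the sum of radii.
            if math.hypot(xi - xj, yi - yj) < ri + rj - tol:
                raise ValueError(f"circles {i} and {j} overlap")
    return sum(r for _, _, r in circles)
```

For example, four circles of radius 0.25 centred at the quarter points tile the square exactly and score 1.0.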


What REPS does

  • Adaptive selection: selection_strategy="map_elites" | "pareto" | "mixed", with pareto_fraction for blending MAP-Elites bins and per-instance Pareto fronts. (reps/api/optimizer.py:64, GEPA Phase 2)
  • Trace reflection: with trace_reflection=True, the reflection LLM sees per-instance scores + feedback from the parent's failures, not just aggregate scores. (reps/api/optimizer.py:66, GEPA Phase 3)
  • Ancestry-aware reflection: lineage_depth=N extends reflection with the last N parents in a candidate's chain. (reps/api/optimizer.py:67, GEPA Phase 5)
  • System-aware merge: with merge=True, candidates from different islands recombine via an LLM-driven merge prompt that targets disjoint instance dimensions. (reps/api/optimizer.py:68, GEPA Phase 4)
  • Convergence + SOTA steering: a built-in convergence monitor (edit entropy + strategy divergence) and gap-aware compute steering when a target score is set. On by default.
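To give a feel for what an "edit entropy" signal measures, the sketch below computes the Shannon entropy of the kinds of edits seen in a recent window. When the workers keep making the same kind of edit, entropy drops toward zero. This is not REPS's convergence monitor: the edit labels, the function names, and the threshold are hypothetical.

```python
import math
from collections import Counter

def edit_entropy(edit_kinds):
    """Shannon entropy (in bits) of the distribution of edit kinds in a window."""
    counts = Counter(edit_kinds)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_converged(edit_kinds, threshold=0.5):
    # Hypothetical rule: flag convergence once recent edits collapse onto
    # one or two kinds, i.e. entropy falls below the threshold.
    return edit_entropy(edit_kinds) < threshold
```

Four equally common edit kinds give 2 bits; a window of identical edits gives 0 bits and would be flagged.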

Status: pre-1.0

REPS is pre-1.0. The Python API (docs/python_api_spec.md) shipped recently and may still evolve. Per docs/release_spec.md, minor version bumps (0.1 → 0.2) may include breaking changes during the pre-1.0 era. Pin to a specific minor version (e.g. reps-py==0.1.*) if you need stability across upgrades. Strict semver applies once REPS reaches 1.0.0.

Install

Requires Python 3.12+ and uv.

git clone https://github.com/zkhorozianbc/reps.git
cd reps
uv venv .venv --python 3.12
uv pip install -e .

Alternatively, install the published package from PyPI with pip install reps-py. Optional extras: [dspy] (the dspy_react worker) and [benchmarks] (scipy + matplotlib for the bundled circle-packing benchmark), installed as e.g. pip install "reps-py[benchmarks]".

Set the API key matching your model's provider:

export ANTHROPIC_API_KEY=sk-ant-...      # provider: anthropic
export OPENROUTER_API_KEY=sk-or-...      # provider: openrouter
export OPENAI_API_KEY=sk-...             # provider: openai

A sibling .env file is auto-loaded.
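A long run that fails on its first LLM call wastes setup time, so it can help to check the key up front. Below is a minimal sketch, assuming the text before the first "/" in the model string (as in "anthropic/claude-sonnet-4.6") names the provider. The helper names are hypothetical, and the check only sees variables already in os.environ, not ones REPS would later load from .env.

```python
import os

# Provider prefix -> environment variable, following the list above.
PROVIDER_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "openai": "OPENAI_API_KEY",
}

def required_key(model: str) -> str:
    """Map a "provider/model" string to the env var holding its API key."""
    provider = model.split("/", 1)[0]
    try:
        return PROVIDER_ENV_VARS[provider]
    except KeyError:
        raise ValueError(f"unknown provider prefix: {provider!r}") from None

def check_api_key(model: str) -> None:
    """Fail fast if the provider's key is missing from the environment."""
    var = required_key(model)
    if not os.environ.get(var):
        raise RuntimeError(f"{var} is not set (needed for {model})")
```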

Quick start (Python)

REPS is a Python library. Pass a seed program string and an evaluator callable; get back the best evolved program.

import reps
from pathlib import Path

def evaluate(code: str) -> float:
    # Run the candidate, return a score. Higher is better.
    namespace = {}
    exec(code, namespace)
    return float(namespace["solve"]())

result = reps.Optimizer(
    model="anthropic/claude-sonnet-4.6",   # api_key from $ANTHROPIC_API_KEY
    max_iterations=20,
).optimize(
    initial=Path("seed.py").read_text(),   # closes the file, unlike open().read()
    evaluate=evaluate,
)

print(result.best_score)
print(result.best_code)

What's an evaluator?

An evaluator is any Callable[[str], float | dict | reps.EvaluationResult]. REPS calls it with the candidate program text and uses the returned score to drive selection. Return a float for a quick start, a dict with combined_score and optional per_instance_scores / feedback for richer signal, or a reps.EvaluationResult to unlock the per-objective Pareto + trace-reflection paths described in docs/python_api_spec.md.

def eval_simple(code: str) -> float:    return 1.0
def eval_dict(code: str) -> dict:       return {"combined_score": 0.9, "feedback": "..."}
def eval_full(code: str) -> reps.EvaluationResult: ...
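A dict evaluator is the middle ground: it keeps the simple signature but reports which instances failed and why. The sketch below assumes candidates expose solve(n), that per_instance_scores is a name-to-float dict, and that the mean is an acceptable combined_score; the instance set and the function name eval_per_instance are hypothetical, so check docs/python_api_spec.md for the exact shapes REPS expects.

```python
def eval_per_instance(code: str) -> dict:
    namespace: dict = {}
    exec(code, namespace)
    solve = namespace["solve"]
    # Hypothetical instance set: score the candidate on several input sizes.
    instances = {"small": 3, "medium": 10, "large": 30}
    per_instance: dict[str, float] = {}
    notes: list[str] = []
    for name, n in instances.items():
        try:
            per_instance[name] = float(solve(n))
        except Exception as exc:  # a crash scores 0 and is reported as feedback
            per_instance[name] = 0.0
            notes.append(f"{name}: {type(exc).__name__}: {exc}")
    return {
        "combined_score": sum(per_instance.values()) / len(per_instance),
        "per_instance_scores": per_instance,
        "feedback": "; ".join(notes) or "all instances ran",
    }
```

The feedback string is what lets trace reflection explain a failure to the LLM instead of only seeing a lower average.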

GEPA-style features (constructor knobs)

Kwarg              | Effect                                                                  | Default
-------------------+-------------------------------------------------------------------------+-------------
selection_strategy | "map_elites" (REPS classic), "pareto" (GEPA-style frontier), or "mixed" | "map_elites"
pareto_fraction    | Blend ratio when selection_strategy="mixed"                             | 0.0
trace_reflection   | Reflection sees per-instance scores + feedback, not aggregates          | False
lineage_depth      | How many ancestors the reflection prompt sees                           | 3
merge              | Enable LLM-driven cross-island merge                                    | False
num_islands        | Population islands for diversity                                        | 5
max_iterations     | Search budget (number of iterations)                                    | 100
output_dir         | Persist run artifacts; None ⇒ tempdir                                   | None

The full surface (escape hatches, model knobs, deferred kwargs) is documented in docs/python_api_spec.md.
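One plausible reading of selection_strategy="mixed" is a coin flip per parent draw: with probability pareto_fraction pick from the per-instance Pareto front, otherwise from a MAP-Elites bin. The sketch below illustrates that reading only; it is not REPS's implementation, and it assumes each candidate carries a per_instance_scores dict and that elite_bins maps a cell key to that cell's elite. All names are hypothetical.

```python
import random

def dominates(a: dict, b: dict) -> bool:
    """a Pareto-dominates b: >= on every instance and > on at least one."""
    return all(a[k] >= b[k] for k in b) and any(a[k] > b[k] for k in b)

def pareto_front(population: list[dict]) -> list[dict]:
    """Candidates whose per-instance scores no other candidate dominates."""
    scores = [p["per_instance_scores"] for p in population]
    return [p for p, s in zip(population, scores)
            if not any(dominates(o, s) for o in scores)]

def select_parent(population, elite_bins, pareto_fraction, rng=random):
    """Hypothetical 'mixed' rule: sample the Pareto front with probability
    pareto_fraction, otherwise sample one of the MAP-Elites bin elites."""
    if rng.random() < pareto_fraction:
        return rng.choice(pareto_front(population))
    return rng.choice(list(elite_bins.values()))
```

The Pareto front keeps specialists alive (a candidate that is best on even one instance survives), while MAP-Elites bins preserve behavioural diversity; blending the two trades one against the other.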

Reusing a Model

Most users pass a model-name string to Optimizer(model=...). Build a reps.Model directly when you want to call the model outside the optimizer or share one configured client across multiple runs.

import reps

model = reps.Model("anthropic/claude-sonnet-4.6", temperature=0.7)
print(model("hello"))                                    # standalone use

# Share one Model across multiple optimizers
o1 = reps.Optimizer(model=model, max_iterations=20)
o2 = reps.Optimizer(model=model, max_iterations=50, merge=True)

Power-user: CLI / YAML

For batch experiments, reproducible sweeps, or YAML-driven configuration, REPS ships a CLI: reps-run --config <yaml>. The Python API above is built on the same engine, so anything achievable via YAML is achievable via Optimizer(...) plus Optimizer.from_config(cfg).

Run

Everything lives in the YAML — point reps-run at a config and go:

reps-run --config experiment/configs/circle_sonnet_reps.yaml

Results land in experiment/results/<config-stem>/run_NNN/ (auto-versioned). The best program is saved as best_program.py; per-iteration metrics under metrics/.
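To pick up the newest result programmatically, a small helper can scan the auto-versioned run_NNN directories. This is a minimal sketch; it assumes best_program.py sits directly inside each run directory, and the helper names are hypothetical.

```python
from pathlib import Path

def latest_run(results_root: str, config_stem: str) -> Path:
    """Return the highest-numbered run_NNN directory for a config."""
    base = Path(results_root, config_stem)
    runs = [p for p in base.glob("run_*") if p.is_dir() and p.name[4:].isdigit()]
    if not runs:
        raise FileNotFoundError(f"no run_NNN directories under {base}")
    # Sort numerically so run_010 beats run_002.
    return max(runs, key=lambda p: int(p.name[4:]))

def load_best_program(results_root: str, config_stem: str) -> str:
    return (latest_run(results_root, config_stem) / "best_program.py").read_text()
```

Usage: load_best_program("experiment/results", "circle_sonnet_reps").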

Common overrides:

reps-run --config <yaml> --iterations 50 --output my_runs/
reps-run --config <yaml> -o llm.temperature=0.9 -o reps.batch_size=10

The config decides everything else — model, workers, harness (reps or openevolve), and which benchmark to evolve (via task:).
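Because every run is fully described by --config plus -o overrides, a reproducible sweep can be generated rather than typed. The sketch below only builds argv lists using the flags shown above; the helper name sweep_commands is hypothetical, and you would pass each list to subprocess.run to actually launch it.

```python
import shlex

def sweep_commands(config: str, temperatures, iterations: int = 50) -> list[list[str]]:
    """Build one reps-run argv per temperature using the -o override syntax."""
    return [
        ["reps-run", "--config", config, "--iterations", str(iterations),
         "-o", f"llm.temperature={t}"]
        for t in temperatures
    ]

for argv in sweep_commands("experiment/configs/circle_sonnet_reps.yaml", [0.5, 0.7, 0.9]):
    print(shlex.join(argv))  # swap for subprocess.run(argv, check=True) to execute
```

Each launched run gets its own auto-versioned run_NNN directory, so the sweep's outputs do not overwrite each other.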

Add a benchmark

Drop two files into experiment/benchmarks/<name>/:

experiment/benchmarks/<name>/
├── initial_program.py    # seed code (wrap evolvable region in EVOLVE-BLOCK markers)
└── evaluator.py          # defines evaluate(program_path) -> {"combined_score": float, ...}

initial_program.py:

# EVOLVE-BLOCK-START
def solve():
    # Placeholder baseline; REPS evolves the code between the markers.
    naive_result = 0.0
    return naive_result
# EVOLVE-BLOCK-END

evaluator.py:

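import runpy

def evaluate(program_path):
    namespace = runpy.run_path(program_path)   # import program_path
    score = float(namespace["solve"]())        # run it, score it (task-specific)
    return {"combined_score": score}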
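Evolved programs can crash or loop forever, so it is often safer to run each candidate in a separate interpreter with a timeout. Here is a minimal sketch of that pattern. It assumes the program exposes solve() returning a number, and that returning a 0.0 score plus an extra key is an acceptable way to report failure; the error key, RUNNER, and timeout_s are hypothetical, not part of REPS.

```python
import json
import subprocess
import sys

# Hypothetical runner: load the candidate in a fresh interpreter, call solve(),
# and print the result as JSON on the last line of stdout.
RUNNER = (
    "import json, runpy, sys\n"
    "ns = runpy.run_path(sys.argv[1])\n"
    "print(json.dumps(float(ns['solve']())))\n"
)

def evaluate(program_path, timeout_s: float = 30.0):
    try:
        proc = subprocess.run(
            [sys.executable, "-c", RUNNER, program_path],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"combined_score": 0.0, "error": f"timed out after {timeout_s}s"}
    if proc.returncode != 0:
        # Keep the tail of the traceback; it usually names the failure.
        return {"combined_score": 0.0, "error": proc.stderr.strip()[-500:]}
    # The candidate may print its own output; the score is the last line.
    score = json.loads(proc.stdout.strip().splitlines()[-1])
    return {"combined_score": score}
```

A subprocess costs an interpreter start per evaluation, which is usually negligible next to an LLM call and buys isolation from hangs and crashes.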

Optional files in the same directory:

  • system_prompt.md: task-specific system prompt (auto-loaded)
  • visualize.py: visualize_from_program(path, save_path) for best-program plots

Then point a config at it:

task: ../benchmarks/<name>     # resolved relative to this YAML
max_iterations: 100
provider: anthropic
# ... see experiment/configs/circle_sonnet_reps.yaml for a full example

Run it: reps-run --config experiment/configs/<your_config>.yaml.

For cascade evaluation, also define evaluate_stage1 / evaluate_stage2 in evaluator.py. If the primary objective metric isn't combined_score, set reps.sota.target_metric in the config so SOTA steering compares the right value.
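A typical cascade uses a cheap first stage to reject broken candidates before paying for the full evaluation. The sketch below assumes both stage functions take program_path and return the same dict shape as evaluate; when each stage runs and any pass threshold between them are decided by REPS's cascade settings (see reps/config.py), and the helper _run, the repeats count, and the composed evaluate are hypothetical.

```python
import math
import runpy

def _run(program_path) -> float:
    # Import the candidate file and call its solve() entry point.
    return float(runpy.run_path(program_path)["solve"]())

def evaluate_stage1(program_path):
    """Cheap gate: solve() must run once and return a finite number."""
    try:
        value = _run(program_path)
    except Exception as exc:
        return {"combined_score": 0.0, "error": f"stage 1 failed: {exc}"}
    if not math.isfinite(value):
        return {"combined_score": 0.0, "error": "stage 1: non-finite result"}
    return {"combined_score": value}

def evaluate_stage2(program_path, repeats: int = 3):
    """Full pass: average several runs to smooth out stochastic candidates."""
    values = [_run(program_path) for _ in range(repeats)]
    return {"combined_score": sum(values) / len(values)}

def evaluate(program_path):
    # Hypothetical non-cascade entry point: gate with stage 1, then stage 2.
    first = evaluate_stage1(program_path)
    return first if "error" in first else evaluate_stage2(program_path)
```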

Configs

Reference configs in experiment/configs/:

  • circle_sonnet_reps.yaml, circle_opus47_anthropic.yaml, reps_full.yaml: full REPS runs
  • verify_*.yaml: minimal smoke tests, one per worker impl
  • circle_base.yaml, circle_sonnet_base.yaml: baselines using harness: openevolve (requires uv pip install openevolve)

reps/config.py is the source of truth for every field and default.

Tests

uv run python -m pytest tests/

Acknowledgements

Forked from OpenEvolve; now self-contained.


Download files

Source Distribution

reps_py-0.1.0.tar.gz (230.2 kB)

Uploaded Source

Built Distribution

reps_py-0.1.0-py3-none-any.whl (204.7 kB)

Uploaded Python 3

File details

Details for the file reps_py-0.1.0.tar.gz.

File metadata

  • Download URL: reps_py-0.1.0.tar.gz
  • Size: 230.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.13

File hashes

Hashes for reps_py-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f8a0173f15e2fa5244492cfa2a06476acae418685af0c2011048c249bf27c2f0
MD5 fdda748846f60d6c66e34b538d64ec99
BLAKE2b-256 454d8978b20864ea2ff759718f3eb5824f8bf169e4b93f49d7fc71db42945bd9

File details

Details for the file reps_py-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: reps_py-0.1.0-py3-none-any.whl
  • Size: 204.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.13

File hashes

Hashes for reps_py-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 09e0addb4784308cf80e150b4e14c2fc31ef291f5f2e784355a11019875af418
MD5 1bd5e252501897fd41b4f45c14a4453e
BLAKE2b-256 e93f17d932ade4014227ac95f44371165fd7a460b02590802b1e3ec121282666
