Skip to main content

Genetic algorithms for autonomous code optimization

Project description

cEvolve

The LLM imagines. Evolution discovers.

Genetic algorithms for autonomous code and agent optimization. The LLM generates ideas, the GA explores which combinations work best together.

Inspired by karpathy/autoresearch.

┌─────────────────────────────────────────────────────────────────────────────┐
│                                                                             │
│   LLM generates ideas         GA evolves combinations         Best wins     │
│   ───────────────────         ────────────────────────         ─────────    │
│                                                                             │
│   "try caching"          ┌─► [cache, batch=32]  ────► 45ms                  │
│   "batch sizes 16,32,64" │                                                  │
│   "use SIMD"             ├─► [SIMD, cache]      ────► 38ms  ◄── winner!     │
│   "reduce allocations"   │                                                  │
│                          └─► [batch=64, alloc]  ────► 52ms                  │
│                                    │                                        │
│                                    ▼                                        │
│                          crossover + mutate                                 │
│                                    │                                        │
│                                    ▼                                        │
│                          ┌─► [SIMD, cache, batch=32] ──► 35ms ◄── new best! │
│                          │                                                  │
│                          └─► ...                                            │
│                                    │                                        │
│                                    ▼                                        │
│                            ┌─────────────┐                                  │
│                            │   RETHINK   │  LLM analyzes what worked,       │
│                            │   (commit)  │  adds new ideas, removes duds    │
│                            └─────────────┘                                  │
│                                    │                                        │
│                                    ▼                                        │
│                             continue evolving...                            │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Quick Start

Requirements: Python 3.10+, uv, claude or pi CLI.

# Install
uv sync

# Run evolution (primary CLI for humans)
uv run cevolve run --target examples/sorting/train.py --metric time_ms --llm pi

How It Works

Hill-Climbing vs Evolution

Hill-climbing (greedy) tries one change at a time:

baseline → try A → better? keep : discard → try B → ...

This struggles when:

  • Ideas interact: A+B together beats A or B alone
  • Ideas conflict: A helps, B helps, but A+B hurts
  • Local optima: Greedy gets stuck
  • Time matters: Each eval waits for the previous (no parallelization)

Evolution maintains a population and evolves combinations:

Generation 0:  [A], [B], [C], [A,B], [B,C]  → evaluate all (in parallel!)
                           ↓
              Selection: keep fittest
              Crossover: [A] × [B,C] → [A,C]
              Mutation:  [A,C] → [A,C,D]
                           ↓
Generation 1:  [A,B], [A,C], [A,C,D], ...   → evaluate all (in parallel!)

Crossover discovers combinations that greedy search might never find.

Parallelization Advantage

Greedy:  eval1 → wait → eval2 → wait → eval3 → wait → ...  (sequential)
cEvolve: [eval1, eval2, eval3, eval4, eval5, eval6]       (parallel!)
             ↓
         [eval7, eval8, eval9, eval10, eval11, eval12]    (parallel!)

30 evals with 6 workers:
  Greedy: 30 × eval_time (sequential)
  cEvolve: 5 × eval_time  (6× faster wall-clock)

For expensive evaluations (LLM training, compilation, simulation), this can mean the difference of optimizations you can actually run vs those that aren't practical with hardware limitations.

The Rethink Loop

Every N evaluations, the LLM analyzes results and suggests new ideas:

┌─────────────────────────────────────────────────────────────────┐
│    ┌──────────┐     ┌──────────┐     ┌──────────┐              │
│    │  IDEAS   │ ──► │ EVOLVE   │ ──► │ RETHINK  │ ─┐           │
│    │  (genes) │     │ (N evals)│     │ (analyze)│  │           │
│    └──────────┘     └──────────┘     └──────────┘  │           │
│         ▲                                          │           │
│         │         Add new ideas, commit best       │           │
│         └──────────────────────────────────────────┘           │
└─────────────────────────────────────────────────────────────────┘

Accumulating Wins

On each rethink, the best configuration is kept and becomes the new baseline. Essentially a hill-climb optimization outer loop:

Era 1: Explore combinations → [reduce_depth] wins → commit
    ↓
Era 2: Optimize ON TOP of reduce_depth → find +5% more → commit
    ↓
Era 3: Continue stacking improvements...

This combines evolution's parallel exploration with sequential accumulation.


Results

On an agent optimization benchmark (25 tasks, 5 tunable parameters):

Metric Greedy cEvolve
Final performance 0.896 0.896
Evals to reach good solution 8.2 3.2 (61% faster)
Found max_iter+cot synergy 0/5 runs 2/5 runs

Key finding: cEvolve reaches good solutions 2.6× faster and discovers gene combinations that greedy search never finds.

See WHITEPAPER.md for full experimental details.


Usage

Primary CLI (cevolve run)

For humans using the tool directly. Built-in LLM discovers ideas and implements genes:

# Single file optimization
cevolve run --target train.py --metric val_bpb --llm claude

# Multi-file optimization
cevolve run --scope "src/**/*.py" --metric time_ms --llm pi

# Options
cevolve run --target train.py --metric time_ms --llm pi \
  --max-evals 50 \      # More evaluations
  --no-tui \            # Plain output (no TUI)
  --dry-run             # Mock LLM and training

Composable Commands (for extensions and agents)

For tools like pi-evolve that bring their own LLM/agent logic, or direct use by agents:

# Initialize session with ideas
cevolve init --name my-opt \
  --idea "use_cache: Enable caching" \
  --idea "batch_size[16,32,64]: Batch size" \
  --bench "./bench.sh" \
  --metric time_ms

# Get next individual (genes to implement)
cevolve next --json
# {"individual_id": "ind-abc123", "genes": {"use_cache": "on", "batch_size": "32"}, ...}

# Extension implements genes using its own tools...

# Run benchmark, record result, revert changes
cevolve eval --id ind-abc123

# Or if extension runs benchmark itself:
cevolve record --id ind-abc123 --fitness 42.5 --metrics '{"memory_mb": 128}'
cevolve revert

# Analyze and modify ideas
cevolve rethink --add-idea "use_mmap: Memory-map files"

# Check status
cevolve status

# Finalize
cevolve stop

Key difference: cevolve run has built-in LLM. Composable commands are LLM-agnostic.

run Options

Flag Default Description
--target train.py File to optimize (single-file mode)
--scope - Glob patterns (multi-file mode)
--metric val_bpb Metric to optimize
--direction lower Optimization direction (lower or higher)
--llm claude LLM CLI to use (claude or pi)
--max-evals 20 Maximum evaluations
--pop-size 6 Population size
--rethink 5 Rethink every N evals (0 to disable)
--name auto Session name (default: run-YYYYMMDD-HHMMSS)
--no-tui - Disable TUI, plain output
--dry-run - Mock LLM and training

init Options

Flag Default Description
--name required Session name
--ideas - JSON file or inline array of ideas
--idea - Inline idea (repeatable)
--bench required Benchmark command
--metric time_ms Metric to optimize
--revert git Revert strategy: git, stash, cache

TUI Controls

Key Action
d Toggle details panel (shows ideas)
↑/k Scroll log up
↓/j Scroll log down
Page Up/Down Scroll 20 lines
Home/End Jump to oldest/newest

Configuration

Parameter Default Description
population_size 6 Individuals per generation
elitism 2 Top N kept each generation
mutation_rate 0.2 Gene mutation probability
crossover_rate 0.7 Crossover vs copy probability
convergence_evals 16 Stop after N evals without improvement
rethink_interval 5 Analyze progress every N evals
experiment_timeout 600 Timeout per experiment (seconds)

Output

Results saved to .cevolve/<session-name>/:

.cevolve/run-20240404-123456/
├── config.json       # Run configuration
├── ideas.json        # Ideas explored
├── population.json   # Final population state
├── history.jsonl     # All evaluations (for analysis)
├── RESULTS.md        # Human-readable summary
├── convergence.png   # Fitness over time
├── idea_analysis.png # Idea effectiveness
└── synergy_matrix.png # Idea interactions

history.jsonl

Each line is one evaluation:

{"evaluation": 1, "generation": 0, "id": "ind-123", "genes": {"depth": null, ...}, "fitness": 2.493, "metrics": {...}}

Charts

convergence.png — Fitness over evaluations with best-so-far line

idea_analysis.png — Average fitness with each idea ON vs OFF

synergy_matrix.png — Heatmap of idea combinations


Documentation

Doc Description
Whitepaper How and why cEvolve works. Algorithm details, experimental results, when to use it.
Design Technical specification of the evolutionary algorithm.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cevolve-0.1.0.tar.gz (194.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cevolve-0.1.0-py3-none-any.whl (57.8 kB view details)

Uploaded Python 3

File details

Details for the file cevolve-0.1.0.tar.gz.

File metadata

  • Download URL: cevolve-0.1.0.tar.gz
  • Upload date:
  • Size: 194.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for cevolve-0.1.0.tar.gz
Algorithm Hash digest
SHA256 47ea274af68a8d01f726e96617c57a45f5dfcd28226758bc8601b0bdf5cb681f
MD5 cd012e12c9c2400d00f54be5a648c27b
BLAKE2b-256 5d02c5ca6cf2fd4712ba6ce6b2e8075ebf2a8b2d619afce3f60f8dbb5f1bbeaf

See more details on using hashes here.

File details

Details for the file cevolve-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: cevolve-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 57.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for cevolve-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 82e1379d9d3dc991bdccd27b5161302a4a9fa9d9f3e17d26cb375a61264c4784
MD5 1457e08183649895583494bd956038b3
BLAKE2b-256 158cb29d01ca97476ce6cb2c980c4745c0b34d1d7b73dd4053aebe6c3a569191

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page