
AlphaEvolve with fuzzy evaluation. Evolve anything, not just code.

fuzzyevolve

Evolve text with LLM mutation + LLM judging, using TrueSkill for noisy multi-metric feedback and a fixed-size, crowding-based population for diversity.

Inspired by AlphaEvolve, but designed for “fuzzy” criteria such as prose quality, coherence, originality, humor, or how interesting a text is.

Quick start

export GOOGLE_API_KEY=... # default config uses google-gla:* models
uv sync

# Uses ./config.toml if present (or defaults)
uv run fuzzyevolve "This is my starting prompt."

Input can be a string, a file path, or stdin:

uv run fuzzyevolve seed.txt
cat seed.txt | uv run fuzzyevolve

Output goes to best.md by default (override with --output). By default it includes the top 20 individuals by fitness (override with --top).

By default, each run is recorded under .fuzzyevolve/runs/<run_id>/ (checkpoints, events, and raw LLM prompts/outputs). Resume with:

uv run fuzzyevolve --resume .fuzzyevolve/runs/<run_id> --iterations 100

Browse runs in the TUI:

uv run fuzzyevolve tui
# or open a specific run/checkpoint:
uv run fuzzyevolve tui --run .fuzzyevolve/runs/<run_id>

Disable recording with --no-store.

Note: semantic embeddings require sentence-transformers. Install the extra or use the built-in hash embedding:

uv sync --extra semantic

What it does

  • Critiques the selected parent once per iteration (structured: preserve / issues / rewrite routes).
  • Generates children via a set of LLM-backed mutation operators (e.g. “exploit” vs “explore” full rewrites).
  • Judges parent/children by ranking them per metric (tiered rankings, ties allowed).
  • Updates per-metric TrueSkill ratings (μ/σ) from those rankings (with uncertainty-aware scoring).
  • Keeps diversity with a fixed-size pool + crowding: repeatedly remove the weaker of the closest pair in embedding space.
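
The crowding step in the last bullet can be sketched as follows. This is a minimal illustration, not fuzzyevolve's actual code: the `crowd` and `cosine_distance` helpers, the tuple layout, and the choice of cosine distance are all assumptions made for the example.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance between two vectors (assumed metric for this sketch)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def crowd(pool, max_size):
    """pool: list of (name, embedding, score) tuples. Returns the survivors."""
    pool = list(pool)
    while len(pool) > max_size:
        # Find the closest pair in embedding space.
        i, j = min(
            combinations(range(len(pool)), 2),
            key=lambda ij: cosine_distance(pool[ij[0]][1], pool[ij[1]][1]),
        )
        # Drop whichever member of that pair has the lower score.
        loser = i if pool[i][2] < pool[j][2] else j
        pool.pop(loser)
    return pool


pool = [
    ("a", (1.0, 0.0), 0.9),
    ("b", (0.99, 0.1), 0.4),  # near-duplicate of "a", but weaker
    ("c", (0.0, 1.0), 0.2),   # lowest score overall, but far from the others
    ("d", (-1.0, 0.0), 0.5),
]
survivors = crowd(pool, max_size=3)
print([name for name, _, _ in survivors])
```

Here "b" is removed rather than "c": elimination targets redundancy first, so a low-scoring but distinct text can outlive a stronger near-duplicate.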

Mental model

  • A text is a “player” with a TrueSkill rating per metric (e.g. one rating for prose, one for coherence).
  • The judge doesn’t assign absolute scores; it ranks candidates relative to each other for each metric.
  • The population is a fixed-size “portfolio” of texts in embedding space.
  • Each iteration is: pick a parent → critique → propose children → rank a battle → update ratings → insert children → apply crowding elimination.
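
A rough sketch of this mental model as data, assuming hypothetical `Rating` and `Individual` classes and a hypothetical `tiers_to_ranks` helper (none of these names come from fuzzyevolve). The default μ = 25 and σ = 25/3 are the conventional TrueSkill defaults; the project's own defaults may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Rating:
    mu: float = 25.0           # conventional TrueSkill default mean
    sigma: float = 25.0 / 3.0  # conventional TrueSkill default uncertainty


@dataclass
class Individual:
    text: str
    ratings: dict = field(default_factory=dict)  # metric name -> Rating


def tiers_to_ranks(tiers):
    """Map each candidate id to its tier index (0 = best); ties share a rank."""
    return {cid: rank for rank, tier in enumerate(tiers) for cid in tier}


metrics = ["prose", "coherence"]
parent = Individual("seed text", {m: Rating() for m in metrics})

# Hypothetical judge output for one battle: one tiered ranking per metric.
judgement = {
    "prose": [["child_1"], ["parent", "child_2"]],
    "coherence": [["parent"], ["child_1"], ["child_2"]],
}
ranks = {metric: tiers_to_ranks(tiers) for metric, tiers in judgement.items()}
print(ranks["prose"])
```

Rank lists of this shape (lower is better, equal values mean a tie) are the usual input to a free-for-all TrueSkill update, applied once per metric.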

How it works (core loop)

  1. Embed: compute embedding = embed(text) for parent/children (hash or semantic).
  2. Select parent: a mixture policy chooses between uniform sampling and an optimistic tournament (μ + β·σ).
  3. Critique (optional): ask an LLM for actionable guidance (issues + distinct rewrite routes).
  4. Mutate: allocate a per-iteration job budget across operators; each job proposes one rewritten child.
  5. Assemble battle: parent + children + frozen anchors + an opponent (defaults to farthest-from-parent in the pool).
  6. Judge: ask an LLM to return tiered rankings for each metric (with validation + optional repair retries).
  7. Update ratings: apply per-metric TrueSkill updates; score uses a conservative LCB (μ - c·σ) averaged across metrics.
  8. Crowding: add children to the pool; while size > N, remove the weaker of the closest pair in embedding space.
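
Steps 2 and 7 use the same ratings in opposite directions: selection is optimistic (μ + β·σ), while the final score is conservative (μ - c·σ). The sketch below illustrates the difference. The function names, the parameters `uniform_prob` and `tournament_size`, and the averaging of the optimistic value across metrics are assumptions for illustration, not the project's API.

```python
import random


def lcb_score(ratings, c=1.0):
    """Average of (mu - c * sigma) over metrics: a conservative estimate."""
    return sum(mu - c * sigma for mu, sigma in ratings.values()) / len(ratings)


def optimistic_score(ratings, beta=1.0):
    """Average of (mu + beta * sigma): favors promising but uncertain texts."""
    return sum(mu + beta * sigma for mu, sigma in ratings.values()) / len(ratings)


def select_parent(pool, rng, uniform_prob=0.3, tournament_size=2, beta=1.0):
    """pool: dict of name -> {metric: (mu, sigma)}. Returns the chosen name."""
    names = list(pool)
    if rng.random() < uniform_prob:
        return rng.choice(names)  # uniform branch of the mixture
    contenders = rng.sample(names, min(tournament_size, len(names)))
    return max(contenders, key=lambda n: optimistic_score(pool[n], beta))


pool = {
    "veteran": {"prose": (28.0, 1.0), "coherence": (26.0, 1.0)},
    "newcomer": {"prose": (25.0, 8.0), "coherence": (25.0, 8.0)},
}
rng = random.Random(0)
print(lcb_score(pool["veteran"]), lcb_score(pool["newcomer"]))
print(select_parent(pool, rng, uniform_prob=0.0))
```

With these numbers the well-tested "veteran" has the higher LCB score, yet the tournament picks the uncertain "newcomer" as parent, which is how exploration and a cautious final ranking can coexist.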

Configuration

Config is a single TOML/JSON file. If config.toml or config.json exists in the current directory it’s auto-detected; pass an explicit file with --config.

See config.toml for a complete example. The structure is intentionally nested:

  • [task] and [metrics] define what “good” means (goal + metric names/descriptions).
  • [mutation] defines the operator set, job budget, and per-operator uncertainty.
  • [judging] controls judge retries + optional opponents.
  • [rating] controls TrueSkill parameters and the score’s LCB constant.
  • [embeddings] defines the embedding model (hash or a sentence-transformers model name).
  • [population] defines the fixed pool size.
  • [selection] configures the parent-selection mixture policy.
  • [anchors] optionally injects frozen reference anchors (seed + periodic “ghosts”) into battles.

CLI

run is the default command, so these are equivalent:

uv run fuzzyevolve "Seed text..."
uv run fuzzyevolve run "Seed text..."

To open the run browser:

uv run fuzzyevolve tui

run options

  • --config / -c: Path to TOML/JSON config
  • --output / -o: Output path (default best.md)
  • --top: How many top individuals to include (default 20; 0 = all)
  • --iterations / -i: Override run.iterations
  • --goal / -g: Override task.goal
  • --metric / -m: Override metrics.names (repeatable)
  • --resume: Resume from a previous run directory (or checkpoint file)
  • --store/--no-store: Enable/disable recording under .fuzzyevolve/
  • --log-level / -l: Logging level (debug|info|warning|error|critical or a number)
  • --log-file: Write logs to a specific file
  • --quiet / -q: Hide the progress bar and non-essential logging

Requirements

  • Python 3.10+
  • uv (recommended)
  • Any model supported by pydantic-ai (configure via [llm].judge_model and [[llm.ensemble]].model)
  • An API key for the provider you choose, for example:

export GOOGLE_API_KEY=...     # e.g. google-gla:*
export OPENAI_API_KEY=...     # e.g. openai:*
export ANTHROPIC_API_KEY=...  # e.g. anthropic:*

Semantic embeddings require:

uv sync --extra semantic

Development

uv sync --extra dev
uv run ruff format .
uv run ruff check .
uv run pytest -q

License

Apache 2.0 — see LICENSE.
