AlphaEvolve with fuzzy evaluation. Evolve anything, not just code.
fuzzyevolve
Inspired by AlphaEvolve, but designed for "fuzzy" tasks like "write an evocative sci-fi short story".
What you get in practice:
- A repeatable loop that steadily improves a draft when “good” is subjective.
- A population of diverse candidates (not 50 near-identical paraphrases).
- A full run record you can resume, audit, and browse in a TUI.
At the end you get a diverse, high-quality set of outputs for your goal. It's especially fun to see which "lineages" survive and which get pruned.
Potential applications
This is all an experiment, but here are some things I've played with or plan to try:
- Creative writing/short stories
- Prompts for use in downstream tasks/agents
- Prompts for image/video models, where the judge generates the image/video and then evaluates the actual output
- Safety/jailbreaking tests: can you find a niche, diverse set of inputs that jailbreak LLMs?
Quick start
```shell
export GOOGLE_API_KEY=...  # default config uses google-gla:* models
uv sync
# Uses ./config.toml if present (or defaults)
uv run fuzzyevolve "This is my starting prompt."
```
fuzzyevolve uses pydantic-ai for LLM calls, so it should work with Google, OpenAI, or Anthropic models (and anything else pydantic-ai supports). Configure models via [llm].judge_model and [[llm.ensemble]].model in config.toml, and set the corresponding API key env var.
Included examples
`config.toml` is a working example config you can start from (and fuzzyevolve will auto-detect it if it's in your CWD). `best.md` is a real output report from a run (top individuals by fitness + per-metric μ/σ).
Example config.toml switch:
```toml
[llm]
judge_model = "openai:gpt-4o-mini"

[[llm.ensemble]]
model = "openai:gpt-4o-mini"
weight = 1.0
temperature = 1.0
```
Input can be a string, a file path, or stdin:
```shell
uv run fuzzyevolve seed.txt
cat seed.txt | uv run fuzzyevolve
```
Output goes to best.md by default (override with --output). By default it includes the top 20 individuals by fitness (override with --top).
Override the goal/metrics quickly from the CLI:
```shell
uv run fuzzyevolve \
  --goal "Write a punchy, helpful README section about caching." \
  --metric clarity --metric usefulness --metric concision \
  "Draft text goes here..."
```
By default, each run is recorded under .fuzzyevolve/runs/<run_id>/ (checkpoints, events, and raw LLM prompts/outputs). Resume with:
```shell
uv run fuzzyevolve --resume .fuzzyevolve/runs/<run_id> --iterations 100
```
Browse runs in the TUI:
```shell
uv run fuzzyevolve tui
# or open a specific run/checkpoint:
uv run fuzzyevolve tui --run .fuzzyevolve/runs/<run_id>
```
Disable recording with --no-store.
Embeddings use sentence-transformers (installed by default). Configure the model via [embeddings].model in config.toml.
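The crowding/pruning idea is simple at its core: when the pool is over capacity, find the most similar pair of individuals in embedding space and drop the weaker one. A minimal sketch of that idea in pure Python (the `embedding`/`score` field names and the `prune_most_crowded` helper are illustrative, not fuzzyevolve's actual API):

```python
from math import sqrt


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))


def prune_most_crowded(pool: list[dict], max_size: int) -> list[dict]:
    """Drop the lower-scoring member of the closest pair until the pool fits."""
    pool = list(pool)
    while len(pool) > max_size:
        # Find the most similar (i.e. most crowded) pair.
        best_pair, best_sim = (0, 1), -1.0
        for i in range(len(pool)):
            for j in range(i + 1, len(pool)):
                sim = cosine(pool[i]["embedding"], pool[j]["embedding"])
                if sim > best_sim:
                    best_sim, best_pair = sim, (i, j)
        i, j = best_pair
        # Of the crowded pair, remove whichever scores worse.
        loser = i if pool[i]["score"] < pool[j]["score"] else j
        pool.pop(loser)
    return pool
```

This is what keeps the population from collapsing into near-identical paraphrases: a strong individual cannot crowd out its own clones.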
What it does (high level)
- Critique: a structured critique of the current parent (preserve / issues / rewrite routes).
- Mutate: multiple LLM “operators” propose children (e.g. conservative improvement vs high-variance exploration).
- Judge: an LLM ranks parent/children (and optional anchors/opponent) per metric using tiered rankings (ties allowed).
- Learn: per-metric TrueSkill updates convert rankings into ratings (μ/σ), then a conservative score selects “best so far”.
- Stay diverse: a fixed-size population is maintained using embedding-space crowding/pruning.
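The loop above can be sketched as follows. The stub functions stand in for the real LLM calls, and all names are illustrative rather than fuzzyevolve's actual API:

```python
import random

# Stubs standing in for LLM calls (illustration only).
def critique(parent: str) -> str:
    return "keep the opening; vary the ending"

def mutate(parent: str, guidance: str, op: str) -> str:
    return parent + f" [{op}]"

def judge_rank(candidates: list[str]) -> list[int]:
    # Real judge: tiered per-metric rankings from an LLM. Here: random tiers.
    return [random.randrange(len(candidates)) for _ in candidates]


def evolve(seed: str, iterations: int = 3) -> list[str]:
    pool = [seed]
    for _ in range(iterations):
        parent = random.choice(pool)                   # select
        guidance = critique(parent)                    # critique
        children = [mutate(parent, guidance, op)       # mutate
                    for op in ("conservative", "explore")]
        ranks = judge_rank([parent] + children)        # judge
        # In the real loop, `ranks` feeds per-metric TrueSkill updates,
        # and embedding-space pruning caps the pool size.
        pool.extend(children)
    return pool
```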
Mental model (the important bits)
- An “individual” is a text plus:
  - an embedding (for diversity), and
  - a TrueSkill rating per metric (for quality).
- The judge doesn’t assign absolute scores; it ranks candidates relative to each other per metric.
- The population is a fixed-size “portfolio” spread out in embedding space.
- Exploration is encouraged via an optimistic parent selector (μ + β·σ), while reporting uses a conservative score (μ - c·σ).
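In code, that optimistic/conservative split looks something like this (the constants are illustrative, not the project's defaults):

```python
def optimistic(mu: float, sigma: float, beta: float = 1.0) -> float:
    """Selection score: reward high mean OR high uncertainty (exploration)."""
    return mu + beta * sigma

def conservative(mu: float, sigma: float, c: float = 3.0) -> float:
    """Reporting score: only count quality we are reasonably sure of."""
    return mu - c * sigma

# A well-tested veteran vs. an uncertain newcomer, as (mu, sigma):
veteran = (28.0, 1.0)
newcomer = (25.0, 8.0)
assert optimistic(*newcomer) > optimistic(*veteran)      # worth exploring
assert conservative(*veteran) > conservative(*newcomer)  # safer "best so far"
```

The same (μ, σ) pair drives both decisions; only the sign on σ changes depending on whether you are picking a parent or declaring a winner.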
How it works (one iteration, step by step)
- Select parent from the population (mixture policy: uniform sampling or optimistic tournament).
- Critique parent into reusable guidance: what to preserve, what to fix, distinct rewrite routes.
- Plan mutation jobs across operators (minimums + weighted sampling).
- Generate children (LLM rewrites). Exploration operators can intentionally omit the parent text to avoid “paraphrase gravity”.
- Assemble a battle: parent + children (+ optional frozen anchors) (+ optional opponent from the pool).
- Judge by ranking: the LLM returns tiered rankings for each metric (ties allowed; outputs are validated and optionally repaired).
- Update ratings with per-metric TrueSkill, freezing anchors.
- Insert children into the fixed-size pool; enforce diversity with embedding-space crowding/pruning.
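The rating-update step (step 7) has roughly this shape. This is **not** the actual TrueSkill math (real TrueSkill is a Bayesian factor-graph update); it is a simplified rank-based nudge to show how tiered rankings become μ/σ changes:

```python
def update_ratings(ratings: list[tuple[float, float]],
                   ranks: list[int],
                   lr: float = 0.5) -> list[tuple[float, float]]:
    """Simplified rank-based update (illustration only, not real TrueSkill).

    ratings: list of (mu, sigma); ranks: list of ints, 0 = best tier.
    Winners move up, losers move down, and sigma shrinks as evidence accrues.
    """
    n = len(ratings)
    mean_rank = sum(ranks) / n
    out = []
    for (mu, sigma), r in zip(ratings, ranks):
        # Better-than-average rank -> positive nudge, scaled by uncertainty:
        # highly uncertain individuals move further on new evidence.
        delta = lr * sigma * (mean_rank - r) / max(n - 1, 1)
        out.append((mu + delta, sigma * 0.95))  # each battle reduces sigma a bit
    return out

# Three fresh individuals ranked 1st, 2nd, 3rd in one battle:
updated = update_ratings([(25.0, 8.33)] * 3, ranks=[0, 1, 2])
```

Anchors are "frozen" by simply skipping this update for them, so they stay fixed reference points across battles.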
Configuration
Config is a single TOML/JSON file. If config.toml or config.json exists in the current directory it’s auto-detected; pass an explicit file with --config.
See config.toml for a complete example. The structure is intentionally nested:
- `[task]` and `[metrics]` define what “good” means (goal + metric names/descriptions).
- `[mutation]` defines the operator set, job budget, and per-operator uncertainty.
- `[judging]` controls judge retries + optional opponents.
- `[rating]` controls TrueSkill parameters and the score’s LCB constant.
- `[embeddings]` defines the sentence-transformers model to use for diversity.
- `[population]` defines the fixed pool size.
- `[selection]` configures the parent-selection mixture policy.
- `[anchors]` optionally injects frozen reference anchors (seed + periodic “ghosts”) into battles.
- `[llm]` chooses the judge model and the mutation ensemble.
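A fragment showing how a few of these sections nest together (the goal, metric names, and model strings are illustrative values, not defaults):

```toml
[task]
goal = "Write an evocative sci-fi short story."

[metrics]
names = ["imagery", "originality", "pacing"]

[llm]
judge_model = "google-gla:gemini-2.0-flash"

[[llm.ensemble]]
model = "google-gla:gemini-2.0-flash"
weight = 1.0
temperature = 1.0
```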
Config Tips
- Cost/latency
  - Reduce `[mutation].jobs_per_iteration` and/or `[mutation].max_children`.
  - Use cheaper models in `[[llm.ensemble]]` and/or for `[llm].judge_model`.
  - Disable `[critic].enabled` if you want “mutate + judge” only.
- Diversity
  - Tune `[embeddings].model` if you want a different embedding model.
  - Increase population size, or use `population.pruning = "knn_local_competition"` to preserve niches.
- Stability
  - Increase `[judging].max_attempts` if the judge sometimes returns invalid structure.
  - Use anchors and/or opponents for better cross-population calibration.
Run data
When --store is enabled (default), each run is recorded under .fuzzyevolve/runs/<run_id>/:
- `checkpoints/latest.json` and `checkpoints/it000123.json` (periodic checkpoints)
- `texts/<sha256>.txt` (deduped text blobs)
- `events.jsonl` (structured iteration events)
- `stats.jsonl` (best score + pool size over time)
- `llm/` + `llm.jsonl` (raw prompts/outputs, indexed)
This is great for debugging and iteration, but it also means your prompts and model outputs are stored locally. Avoid evolving sensitive content if you don’t want it written to disk.
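Since the run record is plain JSONL, it is easy to inspect programmatically. A minimal sketch for loading `stats.jsonl` (the per-row field names are an assumption for illustration; check a real `stats.jsonl` for the actual schema):

```python
import json
from pathlib import Path


def load_stats(run_dir: str) -> list[dict]:
    """Load per-iteration stats rows from a run directory's stats.jsonl."""
    rows = []
    for line in (Path(run_dir) / "stats.jsonl").read_text().splitlines():
        if line.strip():  # skip blank lines defensively
            rows.append(json.loads(line))
    return rows


# Hypothetical usage: plot or print best-score progress over a run.
# for row in load_stats(".fuzzyevolve/runs/<run_id>"):
#     print(row.get("iteration"), row.get("best_score"))
```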
CLI
run is the default command, so these are equivalent:
uv run fuzzyevolve "Seed text..."
uv run fuzzyevolve run "Seed text..."
To open the run browser:
```shell
uv run fuzzyevolve tui
```
run options
- `--config` / `-c`: Path to TOML/JSON config
- `--output` / `-o`: Output path (default `best.md`)
- `--top`: How many top individuals to include (default 20; `0` = all)
- `--iterations` / `-i`: Override `run.iterations`
- `--goal` / `-g`: Override `task.goal`
- `--metric` / `-m`: Override `metrics.names` (repeatable)
- `--resume`: Resume from a previous run directory (or checkpoint file)
- `--store` / `--no-store`: Enable/disable recording under `.fuzzyevolve/`
- `--log-level` / `-l`: Logging level (`debug|info|warning|error|critical` or a number)
- `--log-file`: Write logs to a specific file
- `--quiet` / `-q`: Hide the progress bar and non-essential logging
Requirements
- Python 3.10+
- uv (recommended)
- Any model supported by `pydantic-ai` (Google/OpenAI/Anthropic all work; configure via `[llm].judge_model` and `[[llm.ensemble]].model`)
- An API key for the provider you choose
```shell
export GOOGLE_API_KEY=...     # e.g. google-gla:*
export OPENAI_API_KEY=...     # e.g. openai:*
export ANTHROPIC_API_KEY=...  # e.g. anthropic:*
```
Troubleshooting
- `ImportError: sentence-transformers is required`
  - Run `uv sync` (or `pip install sentence-transformers`).
- Judge returns invalid rankings / retries fail
  - Increase `[judging].max_attempts`, or switch to a more reliable judge model.
- Runs are expensive
  - Start with fewer metrics, fewer mutation jobs, and a smaller population. Then scale up.
- Resume isn’t picking up where you expect
  - Point `--resume` at a run directory (or a checkpoint file). The latest checkpoint is `checkpoints/latest.json`.
Development
```shell
uv sync --extra dev
uv run ruff format .
uv run ruff check .
uv run pytest -q
```
License
Apache 2.0 — see LICENSE.
File details
Details for the file fuzzyevolve-0.2.2-py3-none-any.whl.
File metadata
- Download URL: fuzzyevolve-0.2.2-py3-none-any.whl
- Upload date:
- Size: 52.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `8d9546f11d7d6fa1605d47cb8a8956e000f38bd276caf07e8e15b329996d2958` |
| MD5 | `1e2279fe7fbfc62cc2db52aa4868a7fa` |
| BLAKE2b-256 | `f46b0b53d6f9331203c7bb20bd22dafd6564027abe5f15776fd7e9083c1ea059` |