Minimal open-source AlphaEvolve: LLM-driven program evolution with MAP-Elites islands, cascade evaluation, and a local Ollama ensemble.

These details have not been verified by PyPI

Project links

Project description

fastevolve

Minimal open-source AlphaEvolve: LLM-driven program evolution with MAP-Elites islands, cascade evaluation, and a local Ollama ensemble.

Install

1. Install uv (one-time)

uv is a fast Python package manager. Pick the line for your OS:

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Or via Homebrew (brew install uv), pipx (pipx install uv), or pip (pip install uv).

2. Add fastevolve to a new project

uv init my-evolve-project
cd my-evolve-project
uv add fastevolve

OpenAI and Anthropic SDKs are optional extras — install whichever you need:

uv add "fastevolve[openai]"       # adds the OpenAI SDK
uv add "fastevolve[anthropic]"    # adds the Anthropic SDK
uv add "fastevolve[all]"          # both

If you only use Ollama, skip the extras — neither SDK will be imported.

3. Or clone this repo and sync

git clone https://github.com/tiagomonteiro0715/fastevolve.git
cd fastevolve
uv sync                       # core
uv sync --extra all           # core + OpenAI + Anthropic

Quick start in code

Local (with Ollama)

Assumes ollama serve is running and you've pulled the model.

The ensemble below mixes a fast model (low temperature, conservative) with a deep one (higher temperature, more exploration). The adaptive router learns over time when to call each. Evaluation is wrapped in run_sandboxed, which kills any program that loops forever or crashes — that iteration just scores zero instead of hanging the run.

from fastevolve import Config, Controller, run_sandboxed
from fastevolve.llm_ensemble import ModelConfig

INITIAL = "def solve(x):\n    return x\n"

def correctness(p):
    cases = [(2, 4), (3, 9), (4, 16), (5, 25)]
    return sum(1 for x, y in cases
               if run_sandboxed(p.code, "solve", x, timeout=2.0) == y) / len(cases)

cfg = Config()
cfg.iterations = 20
cfg.checkpoint_path = "run.log"   # optional — resume if killed mid-run
cfg.ensemble.models = [
    ModelConfig(name="gemma4:e2b", provider="ollama", temperature=0.4, weight=1.0, role="fast"),
    ModelConfig(name="gemma4:e2b", provider="ollama", temperature=0.9, weight=0.5, role="deep"),
]
cfg.evaluator.cascade = [(correctness, 0.0)]

result = Controller(cfg, initial_program=INITIAL).run()
print(result.best.code)

Google Colab (with OpenAI or Anthropic)

Ollama isn't practical on Colab — use an API provider instead. Paste this into a Colab cell:

!pip install -q "fastevolve[openai]"

import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")  # store in Colab Secrets first

from fastevolve import Config, Controller, run_sandboxed
from fastevolve.llm_ensemble import ModelConfig

INITIAL = "def solve(x):\n    return x\n"

def correctness(p):
    cases = [(2, 4), (3, 9), (4, 16), (5, 25)]
    return sum(1 for x, y in cases
               if run_sandboxed(p.code, "solve", x, timeout=2.0) == y) / len(cases)

cfg = Config()
cfg.iterations = 20
cfg.ensemble.models = [
    ModelConfig(name="gpt-4o-mini", provider="openai", temperature=0.4, weight=1.0, role="fast"),
    ModelConfig(name="gpt-4o",      provider="openai", temperature=0.7, weight=0.3, role="deep"),
]
cfg.evaluator.cascade = [(correctness, 0.0)]

result = Controller(cfg, initial_program=INITIAL).run()
print(result.best.code)

Google Colab (with Claude)

!pip install -q "fastevolve[anthropic]"

import os
from google.colab import userdata
os.environ["ANTHROPIC_API_KEY"] = userdata.get("ANTHROPIC_API_KEY")  # store in Colab Secrets first

from fastevolve import Config, Controller, run_sandboxed
from fastevolve.llm_ensemble import ModelConfig

INITIAL = "def solve(x):\n    return x\n"

def correctness(p):
    cases = [(2, 4), (3, 9), (4, 16), (5, 25)]
    return sum(1 for x, y in cases
               if run_sandboxed(p.code, "solve", x, timeout=2.0) == y) / len(cases)

cfg = Config()
cfg.iterations = 20
cfg.ensemble.models = [
    ModelConfig(name="claude-haiku-4-5-20251001", provider="anthropic",
                temperature=0.4, weight=1.0, role="fast"),
    ModelConfig(name="claude-opus-4-7",          provider="anthropic",
                temperature=0.7, weight=0.3, role="deep"),
]
cfg.evaluator.cascade = [(correctness, 0.0)]

result = Controller(cfg, initial_program=INITIAL).run()
print(result.best.code)

Google Colab (with Ollama)

Ollama can run on Colab if you install it, start the daemon in the background, and pull a model. Tested working on the free CPU runtime with a tiny model (qwen2.5:0.5b).

On Colab Pro / Pro+: switch to an A100 or L4 GPU runtime (Runtime → Change runtime type → A100 GPU) and swap the model for something bigger — qwen2.5-coder:7b, llama3.1:8b, or gemma2:9b all fit comfortably and produce dramatically better evolution candidates than 0.5b. Pro+'s longer sessions (24 h) and background execution also mean you can leave a 1000-iteration run going overnight without keeping the tab open.

# 1. Install ollama (zstd is required by the install script) and fastevolve via uv
!apt-get -qq install -y zstd pciutils lshw
!curl -fsSL https://ollama.com/install.sh | sh
!pip install uv
!uv pip install -q fastevolve

# 2. Run fastevolve — it starts the ollama daemon automatically with GPU-aware
#    optimizations (flash attention, q8_0 KV cache, parallel decoding) when a GPU is detected.
from fastevolve import Config, Controller, run_sandboxed
from fastevolve.llm_ensemble import ModelConfig

INITIAL = "def solve(x):\n    return x\n"

def correctness(p):
    cases = [(2, 4), (3, 9), (4, 16), (5, 25)]
    return sum(1 for x, y in cases
               if run_sandboxed(p.code, "solve", x, timeout=2.0) == y) / len(cases)

cfg = Config()
cfg.iterations = 20
cfg.ensemble.models = [
    # free CPU runtime: only the small fast model
    ModelConfig(name="qwen2.5:0.5b",       provider="ollama",
                temperature=0.4, weight=1.0, role="fast"),
    # Pro / Pro+ A100 or L4: add a stronger deep model — the router will escalate when needed
    ModelConfig(name="qwen2.5-coder:7b",   provider="ollama",
                temperature=0.7, weight=0.5, role="deep"),
]
cfg.evaluator.cascade = [(correctness, 0.0)]

result = Controller(cfg, initial_program=INITIAL).run()
print(result.best.code)

Colab sessions are disconnected after ~90 min idle and the VM is wiped — set cfg.checkpoint_path = "/content/drive/MyDrive/run.log" after mounting Drive if you want resume across sessions.

Google Colab (with vLLM)

vLLM is a high-throughput inference engine with continuous batching and paged attention — significantly faster than Ollama for sustained generation and multi-GPU setups. Use it when you have an A100 / L4 / H100 and care about throughput.

CUDA limitation (important). vLLM ships pre-built wheels tied to specific CUDA versions and it is impossible to make a single pin work for every GPU. fastevolve[vllm] installs the PyPI default, which targets CUDA 12.1 — this works on Colab Pro+ A100/L4 runtimes (currently CUDA 12.x) and most modern cloud GPUs. If your driver is on a different CUDA version, install vLLM yourself first with the matching wheel:

# Pick the line that matches your driver's CUDA version
pip install vllm --extra-index-url https://download.pytorch.org/whl/cu118    # CUDA 11.8
pip install vllm --extra-index-url https://download.pytorch.org/whl/cu124    # CUDA 12.4
# then:
uv add "fastevolve[vllm]"

vLLM is Linux-only (or Linux via WSL2). It does not work on Windows or macOS natively.

The library will log the detected vLLM version, driver CUDA, and GPU at startup so any mismatch surfaces immediately.

The ensemble below runs two different vLLM servers on the same GPU: a small, fast Qwen for cheap exploration and a bigger Gemma for hard cases. The adaptive router learns when to escalate. Each vLLM server is told to use ~45 % of GPU memory so they coexist on a single A100 40 GB.

# 1. Pick an A100 / L4 / H100 GPU runtime: Runtime → Change runtime type → A100 GPU
# 2. Install fastevolve with the vLLM extra (works on Colab's CUDA 12.x out of the box)
!pip install -q "fastevolve[vllm]"

# 3. Start two vLLM OpenAI-compatible servers on different ports.
#    `gpu_memory_utilization=0.45` lets both fit on one 40 GB A100.
#    `verbose=True` streams vLLM's own logs — useful on the first run because
#    it downloads model weights from Hugging Face (5–30 min for big models).
#    Drop `verbose=True` later to silence the noise once weights are cached.
from fastevolve.llm_ensemble import start_vllm
start_vllm("Qwen/Qwen2.5-Coder-1.5B-Instruct", port=8000,
           gpu_memory_utilization=0.45, verbose=True)
start_vllm("google/gemma-2-9b-it",             port=8001,
           gpu_memory_utilization=0.45, verbose=True)

# 4. Run fastevolve — each ModelConfig points at one of the local servers via base_url
from fastevolve import Config, Controller, run_sandboxed
from fastevolve.llm_ensemble import ModelConfig

# Task: parse a Roman numeral into an integer.
# The seed handles pure addition (e.g. "III" = 3, "LVIII" = 58) but misses the
# subtractive notation (e.g. "IV" = 4, "MCMXCIV" = 1994). The LLM has to spot
# and add the subtraction rule. Harder than `x*x` — needs real reasoning, not
# just curve-fitting to a few input/output pairs.
INITIAL = '''def solve(s):
    vals = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
    return sum(vals[c] for c in s)
'''

CASES = [
    ("III", 3), ("VIII", 8), ("LVIII", 58),         # pure addition — seed already passes
    ("IV", 4), ("IX", 9), ("XL", 40), ("XC", 90),   # simple subtraction
    ("CDXLIV", 444), ("MCMXCIV", 1994),             # nested subtraction
    ("MMXXIV", 2024), ("MMMCMXCIX", 3999),          # long-form
]

def correctness(p):
    return sum(1 for s, y in CASES
               if run_sandboxed(p.code, "solve", s, timeout=2.0) == y) / len(CASES)

cfg = Config()
cfg.iterations = 50
cfg.ensemble.models = [
    # fast: small Qwen, low temperature, high weight — dominates the cheap iterations
    ModelConfig(name="Qwen/Qwen2.5-Coder-1.5B-Instruct", provider="openai",
                base_url="http://127.0.0.1:8000/v1",
                temperature=0.4, weight=1.0, role="fast"),
    # deep: bigger Gemma, higher temperature, lower weight — escalated by the router when fast stalls
    ModelConfig(name="google/gemma-2-9b-it", provider="openai",
                base_url="http://127.0.0.1:8001/v1",
                temperature=0.8, weight=0.3, role="deep"),
]
cfg.evaluator.cascade = [(correctness, 0.0)]

result = Controller(cfg, initial_program=INITIAL).run()
print(result.best.code)

You can mix any combination — a coding-specialized model as fast + a general reasoning model as deep, a small distilled model + a larger base model, etc. The router only cares which ModelConfig produces fitness improvements; it doesn't care about the architecture or vendor behind each base_url.

For multi-GPU runtimes (two A100s, etc.), pass tensor-parallel size to put a single large model across both GPUs:

start_vllm("Qwen/Qwen2.5-Coder-32B-Instruct", tensor_parallel_size=2)

`base_url` works with any OpenAI-compatible server (not just vLLM)

The base_url field on ModelConfig isn't vLLM-specific — it's how you point fastevolve at any server that speaks the OpenAI Chat Completions API. That means you can self-host with whichever engine you prefer and the library treats it as just another model in the ensemble:

Engine	Typical `base_url`	Notes
vLLM	`http://127.0.0.1:8000/v1`	Highest throughput on NVIDIA GPUs; what `start_vllm()` spawns.
LM Studio	`http://localhost:1234/v1`	GUI desktop app, easy on macOS/Windows.
llama.cpp server	`http://localhost:8080/v1`	CPU-friendly, GGUF quantization, runs anywhere.
TGI (Hugging Face Text Generation Inference)	`http://localhost:8080/v1`	Production-grade serving, similar throughput to vLLM.
SGLang	`http://localhost:30000/v1`	Optimized for structured generation and constrained decoding.
Any OpenAI-compatible cloud endpoint	their published URL	Use the provider's `api_key` via the `OPENAI_API_KEY` env var.

You start the server however that engine's docs say to, then drop a ModelConfig into the ensemble:

ModelConfig(name="my-served-model", provider="openai",
            base_url="http://localhost:1234/v1",
            temperature=0.5, weight=1.0, role="fast")

Mix and match freely — a llama.cpp CPU server for cheap iterations + a TGI GPU server for the deep model, for example. The adaptive router doesn't care which engine is behind each URL; it only cares which one produces fitness improvements.

Run the demo

Start Ollama and pull the model first:

ollama serve
ollama pull gemma4:e4b

Then:

uv run python main.py

Using OpenAI or Claude in the ensemble

Set the API key for whichever provider(s) you plan to use:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

On Windows (cmd.exe): set OPENAI_API_KEY=sk-...

Then pick a provider per model in your config. You can freely mix providers in one ensemble:

from fastevolve.llm_ensemble import ModelConfig

cfg.ensemble.models = [
    ModelConfig(name="gemma4:e4b", provider="ollama", temperature=0.6, weight=1.0, role="fast"),
    ModelConfig(name="gpt-4o-mini", provider="openai", temperature=0.6, weight=1.0, role="fast"),
    ModelConfig(name="claude-opus-4-7", provider="anthropic", temperature=0.7, weight=1.0,
                role="deep", options={"max_tokens": 4096}),
]

provider defaults to "ollama", so existing configs keep working unchanged.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.6.5

Jun 30, 2026

0.6.4

Jun 29, 2026

This version

0.6.3

Jun 29, 2026

0.6.2

Jun 29, 2026

0.6.1

Jun 29, 2026

0.6.0

Jun 29, 2026

0.5.2

Jun 27, 2026

0.5.1

Jun 27, 2026

0.5.0

Jun 26, 2026

0.4.0

Jun 26, 2026

0.3.4

Jun 25, 2026

0.3.3

Jun 25, 2026

0.3.2

Jun 25, 2026

0.3.1

Jun 25, 2026

0.3.0

Jun 25, 2026

0.2.1

Jun 25, 2026

0.2.0

Jun 25, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fastevolve-0.6.3.tar.gz (290.8 kB view details)

Uploaded Jun 29, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

fastevolve-0.6.3-py3-none-any.whl (24.7 kB view details)

Uploaded Jun 29, 2026 Python 3

File details

Details for the file fastevolve-0.6.3.tar.gz.

File metadata

Download URL: fastevolve-0.6.3.tar.gz
Upload date: Jun 29, 2026
Size: 290.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.18 {"installer":{"name":"uv","version":"0.9.18","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for fastevolve-0.6.3.tar.gz
Algorithm	Hash digest
SHA256	`c695e47eb1713c92cb0fdb4114c215f7b18fa90486074672fc9785556a2e2aac`
MD5	`f10e6f04a0e54e4210df5ea5ae9643a1`
BLAKE2b-256	`03e6039b432d81d59d632713912ac2c49d21ca235824116c2b7397277811cd64`

See more details on using hashes here.

File details

Details for the file fastevolve-0.6.3-py3-none-any.whl.

File metadata

Download URL: fastevolve-0.6.3-py3-none-any.whl
Upload date: Jun 29, 2026
Size: 24.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.18 {"installer":{"name":"uv","version":"0.9.18","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for fastevolve-0.6.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a94fb1a0b077478f2b23fb9197fc74f42fc22dcb3db51637778adcdf8eac8f91`
MD5	`a180a52f1bcd080451146a53381ec279`
BLAKE2b-256	`1e4a0ef90523f1ff65794c29e39204201934a5f64d93892f3276e771a0ffe58b`

See more details on using hashes here.

fastevolve 0.6.3

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

fastevolve

Install

1. Install uv (one-time)

2. Add fastevolve to a new project

3. Or clone this repo and sync

Quick start in code

Local (with Ollama)

Google Colab (with OpenAI or Anthropic)

Google Colab (with Claude)

Google Colab (with Ollama)

Google Colab (with vLLM)

`base_url` works with any OpenAI-compatible server (not just vLLM)

Run the demo

Using OpenAI or Claude in the ensemble

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

fastevolve 0.6.3

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

fastevolve

Install

1. Install uv (one-time)

2. Add fastevolve to a new project

3. Or clone this repo and sync

Quick start in code

Local (with Ollama)

Google Colab (with OpenAI or Anthropic)

Google Colab (with Claude)

Google Colab (with Ollama)

Google Colab (with vLLM)

base_url works with any OpenAI-compatible server (not just vLLM)

Run the demo

Using OpenAI or Claude in the ensemble

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`base_url` works with any OpenAI-compatible server (not just vLLM)