
Domain-agnostic autonomous optimization framework


anneal


Let an AI agent improve your code, prompts, and configs — overnight, unattended.


Point anneal at any text file in a git repo, tell it how to measure "better," and walk away. The agent generates hypotheses, runs experiments, keeps winners, discards losers, and compounds learnings — all while you sleep.
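The keep-winners, discard-losers loop described above can be pictured with a minimal sketch. This is not anneal's implementation: the names `is_better` and `optimize` are hypothetical, and the plain greedy comparison is an assumption standing in for whatever search strategy and statistics the tool actually uses.

```python
from typing import Callable, Iterable


def is_better(candidate: float, incumbent: float, direction: str) -> bool:
    """Return True when the candidate score beats the incumbent."""
    if direction == "maximize":
        return candidate > incumbent
    if direction == "minimize":
        return candidate < incumbent
    raise ValueError(f"unknown direction: {direction!r}")


def optimize(
    baseline: str,
    proposals: Iterable[str],
    score: Callable[[str], float],
    direction: str = "maximize",
) -> tuple[str, float, list[tuple[str, float, bool]]]:
    """Greedy keep-or-discard loop over proposed artifact versions (hypothetical)."""
    best, best_score = baseline, score(baseline)
    history = []
    for proposal in proposals:
        proposal_score = score(proposal)
        kept = is_better(proposal_score, best_score, direction)
        if kept:
            # A winner becomes the new baseline for later experiments.
            best, best_score = proposal, proposal_score
        history.append((proposal, proposal_score, kept))
    return best, best_score, history


# Toy objective: shorter text is better, so the direction is "minimize".
best, best_score, history = optimize(
    "a long baseline",
    ["short", "a much longer proposal", "tiny"],
    score=len,
    direction="minimize",
)
print(best, best_score)
```

In this toy run the second proposal is discarded because it scores worse than the current best, while the first and third are kept in turn.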

Quick Start

uv tool install anneal-cli

Requires Python 3.12+.

# Register a target
anneal register \
  --name my-target \
  --artifact path/to/file.py \
  --eval-mode deterministic \
  --run-cmd "python benchmark.py" \
  --parse-cmd "grep 'score' | awk '{print \$2}'" \
  --direction maximize \
  --scope scope.yaml

# Run experiments
anneal run --target my-target --experiments 20

# Review results — every experiment is a git commit
git log --oneline
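For the registration above to work, `benchmark.py` has to print a line that the parse command can reduce to a single number. The sketch below is one hypothetical shape for such a script. It assumes the parse command receives the run command's output on standard input, which the `cat` examples elsewhere in this README suggest but do not state. The `workload` function and the runs-per-second metric are placeholders.

```python
import time


def workload() -> int:
    """Stand-in for the code path being optimized (hypothetical)."""
    return sum(i * i for i in range(100_000))


def format_score(value: float) -> str:
    """Emit 'score <number>' so grep 'score' | awk '{print $2}' yields the number."""
    return f"score {value:.4f}"


def main() -> None:
    start = time.perf_counter()
    workload()
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    # With --direction maximize, higher must mean better: report runs per second.
    print(format_score(1.0 / elapsed))


if __name__ == "__main__":
    main()
```

Keeping the metric on a single, clearly labeled line makes the parse step robust even if the script prints other diagnostics.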

Supported Providers

| Provider | Models | Notes |
| --- | --- | --- |
| Anthropic | claude-* | Default. Claude Code or API |
| OpenAI | gpt-* | Via OpenAI SDK |
| Google | gemini-* | Via OpenAI-compatible endpoint |
| Ollama | ollama/* | Local. $0 cost tracking |
| LM Studio | lmstudio/* | Local. $0 cost tracking |
| Any OpenAI-compatible | Custom | Via base URL override |

Two Evaluation Modes

Deterministic — shell command produces a number

A shell command produces a numeric score. Run code, parse output, compare. Use for: performance benchmarks, test coverage, file size, build time.

--eval-mode deterministic \
--run-cmd "pytest --cov=src --cov-report=term | grep TOTAL | awk '{print \$4}'" \
--parse-cmd "cat"
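Note that the fourth column of the `TOTAL` line in a pytest-cov terminal report is a percentage such as `85%`, so the pipeline above emits the figure with a trailing percent sign. Whether anneal strips that sign is not stated here. If the score must be a bare number, a small parser along these lines could replace the `grep`/`awk` step. The function name `total_coverage` and the sample report are illustrative assumptions.

```python
def total_coverage(report: str) -> float:
    """Return the TOTAL coverage percentage from a pytest-cov terminal report."""
    for line in report.splitlines():
        fields = line.split()
        if fields and fields[0] == "TOTAL":
            # The last column is the coverage figure, e.g. '85%'.
            return float(fields[-1].rstrip("%"))
    raise ValueError("no TOTAL line found in coverage report")


# Illustrative report text, not real output from this project.
sample = """\
Name                Stmts   Miss  Cover
---------------------------------------
src/calculator.py      40      6    85%
TOTAL                  40      6    85%
"""
print(total_coverage(sample))
```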

Stochastic — LLM judges N samples against K binary criteria

An LLM judges N samples against K binary (YES/NO) criteria. Use for: prompt quality, documentation clarity, content optimization — anything where output varies between runs.

--eval-mode stochastic \
--criteria eval_criteria.toml

Each criterion is a YES/NO question. Scores aggregate across samples and criteria into a single float.
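One simple way to turn N samples and K YES/NO verdicts into a single float is the fraction of YES answers across all N × K judgments. The sketch below assumes exactly that unweighted mean. The actual aggregation anneal uses may differ, and the function name `aggregate` is hypothetical.

```python
def aggregate(verdicts: list[list[bool]]) -> float:
    """Return the share of YES verdicts; verdicts[i][j] is sample i on criterion j."""
    if not verdicts or not verdicts[0]:
        raise ValueError("need at least one sample and one criterion")
    criteria_count = len(verdicts[0])
    if any(len(row) != criteria_count for row in verdicts):
        raise ValueError("every sample must be judged on the same criteria")
    yes_count = sum(sum(row) for row in verdicts)
    return yes_count / (len(verdicts) * criteria_count)


# Three samples judged against four criteria: 9 YES answers out of 12.
example = [
    [True, True, False, True],
    [True, True, True, True],
    [False, True, False, True],
]
print(aggregate(example))  # 0.75
```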

Where Anneal Works

| Use Case | Eval Mode | Approx. cost per 50 experiments |
| --- | --- | --- |
| Prompt optimization | stochastic | $8–$13 |
| API response time | deterministic | $2–$5 |
| Test coverage improvement | deterministic | $2–$5 |
| Training config (hyperparams) | deterministic | $2–$8 |
| RAG retrieval prompt | deterministic | $2–$5 |
| System prompt | stochastic | $8–$15 |
| Config tuning (build/infra) | deterministic | $1–$3 |

Where Anneal Does Not Work

| Target | Reason |
| --- | --- |
| Binary files, databases | Artifact must be a text file in git |
| Embedding model selection | Requires a full re-index, not a file edit |
| Inter-agent protocol changes | Coordinated multi-file edits required |
| Live system tuning | No git isolation, unsafe to mutate in place |
| Cross-service optimization | Single-artifact scope only |
| Database schema migrations | Irreversible side effects |

Results

Code Golf — 93.7% size reduction

Shrink a verbose Python file while preserving byte-identical output.

| Metric | Value |
| --- | --- |
| Target | examples/code-golf/app.py |
| Eval mode | Deterministic (file size in bytes) |
| Direction | Minimize |
| Experiments | 7 |
| Start score | 3,592 bytes |
| End score | 228 bytes |
| Reduction | 93.7% |

Score trajectory: 3,592 bytes to 228 bytes over 7 experiments
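The reduction figure follows directly from the start and end scores. As a quick check of the arithmetic, with a helper name chosen here for illustration:

```python
def reduction_percent(start: int, end: int) -> float:
    """Percentage by which the score shrank relative to the starting value."""
    return (start - end) / start * 100


# (3592 - 228) / 3592 is about 0.9365, which rounds to 93.7%.
print(f"{reduction_percent(3592, 228):.1f}%")
```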

Prompt Optimization — stochastic eval

Improve an article summarizer prompt against 4 binary criteria across 5 test articles. Scores improve across 10 experiments as the agent iteratively refines the system prompt.

Examples

Prompt Optimization — stochastic eval

The agent rewrites system_prompt.md, generates summaries from 5 test articles, and an LLM judge scores each against 4 binary criteria (key points captured? concise? plain language? factually accurate?).

anneal register \
  --name prompt-optimizer \
  --artifact examples/prompt-optimizer/system_prompt.md \
  --eval-mode stochastic \
  --criteria examples/prompt-optimizer/eval_criteria.toml \
  --direction maximize \
  --scope examples/prompt-optimizer/scope.yaml

anneal run --target prompt-optimizer --experiments 10

Test Coverage — deterministic eval, maximize

The agent adds tests to cover untested code paths. pytest --cov provides the score. Source code is immutable — the agent can only write tests.

anneal register \
  --name test-coverage \
  --artifact examples/test-coverage/tests/test_calculator.py \
  --eval-mode deterministic \
  --run-cmd "bash examples/test-coverage/eval.sh" \
  --parse-cmd "cat" \
  --direction maximize \
  --scope examples/test-coverage/scope.yaml

anneal run --target test-coverage --experiments 10

Code Golf — deterministic eval, minimize

anneal register \
  --name code-golf \
  --artifact examples/code-golf/app.py \
  --eval-mode deterministic \
  --run-cmd "bash examples/code-golf/eval.sh" \
  --parse-cmd "cat" \
  --direction minimize \
  --scope examples/code-golf/scope.yaml

anneal run --target code-golf --experiments 10

Local Artifacts (no git tracking required)

Artifact files don't need to be committed to git. If they're untracked, anneal copies them into the worktree automatically during registration. For files you don't want in version control at all, use --in-place to skip worktree isolation entirely:

anneal register \
  --name local-skill \
  --artifact SKILL.md \
  --eval-mode stochastic \
  --criteria eval_criteria.toml \
  --direction maximize \
  --scope scope.yaml \
  --in-place

Documentation

| Doc | What's in it |
| --- | --- |
| Overview | Motivation, lineage, and the core idea |
| Eval Guide | Writing good binary evaluation criteria |
| Recipes | Copy-paste registration commands for common targets |
| Use Cases | Where anneal works, where it doesn't, and why |
| Features | Search strategies, statistical methods, knowledge system |
| Architecture | Module map and design principles |
| System Design | Full technical design document |
| CI Integration | GitHub Actions workflow and status JSON output |

Testing

uv run pytest tests/ -x -q          # 820 tests
uv run pytest tests/ --cov=anneal    # With coverage

License

Apache-2.0
