llm-mutation
Mutation testing for LLM prompts. Find the gaps in your eval suite before production does.
pip install llm-mutation
mutate run --prompt prompts/customer_service.txt --eval evals/test_cs.py
The Problem
You have an eval suite. It passes. You ship. Production breaks.
Your eval suite checks the 50 specific cases you wrote, but the suite itself was never tested. llm-mutation checks whether your eval suite would notice if a key constraint were removed, a clause dropped, or a scope expanded.
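The core question can be shown with a toy example that needs no model and no library: drop one sentence at a time and check whether the eval score moves. Everything here (`toy_eval`, the sentence splitter, the "score must drop" kill rule) is a hypothetical illustration and much cruder than llm-mutation's operators.

```python
import re

# Hypothetical prompt: three sentences, each a candidate clause to drop.
PROMPT = (
    "You are a support agent for Acme software. "
    "Never discuss competitors. "
    "Direct pricing questions to sales@acmecorp.com."
)

def toy_eval(prompt: str) -> float:
    """Hypothetical eval that only checks one constraint (a deliberate blind spot)."""
    checks = ["Never discuss competitors"]
    return sum(c in prompt for c in checks) / len(checks)

def drop_each_sentence(prompt: str) -> list[tuple[str, str]]:
    """Return (dropped_sentence, mutant_prompt) pairs, one per sentence."""
    sentences = re.split(r"(?<=\.)\s+", prompt.strip())
    return [
        (s, " ".join(sentences[:i] + sentences[i + 1:]))
        for i, s in enumerate(sentences)
    ]

def find_survivors(prompt: str, eval_fn) -> list[str]:
    """Sentences whose removal the eval does not notice."""
    baseline = eval_fn(prompt)
    # Assumed kill rule: a mutant is "killed" only if its score falls below baseline.
    return [s for s, mutant in drop_each_sentence(prompt) if eval_fn(mutant) >= baseline]

print(find_survivors(PROMPT, toy_eval))
```

Here both the role sentence and the pricing sentence survive, because the toy eval never exercises them; that is the kind of gap a mutation score is meant to surface.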
Quickstart
from llm_mutation import MutationEngine, MutantRunner, MutationReport
# 1. Generate semantic mutations of your prompt
engine = MutationEngine()
mutations = engine.generate("prompts/customer_service.txt")
# 2. Run your eval suite against each mutant
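def my_eval_fn(prompt: str, test_cases: list) -> float:
    # your existing eval logic; returns a score between 0.0 and 1.0
    ...
my_test_cases = [...]  # your existing test cases
runner = MutantRunner(eval_fn=my_eval_fn, test_cases=my_test_cases)
results = runner.run(mutations)
# 3. See your gaps
prompt = open("prompts/customer_service.txt").read()  # assumed: the original (unmutated) prompt text
report = MutationReport.from_results(results, prompt, original_score=0.91)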
print(report.summary())
# MUTATION SCORE: 71% (5/7 mutations killed)
# SURVIVING MUTATIONS:
# ✗ DropClause — "Direct pricing questions to sales@acmecorp.com." removed
# → ADD TEST CASE: "User asks 'What does the enterprise plan cost?'"
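The Quickstart leaves `my_eval_fn` to you. One possible shape is a simple pass rate, sketched below under loudly stated assumptions: each test case is a dict with hypothetical `input` and `must_contain` keys, and `call_model` is a hypothetical stand-in for your own model client (neither is part of llm-mutation).

```python
# Hypothetical stand-in for your model client so this sketch runs offline.
# It "answers" by echoing the system prompt; a real version would call an LLM.
def call_model(system_prompt: str, user_message: str) -> str:
    return system_prompt

def my_eval_fn(prompt: str, test_cases: list) -> float:
    """Pass rate: fraction of cases whose response contains the expected text."""
    if not test_cases:
        return 0.0  # no evidence either way; treated as a failing score here
    passed = 0
    for case in test_cases:
        response = call_model(prompt, case["input"])
        if case["must_contain"].lower() in response.lower():
            passed += 1
    return passed / len(test_cases)

# Assumed test-case shape: {"input": user message, "must_contain": expected substring}.
my_test_cases = [
    {"input": "What does the enterprise plan cost?", "must_contain": "sales@acmecorp.com"},
]
```

With a case like this in the suite, a mutant that drops the pricing clause scores 0.0 instead of 1.0, which is the kind of score drop that lets a mutation run detect it.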
Six Deterministic Mutation Operators
| Operator | What it does |
|---|---|
| NegateConstraint | Removes a prohibitive clause ("Never X") |
| DropClause | Removes a requirement ("Always X", "You must X") |
| ScopeExpand | Widens a scope restriction ("software only" → "products and services") |
| ScopeNarrow | Narrows a permission ("any topic" → "general topics only") |
| ConditionInvert | Removes a conditional behavior ("if A, then B") |
| PhraseSwap | Swaps a style phrase ("concise" ↔ "comprehensive") |
No LLM required for mutation generation — all operators are deterministic text transforms.
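Because the operators are described as deterministic text transforms, rough analogues can be written with plain regular expressions. The sketch below approximates DropClause and PhraseSwap; the trigger patterns, the sentence splitter, and the swap table are assumptions, and llm-mutation's actual rules may differ.

```python
import re

SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
# Assumed trigger phrases for requirements ("Always X", "You must X").
REQUIREMENT = re.compile(r"^(always|you must)\b", re.IGNORECASE)
# Assumed style-phrase pairs; each swap is tried in both directions.
PHRASE_PAIRS = [("concise", "comprehensive")]

def drop_clause(prompt: str) -> list[str]:
    """One mutant per requirement sentence, with that sentence removed."""
    sentences = SENTENCE_SPLIT.split(prompt.strip())
    mutants = []
    for i, s in enumerate(sentences):
        if REQUIREMENT.match(s):
            mutants.append(" ".join(sentences[:i] + sentences[i + 1:]))
    return mutants

def phrase_swap(prompt: str) -> list[str]:
    """One mutant per style phrase found, replaced by its counterpart."""
    mutants = []
    for a, b in PHRASE_PAIRS:
        for src, dst in ((a, b), (b, a)):
            pattern = re.compile(rf"\b{re.escape(src)}\b", re.IGNORECASE)
            if pattern.search(prompt):
                mutants.append(pattern.sub(dst, prompt))
    return mutants

prompt = "Always greet the user. Keep answers concise. You must cite the docs."
for mutant in drop_clause(prompt) + phrase_swap(prompt):
    print(mutant)
```

Since no model is involved, the same prompt always yields the same mutants, which keeps mutation runs reproducible across CI jobs.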
Mutation Score
| Score | Verdict | Meaning |
|---|---|---|
| >= 90% | STRONG | Eval suite is comprehensive |
| 80-89% | ADEQUATE | Good for CI gate |
| 70-79% | MARGINAL | Meaningful gaps |
| 60-69% | WEAK | Significant gaps |
| < 60% | DANGEROUS | Not fit for purpose |
Recommended minimum for production CI gate: 80%
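Assuming the mutation score is killed mutants divided by total mutants (consistent with the `71% (5/7 mutations killed)` line in the Quickstart output), the table reduces to a few threshold checks. This is a sketch of the arithmetic, not the library's code; placing fractional scores such as 89.5% in the lower band is an assumption, since the table's integer ranges leave that case open.

```python
def mutation_score(killed: int, total: int) -> float:
    """Fraction of mutants the eval suite detected ("killed")."""
    if total <= 0:
        raise ValueError("need at least one mutant")
    return killed / total

# Lower bound of each verdict band from the table, strongest first.
BANDS = [(0.90, "STRONG"), (0.80, "ADEQUATE"), (0.70, "MARGINAL"), (0.60, "WEAK")]

def verdict(score: float) -> str:
    """Map a score in [0, 1] to the first band whose lower bound it reaches."""
    for floor, label in BANDS:
        if score >= floor:
            return label
    return "DANGEROUS"

score = mutation_score(5, 7)
print(f"{score:.0%} {verdict(score)}")  # 71% MARGINAL
```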
CLI
# Run mutation test
mutate run --prompt prompts/cs.txt --eval evals/test_cs.py --output report.json
# Generate report
mutate report --input report.json --format markdown
# CI gate (exit 1 if score < 80%)
mutate ci --input report.json --min-score 0.80
# Calibrate your eval suite
mutate calibrate --prompt prompts/cs.txt --eval evals/test_cs.py
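The `mutate ci` step amounts to a threshold check turned into a process exit code. If you want the same gate inside your own Python tooling, a sketch might look like this. The `mutation_score` key is a hypothetical name; inspect your own `report.json` for the real schema before relying on it.

```python
import json

def load_report(path: str) -> dict:
    """Read the JSON report written by `mutate run --output`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def ci_exit_code(report: dict, min_score: float = 0.80) -> int:
    """Mirror the documented gate: exit code 1 if the score is below min_score."""
    # "mutation_score" is a hypothetical key; check your report.json schema.
    score = float(report["mutation_score"])
    return 0 if score >= min_score else 1
```

In a script you would finish with `sys.exit(ci_exit_code(load_report("report.json")))`.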
GitHub Action
- run: pip install llm-mutation
- name: Run mutation tests
  run: |
    mutate run --prompt prompts/cs.txt --eval evals/test_cs.py --output report.json
    mutate ci --input report.json --min-score 0.80
Pattern Foundation
Built on PAT-045 — Judges 6:36-40 (The Gideon Fleece Inversion Pattern).
Gideon designed a two-condition invertible test: fleece wet/ground dry, then fleece dry/ground wet. He wasn't testing God's power — he was testing whether his testing mechanism could discriminate signal from coincidence.
llm-mutation is the bowlful of water. Your mutation score is your measurement.
Supporting: PAT-046 (Acts 17:11 — Berean Null Test) → mutate calibrate
Supporting: PAT-047 (Numbers 13:25-33 — Twelve Spies Divergence) → mutate verify-judge
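The fleece idea translates into a simple sanity check on an eval function: it should score a known-good prompt clearly higher than a deliberately broken one, otherwise its verdicts cannot separate signal from coincidence. The sketch below illustrates that principle with hypothetical names (`discriminates`, `margin`, the two toy evals); it is not a description of what `mutate calibrate` or `mutate verify-judge` actually do.

```python
from typing import Callable

EvalFn = Callable[[str, list], float]

def discriminates(eval_fn: EvalFn, good_prompt: str, broken_prompt: str,
                  test_cases: list, margin: float = 0.2) -> bool:
    """Two-condition check: the good prompt must outscore the broken one by `margin`."""
    good = eval_fn(good_prompt, test_cases)
    broken = eval_fn(broken_prompt, test_cases)
    return good - broken >= margin

# Toy eval that ignores the prompt entirely, so it can never discriminate.
def constant_eval(prompt: str, test_cases: list) -> float:
    return 0.9

# Toy eval that rewards one required phrase.
def phrase_eval(prompt: str, test_cases: list) -> float:
    return 1.0 if "sales@acmecorp.com" in prompt else 0.0

good = "Direct pricing questions to sales@acmecorp.com."
broken = "Answer pricing questions yourself."
print(discriminates(constant_eval, good, broken, []))  # False
print(discriminates(phrase_eval, good, broken, []))    # True
```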
License
MIT
Download files
Source Distribution
Built Distribution
File details
Details for the file llm_mutation-0.1.0.tar.gz.
File metadata
- Download URL: llm_mutation-0.1.0.tar.gz
- Upload date:
- Size: 21.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ade1908c87e7516b9f447cc50ac585443f7b55ac62c89cac60279cb22f60d4cf |
| MD5 | ae44a42c82db76b42576af11ec48061d |
| BLAKE2b-256 | 578228e0d7a24800ff205d0f2f636e34feb4473541ca29a7906f48a8cafea2f3 |
File details
Details for the file llm_mutation-0.1.0-py3-none-any.whl.
File metadata
- Download URL: llm_mutation-0.1.0-py3-none-any.whl
- Upload date:
- Size: 19.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8980f551cf42e003786c0f7a6f2d50f36c6ac49b2295551d13171762affb9002 |
| MD5 | 4b7a5969c7139146b2f7eadb01be7a82 |
| BLAKE2b-256 | 8080792fa5acdf6889d59602e95f44492cca0bbbc29d81936b2c861b1f50f1fc |