llm-mutation

Mutation testing for LLM prompts. Find the gaps in your eval suite before production does.

pip install llm-mutation
mutate run --prompt prompts/customer_service.txt --eval evals/test_cs.py

The Problem

You have an eval suite. It passes. You ship. Production breaks.

Your eval suite checks the 50 specific cases you wrote, but the suite itself was never tested. llm-mutation checks whether your eval suite would notice if a key constraint were removed, a clause were dropped, or a scope were expanded.

Quickstart

from llm_mutation import MutationEngine, MutantRunner, MutationReport

# 1. Generate semantic mutations of your prompt
engine = MutationEngine()
mutations = engine.generate("prompts/customer_service.txt")

# 2. Run your eval suite against each mutant
def my_eval_fn(prompt: str, test_cases: list) -> float:
    # your existing eval logic — returns 0.0-1.0
    ...

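my_test_cases = [...]  # placeholder: your existing test cases
runner = MutantRunner(eval_fn=my_eval_fn, test_cases=my_test_cases)
results = runner.run(mutations)

# 3. See your gaps
# Assumption: `prompt` is the text of the original prompt file.
with open("prompts/customer_service.txt") as f:
    prompt = f.read()
report = MutationReport.from_results(results, prompt, original_score=0.91)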
print(report.summary())
# MUTATION SCORE: 71% (5/7 mutations killed)
# SURVIVING MUTATIONS:
#   ✗ DropClause — "Direct pricing questions to sales@acmecorp.com." removed
#     → ADD TEST CASE: "User asks 'What does the enterprise plan cost?'"
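
The `eval_fn` contract used above (prompt and test cases in, a score between 0.0 and 1.0 out) can be sketched as follows. This is a minimal illustration, not llm-mutation's own code: the `fake_model` stub, the test-case dictionary shape, and the `is_killed` rule (a mutant counts as killed when its score drops below the original's) are all assumptions made for the example.

```python
# Hypothetical stand-in for a real LLM call. It only follows the pricing
# rule when the prompt still contains it.
def fake_model(prompt: str, user_message: str) -> str:
    text = user_message.lower()
    if "price" in text or "cost" in text:
        if "sales@acmecorp.com" in prompt:
            return "Please contact sales@acmecorp.com for pricing."
        return "The enterprise plan costs $99 per month."
    return "Happy to help with that."


def my_eval_fn(prompt: str, test_cases: list) -> float:
    """Return the fraction of test cases whose expected text appears in the reply."""
    if not test_cases:
        return 0.0
    passed = sum(
        1 for case in test_cases
        if case["expect"] in fake_model(prompt, case["input"])
    )
    return passed / len(test_cases)


def is_killed(original_score: float, mutant_score: float, tolerance: float = 0.0) -> bool:
    """Assumed kill rule: the mutant scores lower than the original prompt."""
    return mutant_score < original_score - tolerance


original = "Be helpful. Direct pricing questions to sales@acmecorp.com."
mutant = "Be helpful."  # the DropClause mutant from the report above

weak_suite = [{"input": "How do I reset my password?", "expect": "help"}]
strong_suite = weak_suite + [
    {"input": "What does the enterprise plan cost?", "expect": "sales@acmecorp.com"}
]

for name, suite in [("weak", weak_suite), ("strong", strong_suite)]:
    killed = is_killed(my_eval_fn(original, suite), my_eval_fn(mutant, suite))
    print(f"{name} suite kills the DropClause mutant: {killed}")
```

The weak suite scores both prompts identically, so the mutant survives. Adding the pricing test case suggested by the report is what lets the suite detect the dropped clause.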

Six Deterministic Mutation Operators

| Operator | What it does |
| --- | --- |
| NegateConstraint | Removes a prohibitive clause ("Never X") |
| DropClause | Removes a requirement ("Always X", "You must X") |
| ScopeExpand | Widens a scope restriction ("software only" → "products and services") |
| ScopeNarrow | Narrows a permission ("any topic" → "general topics only") |
| ConditionInvert | Removes a conditional behavior ("if A, then B") |
| PhraseSwap | Swaps a style phrase ("concise" ↔ "comprehensive") |

No LLM required for mutation generation — all operators are deterministic text transforms.
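
To illustrate what a deterministic text transform can look like, here is a minimal sketch of a DropClause-style operator. The function names, the naive sentence splitter, and the trigger phrases are assumptions chosen for this example; the library's actual operators may work differently.

```python
import re

# Assumed trigger phrases for requirement sentences, taken from the
# examples in the operator table ("Always X", "You must X").
REQUIREMENT_PATTERN = re.compile(r"^(always|you must)\b", re.IGNORECASE)


def split_sentences(prompt: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", prompt.strip()) if s]


def drop_clause_mutants(prompt: str) -> list[tuple[str, str]]:
    """Return (removed_sentence, mutated_prompt) pairs, one per requirement."""
    sentences = split_sentences(prompt)
    mutants = []
    for i, sentence in enumerate(sentences):
        if REQUIREMENT_PATTERN.match(sentence):
            remaining = sentences[:i] + sentences[i + 1:]
            mutants.append((sentence, " ".join(remaining)))
    return mutants


prompt = (
    "You are a support agent. Always greet the user by name. "
    "Never discuss competitors. You must cite the help center."
)
for removed, mutated in drop_clause_mutants(prompt):
    print(f"removed: {removed!r}")
    print(f"mutant:  {mutated!r}")
```

Because the transform uses only regular expressions, the same prompt always yields the same mutants, which is what makes repeated runs comparable.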

Mutation Score

| Score | Verdict | Meaning |
| --- | --- | --- |
| >= 90% | STRONG | Eval suite is comprehensive |
| 80-89% | ADEQUATE | Good for CI gate |
| 70-79% | MARGINAL | Meaningful gaps |
| 60-69% | WEAK | Significant gaps |
| < 60% | DANGEROUS | Not fit for purpose |

Recommended minimum for production CI gate: 80%
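
The score and verdict bands above can be worked through in a few lines. This sketch assumes the score is simply killed mutations divided by total mutations, which matches the "71% (5/7 mutations killed)" line in the Quickstart output; the function names are hypothetical and not part of the library's API.

```python
def mutation_score(killed: int, total: int) -> float:
    """Fraction of mutants the eval suite detected (assumed definition)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return killed / total


def verdict(score: float) -> str:
    """Map a score in [0.0, 1.0] to the verdict bands from the table above."""
    if score >= 0.90:
        return "STRONG"
    if score >= 0.80:
        return "ADEQUATE"
    if score >= 0.70:
        return "MARGINAL"
    if score >= 0.60:
        return "WEAK"
    return "DANGEROUS"


killed, total = 5, 7
score = mutation_score(killed, total)
print(f"MUTATION SCORE: {score:.0%} ({killed}/{total} mutations killed) -> {verdict(score)}")
```

With 5 of 7 mutants killed, the score is about 71%, which falls in the MARGINAL band and below the recommended 80% gate.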

CLI

# Run mutation test
mutate run --prompt prompts/cs.txt --eval evals/test_cs.py --output report.json

# Generate report
mutate report --input report.json --format markdown

# CI gate (exit 1 if score < 80%)
mutate ci --input report.json --min-score 0.80

# Calibrate your eval suite
mutate calibrate --prompt prompts/cs.txt --eval evals/test_cs.py
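
The gate logic described for `mutate ci` (exit 1 when the score is below the minimum) can be sketched in Python as below. The report layout is not documented here, so the `mutation_score` field name is purely an assumption for illustration, as is the `ci_gate` helper itself.

```python
import json


def ci_gate(report_json: str, min_score: float = 0.80) -> int:
    """Return a process exit code: 0 if the score meets the minimum, else 1."""
    report = json.loads(report_json)
    # Assumed field name; the real report.json schema may differ.
    score = report["mutation_score"]
    return 0 if score >= min_score else 1


sample_report = json.dumps({"mutation_score": 5 / 7})
print(ci_gate(sample_report))                  # gate at the default 0.80
print(ci_gate(sample_report, min_score=0.70))  # a looser gate
```

In a real pipeline the return value would be passed to `sys.exit` so that the CI job fails when the gate is not met.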

GitHub Action

- run: pip install llm-mutation
- name: Run mutation tests
  run: |
    mutate run --prompt prompts/cs.txt --eval evals/test_cs.py --output report.json
    mutate ci --input report.json --min-score 0.80

Pattern Foundation

Built on PAT-045 — Judges 6:36-40 (The Gideon Fleece Inversion Pattern).

Gideon designed a two-condition invertible test: fleece wet/ground dry, then fleece dry/ground wet. He wasn't testing God's power — he was testing whether his testing mechanism could discriminate signal from coincidence.

llm-mutation is the bowlful of water. Your mutation score is your measurement.

Supporting: PAT-046 (Acts 17:11, Berean Null Test) → mutate calibrate

Supporting: PAT-047 (Numbers 13:25-33, Twelve Spies Divergence) → mutate verify-judge

License

MIT
