Production-grade mutation testing for Python.

Maintainers

krish-arya

Mutagen

LLM-assisted test generation, validated by mutation testing.

Mutagen ingests a Python repository, finds under-covered functions, generates pytest tests for them with an LLM, and keeps only the tests that actually kill mutants of the target. It is built on Clean Architecture with a strict dependency rule, two explicit state machines, full async I/O, SQLite-backed resume, and structured logging.

```
repo ──► ingest ──► select targets ──► generate tests ──► run ──► mutate ──► keep / discard ──► report
                                          ▲        │
                                          └── repair / strengthen loops ──┘
```

What it does

For each selected target, the pipeline:

1. Generates a pytest module from the function's source, imports, and surrounding context, matching your project's existing test style.
2. Runs it in an isolated subprocess sandbox (timeout + resource limits, flakiness detection via a double-run).
3. Mutation-gates it with mutmut. If the tests don't kill enough mutants, the surviving mutants become feedback for a regeneration attempt (the strengthening loop). If the tests fail to run, the failure output drives a repair loop.
4. Keeps or discards the tests based on the mutation-score threshold, and persists the outcome immediately so an interrupted run resumes cleanly.
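To make the sandbox step concrete, here is a minimal sketch of running a test command in a subprocess with a wall-clock timeout and POSIX resource limits, executing it twice to flag flakiness. It is not Mutagen's SubprocessSandboxRunner: the names `run_sandboxed`, `SandboxResult`, and `pytest_command`, the limit values, and the rule "two runs that disagree mean flaky" are assumptions for illustration.

```python
# Sketch only: hypothetical names and limits, not Mutagen's SubprocessSandboxRunner.
import os
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxResult:
    passed: bool  # both runs exited with status 0
    flaky: bool   # the two runs disagreed
    output: str   # combined stdout/stderr of the first run


def pytest_command(test_path: str) -> list[str]:
    """Run pytest with the current interpreter (so the active venv is used)."""
    return [sys.executable, "-m", "pytest", "-q", test_path]


def _limit_resources() -> None:
    """Runs in the child just before exec (POSIX only). Values are illustrative."""
    import resource

    for limit, value in ((resource.RLIMIT_CPU, 60), (resource.RLIMIT_AS, 2 * 1024**3)):
        try:
            resource.setrlimit(limit, (value, value))
        except (ValueError, OSError):
            pass  # some platforms refuse certain limits; keep the others


def _run_once(cmd: list[str], cwd: str, timeout: float) -> tuple[bool, str]:
    try:
        proc = subprocess.run(
            cmd,
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout,  # wall-clock limit; the child is killed on expiry
            preexec_fn=_limit_resources if os.name == "posix" else None,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    return proc.returncode == 0, proc.stdout + proc.stderr


def run_sandboxed(cmd: list[str], cwd: str = ".", timeout: float = 120.0) -> SandboxResult:
    """Run the same command twice; differing outcomes mark the tests as flaky."""
    first_ok, first_out = _run_once(cmd, cwd, timeout)
    second_ok, _ = _run_once(cmd, cwd, timeout)
    return SandboxResult(passed=first_ok and second_ok, flaky=first_ok != second_ok, output=first_out)
```

Usage would look like `run_sandboxed(pytest_command("tests/test_generated.py"), cwd=workspace)`, where `workspace` stands for the isolated copy of the repository. Note that `preexec_fn` is POSIX-only and is documented as unsafe in multi-threaded parents, so a production runner might set limits differently.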

Context enrichment (optional)

Generation step 1 can fold in two extra signals — both off by default, both configured under [generation]:

Semantic code understanding (call graph). An AST-based CallGraphAnalyzer builds a repo-wide call graph and extracts each target's execution path — its transitive callees — so the model writes tests that exercise the whole tree end-to-end rather than just the entry function:
```
process_order
 ├── validate_order
 ├── calculate_tax
 └── save_order
```
The rendered tree and the callee sources are added to the prompt. The analyzer resolves only unambiguous in-repo calls (plain, self/cls methods, imported names) and omits anything it can't pin down — no misleading edges.
Retrieval-augmented generation (RAG). Instead of seeding the prompt with the first couple of test files, an EmbeddingTestRetriever indexes the project's existing tests (one chunk per test_* function) and retrieves the ones most similar to the target by embedding similarity:
```
target function ─► vector search ─► relevant existing tests ─► prompt
```
The default HashingEmbeddingProvider is dependency-free and deterministic (no model download, no API key), and a real embedding model can drop in behind the same port. Retrieved examples tend to make generated tests more consistent with the conventions of closely related code than generic seed files do.

Architecture

Mutagen follows a strict dependency rule: dependencies point inward, toward the domain. The domain (core) knows nothing about infrastructure; the composition root (config/container.py) is the only place concrete adapters are imported.

```
┌─────────────────────────────────────────────────────────────────────┐
│  cli/            mutagen run <repo> · Rich progress UI · dashboard    │
├─────────────────────────────────────────────────────────────────────┤
│  services/       orchestrator · target_processor · budget · reporting │
│                  (depend only on core.interfaces — the ports)         │
├─────────────────────────────────────────────────────────────────────┤
│  core/           models (frozen dataclasses) · interfaces (ports)     │
│                  exceptions · state_machine (run + target FSMs)       │
├─────────────────────────────────────────────────────────────────────┤
│  infrastructure/ ingest · selection · generation · llm · sandbox      │
│  reporting/      gate · store (SQLite) · md/json/terminal reporters   │
│                  (implement the ports; only layer that does real I/O) │
└─────────────────────────────────────────────────────────────────────┘
        ▲                                                       │
        └──────────  config/container.py wires it all  ─────────┘
```

Ports → adapters

Every infrastructure concern is an abstract port in core/interfaces/, with a concrete adapter in infrastructure/:

| Port | Adapter | Role |
|---|---|---|
| `RepoIngestor` | `ingest/FilesystemRepoIngestor` | Clone/copy repo → isolated workspace, venv, deps |
| `TargetSelector` | `selection/AstTargetSelector` | Coverage-guided, AST-based target ranking |
| `CallGraphAnalyzer` | `selection/AstCallGraphAnalyzer` | Build a repo call graph → a target's execution path |
| `TestGenerator` | `generation/LLMTestGenerator` | Gather context → prompt → validate generated tests |
| `EmbeddingProvider` | `retrieval/HashingEmbeddingProvider` | Embed text into vectors (dependency-free default) |
| `TestRetriever` | `retrieval/EmbeddingTestRetriever` | Index existing tests → retrieve the most similar ones |
| `LLMClient` | `llm/AnthropicLLMClient` | Anthropic API (retries, backoff, cost tracking) |
| `SandboxRunner` | `sandbox/SubprocessSandboxRunner` | Run pytest isolated (timeout, rlimits, flakiness) |
| `MutationGate` | `gate/MutmutMutationGate` | Drive mutmut, score, survivor feedback, keep/discard |
| `Store` | `store/SqliteStore` | Persist final runs + artifacts |
| `CheckpointStore` | `store/SqliteCheckpointStore` | Per-target progress for resume |
| `Reporter` | `reporting/{Markdown,Json,Terminal,Composite}` | `report.md` + `report.json` + dashboard |
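The port/adapter split can be illustrated with the Reporter row of the table. The sketch below is hypothetical: the method name `render`, the summary shape, `ReportingService`, and `build_reporting_service` are assumptions, and the real reporters write report.md and report.json rather than returning strings. It shows the dependency rule in miniature: the service sees only the abstract port, and only the composition-root function names concrete adapters.

```python
# Sketch only: hypothetical port/adapter names illustrating the dependency rule.
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass


# core/interfaces/: the port. Pure abstraction, no I/O, no adapter imports.
class Reporter(ABC):
    @abstractmethod
    def render(self, summary: dict[str, int]) -> str:
        """Turn a run summary into a report body."""


# Adapter layer: concrete classes implement the port.
class JsonReporter(Reporter):
    def render(self, summary: dict[str, int]) -> str:
        return json.dumps(summary, sort_keys=True)


class MarkdownReporter(Reporter):
    def render(self, summary: dict[str, int]) -> str:
        return "\n".join(f"- **{key}**: {value}" for key, value in sorted(summary.items()))


class CompositeReporter(Reporter):
    def __init__(self, *reporters: Reporter) -> None:
        self._reporters = reporters

    def render(self, summary: dict[str, int]) -> str:
        return "\n\n".join(r.render(summary) for r in self._reporters)


# services/: depends only on the port, injected through the constructor.
@dataclass
class ReportingService:
    reporter: Reporter

    def finish(self, kept: int, discarded: int) -> str:
        return self.reporter.render({"kept": kept, "discarded": discarded})


# Composition root: the only place that names concrete adapters.
def build_reporting_service() -> ReportingService:
    return ReportingService(reporter=CompositeReporter(JsonReporter(), MarkdownReporter()))
```

Because the service only holds a `Reporter`, a test can inject any fake implementation without touching the filesystem.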

Two state machines

```
RUN lifecycle (RunStateMachine)
  PENDING → INITIALIZING → INGESTING → SELECTING_TARGETS
          → GENERATING_TESTS → GATING → REPORTING → COMPLETED
          (any active state → FAILED / CANCELLED)

TARGET lifecycle (TargetStateMachine), one per target
  SELECTED → GENERATED → RAN → MUTATED → KEPT
          (any active state → DISCARDED)
```

Both are data-driven tables that reject illegal transitions rather than silently proceeding.
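A data-driven table of this kind might look like the following sketch of the target lifecycle. It encodes only the transitions listed above; how the repair and strengthen loops map onto states is not described here, so they are left out. `TargetFSM` and `IllegalTransition` are illustrative names, not Mutagen's classes.

```python
# Sketch only: hypothetical class names; transitions taken from the diagram above.
from enum import Enum, auto


class TargetState(Enum):
    SELECTED = auto()
    GENERATED = auto()
    RAN = auto()
    MUTATED = auto()
    KEPT = auto()
    DISCARDED = auto()


_HAPPY_PATH = [TargetState.SELECTED, TargetState.GENERATED, TargetState.RAN,
               TargetState.MUTATED, TargetState.KEPT]

# The lifecycle as data: each active state may advance one step or be discarded.
# KEPT and DISCARDED have no entry, which makes them terminal.
TRANSITIONS: dict[TargetState, frozenset[TargetState]] = {
    src: frozenset({dst, TargetState.DISCARDED})
    for src, dst in zip(_HAPPY_PATH, _HAPPY_PATH[1:])
}


class IllegalTransition(Exception):
    pass


class TargetFSM:
    def __init__(self) -> None:
        self.state = TargetState.SELECTED

    @property
    def is_terminal(self) -> bool:
        return self.state not in TRANSITIONS

    def advance(self, new_state: TargetState) -> None:
        if new_state not in TRANSITIONS.get(self.state, frozenset()):
            # Fail loudly instead of continuing in an inconsistent state.
            raise IllegalTransition(f"{self.state.name} -> {new_state.name}")
        self.state = new_state
```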

Orchestration loop

```
for each selected target (skipping ones already done on a prior run):
    if budget/cost exhausted: stop cleanly → PARTIAL result (resumable)
    ┌─ TargetProcessor ───────────────────────────────────────────┐
    │  generate ──► run ──► (repair loop on failure)               │
    │           └─► gate ──► (strengthen loop on surviving mutants)│
    │           └─► KEPT (score ≥ threshold) or DISCARDED          │
    └─────────────────────────────────────────────────────────────┘
    persist the target's checkpoint IMMEDIATELY  (resume-safe)
finalize RunResult → summarize → write report.md + report.json → save run
```

Project layout

```
mutagen/
├── cli/              # argparse CLI + Rich progress UI
├── config/           # RunConfig, TOML loader, logging, DI container
├── core/
│   ├── models/           # frozen domain dataclasses (RunResult, TargetOutcome, …)
│   ├── interfaces/       # abstract ports (ABCs)
│   ├── exceptions/       # MutagenError hierarchy
│   └── state_machine/    # run + target FSMs
├── services/         # orchestrator, target_processor, budget, reporting, progress
├── infrastructure/
│   ├── ingest/ selection/ generation/ llm/ sandbox/ gate/ store/
│   └── process.py        # shared subprocess-safety helper
├── reporting/        # markdown / json / terminal / composite reporters
├── tests/            # 238 tests (unit + integration, mock-driven)
└── main.py           # entrypoint
```

Setup

Requires Python 3.11+ and git (for ingesting remote repositories).

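```
# Install with every integration (Anthropic + OpenAI SDKs, coverage, mutmut, …)
pip install "mutagen-ai[all]"     # or: pipx install "mutagen-ai[all]"
```

The package is published on PyPI as mutagen-ai; it provides the mutagen command used throughout this README.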

Then provide an API key for whichever provider you use, either in your shell or in a .env file in your project (loaded automatically):

```
export ANTHROPIC_API_KEY=sk-ant-...    # Anthropic   (PowerShell: $env:ANTHROPIC_API_KEY="sk-ant-...")
export OPENAI_API_KEY=sk-...           # OpenAI
export GEMINI_API_KEY=...              # Google Gemini
export OPENROUTER_API_KEY=sk-or-...    # OpenRouter
```

```
# .env (keep this file out of source control)
OPENAI_API_KEY=sk-...
```

Verify everything at once:

```
mutagen doctor    # checks Python, git, optional deps, and which provider key is set
```

Lighter installs are available via extras: pip install mutagen-ai (CLI + reporting only), then add [llm] (Anthropic), [openai] (OpenAI / Gemini / OpenRouter), [sandbox], [mutation], or [coverage] as needed. mutagen doctor tells you exactly which extra to install for anything missing.

Usage

```
# Run against a local path or a git URL
mutagen run ./path/to/project
mutagen run https://github.com/org/repo

# With a config file and a score threshold
mutagen -c mutagen.toml run ./project --threshold 0.8

# Resume an interrupted run (reuse its id)
mutagen run ./project --run-id my-run-123

# Re-render the most recent run's report
mutagen report

# Diagnose the environment (Python, git, optional deps, provider key)
mutagen doctor
```

mutagen run exits 0 on success, 1 on a handled failure, and 2 when the achieved mutation score is below the configured threshold (useful as a CI gate).
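Because exit code 2 is distinct from 1, a CI job can tell a low score apart from a failed run. The hypothetical wrapper below shows one possible policy: an optional soft-fail mode that reports a below-threshold score without failing the job, while handled failures still fail it. `run_ci_gate` and its parameter are illustrative names.

```python
# Sketch only: a hypothetical CI wrapper around the documented exit codes.
import subprocess
import sys

EXIT_MEANINGS = {
    0: "success",
    1: "handled failure",
    2: "mutation score below threshold",
}


def run_ci_gate(cmd: list[str], soft_fail_below_threshold: bool = False) -> int:
    """Run the command, log what its exit status means, and return the CI status."""
    code = subprocess.run(cmd, check=False).returncode
    meaning = EXIT_MEANINGS.get(code, f"unexpected exit status {code}")
    print(f"{cmd[0]}: {meaning}", file=sys.stderr)
    if code == 2 and soft_fail_below_threshold:
        return 0  # report the low score but don't block the pipeline
    return code
```

In a CI script this might be called as `sys.exit(run_ci_gate(["mutagen", "run", ".", "--threshold", "0.8"], soft_fail_below_threshold=True))`.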

Live progress & dashboard

On a TTY the CLI shows a Rich progress bar and a per-phase status line, then a summary table. In CI / piped output it falls back to plain line logging automatically. Use --no-progress to force plain output.

```
                 Mutagen Run a1b2c3 [succeeded]
┌──────────────────────────────────┬──────────────┐
│ Mutation score (before -> after) │  n/a -> 84%  │
│ Targets kept / discarded         │       12 / 3 │
│ Tests generated                  │           15 │
│ API cost                         │      $0.4210 │
│ Execution time                   │       182.4s │
└──────────────────────────────────┴──────────────┘
```

Reports

Every run writes two files under <storage.root>/reports/ (default .mutagen/reports/):

report.md: a human-readable dashboard with a per-target table.
report.json: the same data, machine-readable for CI and archival.

Both include the mutation score before/after, kept / discarded targets, API cost (USD + tokens + requests), execution time, and per-target statistics.

Note on "before": the after score is always measured. The before (baseline) score, i.e. the mutation score the repo's pre-existing tests already achieve, is carried through the data model but rendered as n/a until a baseline gate pass is enabled; it is best-effort by design.

Configuration

Configuration is TOML, mirroring the config dataclass tree. See mutagen.example.toml for the fully-annotated template. CLI flags (--threshold) override file values. Highlights:

```
project_root = "."
score_threshold = 0.8

[llm]
model = "claude-opus-4-8"
effort = "high"

[orchestrator]       # budget & cost ceilings (0 = unlimited)
max_targets = 50
max_cost_usd = 5.0
max_repair_attempts = 2
max_strengthen_attempts = 2
max_parallel_targets = 4   # process this many targets at once (1 = sequential)

[storage]
backend = "sqlite"
root = ".mutagen"
```

Hitting any budget/cost limit stops the run cleanly with a PARTIAL, resumable result — the in-flight target finishes and everything completed is already persisted.

Parallelism

Targets are independent — each runs in its own isolated sandbox and mutation workspace — so the orchestrator processes up to max_parallel_targets of them at once via a bounded worker pool (default 1, i.e. sequential). Budget and cost limits are enforced with an atomic reservation before each target is scheduled, so concurrency never overshoots max_targets; once a limit trips, no new targets start but those already in flight finish cleanly. Per-target checkpoints are still written immediately, so resume works identically whether the run was sequential or parallel.

Because the dominant cost is CPU-bound (pytest + mutmut), the practical sweet spot for max_parallel_targets is roughly the host's core count — higher values mostly cause subprocess thrashing rather than further speedup.
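The scheduling rule (reserve budget first, then start the target, and let in-flight work finish) can be sketched with asyncio. This is an illustrative assumption about the mechanism, not Mutagen's orchestrator: `Budget`, `run_targets`, and the semaphore-based pool are hypothetical, and only the max_targets limit is modeled.

```python
# Sketch only: hypothetical names; models the max_targets limit only.
import asyncio
from collections.abc import Awaitable, Callable


class Budget:
    """Target budget; 0 means unlimited. A cost limit could be checked the same way."""

    def __init__(self, max_targets: int = 0) -> None:
        self.max_targets = max_targets
        self.reserved = 0
        self.exhausted = False

    def try_reserve(self) -> bool:
        # No `await` in here, so the check-and-increment cannot interleave with
        # other asyncio tasks: the reservation is atomic within one event loop.
        if self.exhausted or (self.max_targets and self.reserved >= self.max_targets):
            self.exhausted = True
            return False
        self.reserved += 1
        return True


async def run_targets(
    targets: list[str],
    process: Callable[[str], Awaitable[str]],
    max_parallel: int,
    budget: Budget,
) -> dict[str, str]:
    results: dict[str, str] = {}
    slots = asyncio.Semaphore(max_parallel)

    async def worker(name: str) -> None:
        try:
            results[name] = await process(name)  # a real worker would checkpoint here
        finally:
            slots.release()

    tasks = []
    for name in targets:
        await slots.acquire()         # bounded pool: wait for a free slot
        if not budget.try_reserve():  # limit tripped: start nothing new
            slots.release()
            break
        tasks.append(asyncio.create_task(worker(name)))
    await asyncio.gather(*tasks)      # targets already in flight finish cleanly
    return results
```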

Persistence & resume

State lives in a single SQLite database at <storage.root>/mutagen.db:

runs — final RunResult records (JSON payload).
run_checkpoints / target_checkpoints — per-target progress, upserted the moment each target finishes.

Re-running with the same --run-id loads the checkpoint, skips targets already in a terminal state, and carries their outcomes forward.

Docker

```
docker build -t mutagen .
docker run --rm \
  -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
  -v "$PWD:/workspace" \
  mutagen run /workspace
```

The image is a slim multi-stage build with git for cloning targets, runs as a non-root user, and uses /workspace as the working directory.

Development

```
pip install -e ".[dev,sandbox]"
```

```
pytest                 # 238 tests, mock-driven (no network, no real LLM)
ruff check mutagen     # lint
ruff format mutagen    # format
mypy mutagen           # type-check (strict; aspirational)
```

CI (.github/workflows/ci.yml) runs the suite on Python 3.11 & 3.12, lints/formats with ruff, type-checks with mypy, and builds the Docker image on every push and PR.

Testing philosophy

The whole suite runs without a network or a real LLM: ports are mocked, subprocess calls are faked, and the few genuine integration tests (the sandbox runner) drive real pytest against tiny fixtures and skip cleanly when their optional tools are absent.
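A mock-driven test in this spirit might look like the sketch below. `draft_tests` is a hypothetical service function and `complete` a hypothetical LLM-port method; `unittest.mock.AsyncMock` stands in for the port so no network call is made, and the integration-style test skips itself when pytest is not installed.

```python
# Sketch only: hypothetical service and port method, exercised with standard unittest tools.
import importlib.util
import subprocess
import sys
import unittest
from unittest.mock import AsyncMock


async def draft_tests(llm, source: str) -> str:
    """Ask the LLM port for tests; reject replies that contain no test functions."""
    reply = await llm.complete(f"Write pytest tests for:\n{source}")
    if "def test_" not in reply:
        raise ValueError("LLM reply contains no test functions")
    return reply


class DraftTestsTest(unittest.IsolatedAsyncioTestCase):
    async def test_uses_port_without_network(self) -> None:
        llm = AsyncMock()  # stands in for the LLM client port
        llm.complete.return_value = "def test_add():\n    assert add(1, 2) == 3\n"
        result = await draft_tests(llm, "def add(a, b): return a + b")
        self.assertIn("def test_add", result)
        llm.complete.assert_awaited_once()

    async def test_rejects_reply_without_tests(self) -> None:
        llm = AsyncMock()
        llm.complete.return_value = "Sorry, I can't help with that."
        with self.assertRaises(ValueError):
            await draft_tests(llm, "def add(a, b): return a + b")


@unittest.skipIf(importlib.util.find_spec("pytest") is None, "pytest not installed")
class RealPytestIntegrationTest(unittest.TestCase):
    def test_pytest_is_runnable(self) -> None:
        proc = subprocess.run([sys.executable, "-m", "pytest", "--version"], capture_output=True)
        self.assertEqual(proc.returncode, 0)
```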

License

MIT.

Release history

This version

0.1.0

Jun 7, 2026

Download files


Source Distribution

mutagen_ai-0.1.0.tar.gz (149.6 kB view details)

Uploaded Jun 7, 2026 Source

Built Distribution


mutagen_ai-0.1.0-py3-none-any.whl (209.6 kB view details)

Uploaded Jun 7, 2026 Python 3

File details

Details for the file mutagen_ai-0.1.0.tar.gz.

File metadata

Download URL: mutagen_ai-0.1.0.tar.gz
Upload date: Jun 7, 2026
Size: 149.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mutagen_ai-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`6d6d4b95cb61043d318f953144e558a278598221757e353c9cbc0e08f60aa5e1`
MD5	`526d5622ab6c66a2d15955d428ca8b1b`
BLAKE2b-256	`3707fe8255ae249ccb7e5b5d3ef97719a79d859a0cdb01122667cb2a9c1549fe`


Provenance

The following attestation bundles were made for mutagen_ai-0.1.0.tar.gz:

Publisher: publish.yml on krish-arya/mutagen

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mutagen_ai-0.1.0.tar.gz
- Subject digest: 6d6d4b95cb61043d318f953144e558a278598221757e353c9cbc0e08f60aa5e1
- Sigstore transparency entry: 1752185979
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: krish-arya/mutagen@d8633be7a93e1f9fccfcd0ccd38cd1455ca1d924
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/krish-arya
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d8633be7a93e1f9fccfcd0ccd38cd1455ca1d924
- Trigger Event: push

File details

Details for the file mutagen_ai-0.1.0-py3-none-any.whl.

File metadata

Download URL: mutagen_ai-0.1.0-py3-none-any.whl
Upload date: Jun 7, 2026
Size: 209.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mutagen_ai-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f7dafbfa3e65b45024b9ccbe71c26212cfafc1cafa89bbc4993ac3d1f06e6cff`
MD5	`b8dc9b87ccac4f727ab81c28af367182`
BLAKE2b-256	`6b6e3ce13548f273b9c362db6de74f00a92b44c2bcd2e67d03188bfe0e243039`


Provenance

The following attestation bundles were made for mutagen_ai-0.1.0-py3-none-any.whl:

Publisher: publish.yml on krish-arya/mutagen

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mutagen_ai-0.1.0-py3-none-any.whl
- Subject digest: f7dafbfa3e65b45024b9ccbe71c26212cfafc1cafa89bbc4993ac3d1f06e6cff
- Sigstore transparency entry: 1752186035
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: krish-arya/mutagen@d8633be7a93e1f9fccfcd0ccd38cd1455ca1d924
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/krish-arya
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d8633be7a93e1f9fccfcd0ccd38cd1455ca1d924
- Trigger Event: push
