Skip to main content

Mutation testing orchestration MCP server with belief revision

Project description

Curate-Ipsum

A graph-spectral MCP server for verified code synthesis through belief revision

PyPI MCP Registry Claude Desktop License: MIT Python

Top Language Code Size Last Commit Repo Size

Curate-Ipsum bridges the gap between LLM-generated code (fast, plausible, unverified) and formally verified patches (slow, correct, trustworthy). It treats mutation testing as one component of a larger system for maintaining robust, self-healing codebase metadata that supports reachability analysis, symbolic execution, and automated test generation.

Install

# PyPI
pip install curate-ipsum

# or with uv
uv pip install curate-ipsum

# Docker (includes baked-in embedding model)
docker pull ghcr.io/egoughnour/curate-ipsum:latest

Claude Desktop / MCP Client

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "curate-ipsum": {
      "command": "uvx",
      "args": ["curate-ipsum"]
    }
  }
}

Or with Docker (embedding model pre-loaded, no Python needed):

{
  "mcpServers": {
    "curate-ipsum": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "ghcr.io/egoughnour/curate-ipsum:latest"]
    }
  }
}

MCP Tools

Curate-Ipsum exposes 30 tools over the MCP stdio transport, organised into six groups:

Testingrun_unit_tests, run_integration_tests, run_mutation_tests, get_run_history, get_region_metrics, detect_frameworks, parse_region, check_region_relationship, create_region

Belief Revisionadd_assertion, contract_assertion, revise_theory, get_entrenchment, list_assertions, get_theory_snapshot, store_evidence, get_provenance, why_believe, belief_stability

Rollback & Failurerollback_to, undo_last_operations, analyze_failure, list_world_history

Graph-Spectralextract_call_graph, compute_partitioning, query_reachability, get_hierarchy, find_function_partition, incremental_update, persistent_graph_stats, graph_query

Verificationverify_property (Z3/angr), verify_with_orchestrator (CEGAR budget escalation), list_verification_backends

Synthesis & RAGsynthesize_patch (CEGIS + genetic + LLM), synthesis_status, cancel_synthesis, list_synthesis_runs, rag_index_nodes, rag_search, rag_stats

Current Status

Last Updated: 2026-02-08

Component Status
Multi-framework parsing (5 frameworks) Complete
Graph Infrastructure (Spectral/Kameda) Complete
Belief Revision Engine (AGM/Provenance) Complete
Synthesis Loop (CEGIS/Genetic) Complete
Verification Backends (Z3/angr) Complete
Graph Persistence (SQLite/Kuzu) Complete
RAG / Semantic Search (Chroma) Complete

The Problem

LLMs produce code that is:

  • ✅ Syntactically valid (usually)
  • ✅ Statistically plausible
  • ❌ Semantically correct (sometimes)
  • ❌ Type-safe (by accident)
  • ❌ Formally verified (never)

Current approaches either trust LLM output blindly or reject it entirely. Neither is optimal.

The Solution

Use LLMs for cheap candidate generation, then invest computational resources to achieve formal guarantees:

LLM Candidates (k samples)
        ↓
   Seed Population
        ↓
┌───────────────────────────┐
│  CEGIS + CEGAR + Genetic  │  ← Verification loop
│  + Belief Revision        │
└───────────────────────────┘
        ↓
  Strongly Typed Patch
  (with proof certificate)

Key Differentiators from State of the Art

vs. Traditional Mutation Testing (Stryker, mutmut, cosmic-ray)

Traditional Curate-Ipsum
Single tool, single language Multi-framework orchestration
Flat file-level analysis Hierarchical graph-spectral decomposition
Mutation score as output Mutation testing as input to synthesis
No formal verification CEGIS/CEGAR verification loop
Manual test writing Automated patch generation

vs. LLM Code Generation (Copilot, Claude, GPT)

LLM-only Curate-Ipsum
Trust model output Verify model output
Single sample or best-of-k Population-based refinement
No formal guarantees Proof certificates
Stateless generation Belief revision with provenance
Plausible code Provably correct code

vs. Program Synthesis (Sketch, Rosette, SyGuS)

Traditional Synthesis Curate-Ipsum
Hand-written sketches LLM-generated candidates
Cold-start search Warm-start from LLM population
No learning across runs Totalizing theory accumulates knowledge
Single specification Multi-framework implicit regions

vs. Symbolic Execution (KLEE, S2E)

Symbolic Execution Curate-Ipsum
Path exploration only Integrated with synthesis
Boolean constraint solving Mathematical reformulation (SymPy)
Single-tool analysis Graph DB + SMT + mutation orchestration
No code generation Generates verified patches

Novel Contributions

  1. Graph-Spectral Code Decomposition

    • Fiedler vector partitioning for optimal reachability
    • Hierarchical SCC condensation
    • Planar subgraph identification → O(1) Kameda queries
    • Kuratowski subgraphs as atomic non-planar units
  2. Belief Revision for Synthesis

    • AGM-compliant theory revision
    • Entrenchment ordering for minimal contraction
    • Provenance DAG for failure mode analysis
    • Rollback sharpens validity (failures refine the universal model)
  3. Implicit Region Detection

    • Spectral anomalies reveal undertested code
    • Cross-framework mutation resistance identifies critical regions
    • Historical mutability guides partition optimization
  4. Mathematical Constraint Reformulation

    • Boolean-intractable → differential/root-finding
    • SymPy path condition encoding
    • Hybrid SMT + numerical solving

Architecture

flowchart TB
    subgraph MCP["MCP Interface"]
        direction TB

        subgraph Sources["Analysis Sources"]
            direction LR
            MUT["🧬 Mutation<br/>Orchestrator<br/><small>Stryker<br/>mutmut<br/>cosmic-ray</small>"]
            SYM["🔬 Symbolic<br/>Execution<br/><small>KLEE · Z3<br/>SymPy</small>"]
            GRAPH["📊 Graph<br/>Analysis<br/><small>Joern<br/>Neo4j<br/>Fiedler</small>"]
        end

        MUT --> BRE
        SYM --> BRE
        GRAPH --> BRE

        BRE["🧠 Belief Revision Engine<br/><small>AGM Theory · Entrenchment · Provenance DAG</small>"]

        BRE --> SYNTH

        SYNTH["⚙️ Synthesis Loop<br/><small>CEGIS · CEGAR · Genetic Algorithm</small>"]

        SYNTH --> |"counterexample"| BRE

        SYNTH --> OUTPUT

        OUTPUT["✅ Strongly Typed Patch<br/><small>Proof Certificate ·Type Signature<br/>Pre/Post Conditions</small>"]
    end

    LLM["🤖 LLM Candidates<br/><small>top-k samples</small>"] --> SYNTH

    style MCP fill:#1a1a2e,stroke:#16213e,color:#eee
    style Sources fill:#16213e,stroke:#0f3460,color:#eee
    style MUT fill:#0f3460,stroke:#e94560,color:#eee
    style SYM fill:#0f3460,stroke:#e94560,color:#eee
    style GRAPH fill:#0f3460,stroke:#e94560,color:#eee
    style BRE fill:#533483,stroke:#e94560,color:#eee
    style SYNTH fill:#e94560,stroke:#ff6b6b,color:#fff
    style OUTPUT fill:#06d6a0,stroke:#118ab2,color:#000
    style LLM fill:#ffd166,stroke:#ef476f,color:#000

Roadmap

Phase 1: Foundation ✅

  • MCP server infrastructure
  • Stryker report parsing
  • Run history and PID metrics
  • Flexible region model (hierarchical: file → class → function → lines)
  • mutmut parser integration
  • Framework auto-detection
  • Unified parser interface

Phase 2: Graph Infrastructure ✅

  • Graph models (CodeGraph, Node, Edge)
  • Call graph extraction (AST-based)
  • ASR extractor (import/class analysis)
  • Dependency graph extraction (module-level imports)
  • Laplacian construction from call/dependency graphs
  • Fiedler vector computation (scipy.sparse.linalg)
  • Recursive Fiedler partitioning with virtual sink/source
  • SCC detection and hierarchical condensation
  • Planar subgraph identification (Boyer-Myrvold)
  • Kameda preprocessing for O(1) reachability
  • MCP tools (extract, partition, reachability, hierarchy, find)

Phase 3: Multi-Framework Orchestration ✅

  • Unified mutation framework interface
  • cosmic-ray parser
  • poodle parser
  • universalmutator parser

Phase 4: Belief Revision Engine ✅

  • py-brs library integration (AGM core)
  • Evidence adapter (mutation results → beliefs)
  • Theory manager for curate-ipsum
  • AGM contraction (py-brs v2.0.0 released)
  • Entrenchment calculation (py-brs v2.0.0)
  • Provenance DAG storage and queries
  • Failure mode analyzer
  • Rollback mechanism

Phase 5: Synthesis Loop ✅

  • CEGIS implementation with LLM seeding
  • Genetic algorithm with AST-aware crossover
  • Entropy monitoring and diversity injection
  • Counterexample-directed mutation
  • CEGAR budget escalation (10s → 30s → 120s)

Phase 6: Verification Backends ✅

  • Z3 integration for SMT solving (default backend)
  • angr Docker symbolic execution (expensive tier)
  • CEGAR orchestrator with budget escalation
  • Verification harness builder (C source generation)
  • Mock backend for testing
  • Alternative solvers (CVC5, Boolector)
  • SymPy path condition encoding

Phase 7: Graph Persistence ✅

  • Abstract GraphStore ABC
  • SQLite graph store (primary)
  • Kuzu graph store (optional)
  • Synthesis result persistence
  • Kameda & Fiedler persistence
  • Incremental update engine

Phase 8: RAG / Semantic Search ✅

  • ChromaDB vector store integration
  • sentence-transformers embedding provider (all-MiniLM-L6-v2)
  • Graph-expanded RAG pipeline (vector top-k → neighbor expansion → rerank)
  • Decay scoring for temporal relevance
  • CEGIS integration for context-aware synthesis

Phase 9: Production Hardening ✅

  • CI/CD (GitHub Actions — lint, test matrix, integration, lockfile)
  • Release pipeline (tag push → PyPI + GHCR + MCP registry)
  • uv lockfile (149 packages)
  • pre-commit hooks (ruff format + lint + lock check)
  • MCP bundle packaging (server.json, smithery.yaml, manifest.json)
  • HTML/SARIF reporting
  • IDE extensions (VSCode)
  • Regression detection and alerting

Future Work

Advanced Orchestration (Deferred)

  • Implicit region detection (spectral anomalies)
  • Non-contradictory framework assignment
  • Cross-framework survival analysis

Semantic Search & RAG

  • Code Graph RAG for semantic search
  • Semantic search index (ChromaDB)
  • RAG retrieval pipeline with graph expansion
  • Text-to-Cypher queries

Quick Start

# Clone and install (dev)
git clone https://github.com/egoughnour/curate-ipsum.git
cd curate-ipsum
uv sync --extra dev --extra verify --extra rag --extra graph --extra synthesis

# Run the MCP server
uv run curate-ipsum

# Or run tests
make test                     # fast suite (no Docker/model needed)
make test-all                 # including integration tests

Configuration

All configuration is via environment variables (see .env.example):

CURATE_IPSUM_GRAPH_BACKEND=sqlite   # or kuzu
MUTATION_TOOL_DATA_DIR=.mutation_tool_data
MUTATION_TOOL_LOG_LEVEL=INFO
CHROMA_HOST=                         # empty = in-process, or localhost:8000
EMBEDDING_MODEL=all-MiniLM-L6-v2

For the full service stack (ChromaDB + angr runner):

make docker-up-verify         # starts Chroma + angr via Docker Compose

Documentation

Planning & Design

Architecture

Reference

Key References

  • Alchourrón, Gärdenfors, Makinson (1985). On the Logic of Theory Change
  • Fiedler (1973). Algebraic Connectivity of Graphs
  • Kameda (1975). On the Vector Representation of Reachability in Planar Directed Graphs
  • Solar-Lezama (2008). Program Synthesis by Sketching (CEGIS)
  • Clarke et al. (2000). Counterexample-Guided Abstraction Refinement (CEGAR)

License

MIT License - see LICENSE


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

curate_ipsum-2.5.1.tar.gz (546.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

curate_ipsum-2.5.1-py3-none-any.whl (178.7 kB view details)

Uploaded Python 3

File details

Details for the file curate_ipsum-2.5.1.tar.gz.

File metadata

  • Download URL: curate_ipsum-2.5.1.tar.gz
  • Upload date:
  • Size: 546.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for curate_ipsum-2.5.1.tar.gz
Algorithm Hash digest
SHA256 7881e2985a841dcb376aded8df3ba25762399d5b41d9d2642edfd43807cb3d4d
MD5 1f735070119c4c3d39762c7530e8d225
BLAKE2b-256 c4b8ef57abc25b9ad2a1a24dd9fb85a82ad4d3e259aabf880cff44783f21bcc5

See more details on using hashes here.

Provenance

The following attestation bundles were made for curate_ipsum-2.5.1.tar.gz:

Publisher: tag-release.yml on egoughnour/curate-ipsum

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file curate_ipsum-2.5.1-py3-none-any.whl.

File metadata

  • Download URL: curate_ipsum-2.5.1-py3-none-any.whl
  • Upload date:
  • Size: 178.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for curate_ipsum-2.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 bc1f447813a4c378838cb660360c1092b1f97b8f6caaecd706889842667863d6
MD5 ac39efea55c4155abd2bffc38ed82672
BLAKE2b-256 a512ba27bb5ba9dee2deaa039ff9b9a3f9174635abe23398fb411cb233980190

See more details on using hashes here.

Provenance

The following attestation bundles were made for curate_ipsum-2.5.1-py3-none-any.whl:

Publisher: tag-release.yml on egoughnour/curate-ipsum

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page