A self-evaluating MCP server that assesses UK commercial-lease tenant break clauses, grounds every claim to verbatim source text, and publishes its own measured hallucination rate.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

ankurs08

These details have not been verified by PyPI

Project description

UK Break Clause Analyzer (MCP)

A self-evaluating MCP server that assesses whether a UK commercial-lease tenant break clause can actually be exercised — and publishes its own measured hallucination rate.

⚖️ Decision-support only — NOT legal advice. Built on a deliberately simplified, non-proprietary ruleset over synthetic data. A qualified solicitor must verify any real decision.

The £2m full stop

In 2012, a tenant served a valid break notice to walk away from a lease — but on the break date they hadn't paid one quarter's rent that had fallen due a few weeks earlier. The break failed. They were bound to the lease (and its rent) for years. No drama, no bad faith — just one unmet condition precedent that everyone missed until it was too late.

Break clauses are unforgiving like that. Whether a tenant can actually leave turns on a short checklist — notice served in time, notice served correctly, no rent arrears, vacant possession given — and getting any one wrong is catastrophic. It is exactly the kind of task you might hand to an LLM... if you could trust it not to confidently invent the answer.

This project is about earning that trust, and measuring it. It is not a clever parser. It is a reliability harness: every claim is grounded to verbatim source text or it isn't made, genuinely-ambiguous cases are routed to a human instead of guessed, and the whole thing ships with an eval that publishes how often it lies.

The edge

Grounded — every asserted condition is backed by a verbatim source span. If the system can't find the text, it returns NOT_FOUND; it never invents a quote. A deterministic gate slices the span out of the source, so it can't echo hallucinated text.
Calibrated — when the lease genuinely doesn't settle a point, the answer is AMBIGUOUS — human verify, not a coin-flip. Abstaining honestly is a feature.
Self-evaluating — a pytest harness scores extraction accuracy, citation faithfulness, hallucination rate, and calibration against 24 labelled cases.
Reasons + verifies — the LLM only proposes; deterministic code disposes (grounds every quote, does the date arithmetic, applies the vacant-possession legal test, aggregates the verdict).

The headline number

See report/report.md for the full eval (all four metrics, per-model comparison, confusion matrix, caught-hallucination examples).

The committed report is the heuristic baseline (it runs with no API key) — and it already tells the core story: the grounding gate drives ungrounded (fabricated) hallucinations to zero, while a non-reasoning baseline still misgrounds and never abstains on the genuinely-ambiguous cases. That gap is exactly what a calibrated LLM is meant to close:

uv run python scripts/run_eval.py --record   # measure claude-haiku-4-5 vs claude-sonnet-4-6

How it works

flowchart LR
    A["Lease + Background Facts"] --> T["MCP tools"]
    T --> L["LLM adapter<br/>extract + reason · temperature 0"]
    L -- "proposes verbatim quotes<br/>+ findings" --> G{"Grounding gate<br/>verbatim? else NOT_FOUND"}
    G -- "spans sliced from source" --> C["Deterministic core<br/>checklist · UK date math · VP legal test"]
    C --> R["Strict-precedence aggregate<br/>fail→INVALID · uncertain→AMBIGUOUS · else VALID"]
    R --> O["Assessment<br/>verdict + calibration + human-verify gates"]
    subgraph EVAL["Eval (the point)"]
      D["24 labelled cases"] --> H["harness"] --> M["4 metrics"] --> P["report.md + SVG"]
    end

The trust boundary is structural: the deterministic core/ package physically cannot import the llm/ package (enforced by a test). "The LLM proposes, deterministic code disposes" is a property of the codebase, not a discipline.

The four MCP tools (each does one thing)

Tool	What it does
`extract_break_clause`	Returns the break clause + its verbatim source span
`check_conditions`	The four-condition checklist: each pass / fail / uncertain, with grounded evidence
`find_citation`	Exact verbatim supporting text for a claim, or `NOT_FOUND`
`assess_validity`	Orchestrated verdict + calibration note + mandatory human-verify gates

Quickstart (clone to running in under 2 minutes)

# 1. Install uv (skip if you have it)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Install deps (uv fetches its own Python 3.12)
uv sync

# 3. Run the test suite — proves the eval apparatus is correct
uv run pytest -q

# 4. Check the dataset (every gold span verbatim, every label coherent)
uv run python scripts/validate_dataset.py

# 5. Regenerate the eval report
uv run python scripts/run_eval.py        # heuristic baseline, no key needed

No ANTHROPIC_API_KEY is required for any of the above — the eval falls back to the heuristic baseline and is fully reproducible. Set the key (and --record) to measure the real Claude models.

Your Anthropic API key

The key is only needed for live LLM extraction (the eval --record step and the server's real mode). Everything else runs without one.

Never commit it. .env is git-ignored and the cassettes redact the x-api-key header.

For local commands, either export it or use a .env file:

export ANTHROPIC_API_KEY=sk-ant-…           # option 1: shell
# or
cp .env.example .env && $EDITOR .env          # option 2: .env, then:
uv run --env-file .env python scripts/run_eval.py --record

For an MCP client, put it in the server config's env block (below).

Run the MCP server

# Inspect it interactively (the official MCP Inspector)
npx @modelcontextprotocol/inspector uv run break-clause-analyzer

Claude Desktop — add to claude_desktop_config.json (use the absolute path to your clone so it runs from the project):

{
  "mcpServers": {
    "break-clause-analyzer": {
      "command": "uv",
      "args": ["run", "--directory", "/absolute/path/to/rubo-mcp", "break-clause-analyzer"],
      "env": { "ANTHROPIC_API_KEY": "sk-ant-…" }
    }
  }
}

Claude Code — one command:

claude mcp add break-clause-analyzer -e ANTHROPIC_API_KEY=sk-ant-… \
  -- uv run --directory /absolute/path/to/rubo-mcp break-clause-analyzer

Without a key the server still runs and responds — it uses the heuristic baseline and says so. Every tool response carries the decision-support disclaimer.

Reproducible evals (cassettes)

temperature=0 is not a determinism guarantee from the API, so reproducibility comes from recorded cassettes (VCR.py). scripts/run_eval.py --record records one cassette set per model with the x-api-key header redacted; re-running without --record replays them with no key in seconds. See eval/cassettes/README.md.

Layout

src/break_clause_analyzer/
  core/       # deterministic trust boundary (no network; cannot import llm/)
    grounding.py  dates.py  checklist.py  aggregate.py
  llm/        # the only network egress (Anthropic + heuristic fallback)
  pipeline.py # propose → gate → dispose orchestration
  server.py   # FastMCP server: the four tools
data/cases/   # 24 labelled synthetic case files (+ dataset README)
eval/         # harness, metrics, report generator, cassettes
docs/METHODOLOGY.md  # pre-registered metric definitions
report/       # generated eval report + SVG
.planning/    # the eval-first roadmap, requirements, and decision log

Methodology & honesty

The metric definitions are pre-registered in docs/METHODOLOGY.md before any model is run, so the headline number can't be defined after the fact to look good. The hallucination rate counts misgrounding and overconfidence — not just fabrication — precisely because a grounding gate makes fabrication trivially zero. The scorer uses no LLM judge; it is validated against a gold oracle and deliberately-broken systems in tests/test_harness.py.

Scope (deliberately narrow)

Tenant break clauses only · four conditions precedent · synthetic/public data only · decision-support, not legal advice. Other lease provisions, landlord breaks, real client data, and a production engine are explicitly out of scope.

Built as a reliability-engineering artifact. The eval is the point.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

ankurs08

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.1

Jun 28, 2026

0.1.0

Jun 28, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

uk_break_clause_analyzer_mcp-0.1.1.tar.gz (215.9 kB view details)

Uploaded Jun 28, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

uk_break_clause_analyzer_mcp-0.1.1-py3-none-any.whl (26.5 kB view details)

Uploaded Jun 28, 2026 Python 3

File details

Details for the file uk_break_clause_analyzer_mcp-0.1.1.tar.gz.

File metadata

Download URL: uk_break_clause_analyzer_mcp-0.1.1.tar.gz
Upload date: Jun 28, 2026
Size: 215.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for uk_break_clause_analyzer_mcp-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`5fd0ab9d6a641675fc7fd8465f239aeec5f0e27f1eea0d17b1d48f9f5a8e596a`
MD5	`2a1d6c0c389427862966f9a5e7537e04`
BLAKE2b-256	`cc9b68f58a3bc694d6399d9bc3578ef9bcc88ff48893efcfc40938171ba3aa0d`

See more details on using hashes here.

Provenance

The following attestation bundles were made for uk_break_clause_analyzer_mcp-0.1.1.tar.gz:

Publisher: publish.yml on Ankur-stockheads/rubo-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: uk_break_clause_analyzer_mcp-0.1.1.tar.gz
- Subject digest: 5fd0ab9d6a641675fc7fd8465f239aeec5f0e27f1eea0d17b1d48f9f5a8e596a
- Sigstore transparency entry: 1989678630
- Sigstore integration time: Jun 28, 2026
Source repository:
- Permalink: Ankur-stockheads/rubo-mcp@9c7d3e68c392309d39b5ea507a29d6fd7e409d01
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/Ankur-stockheads
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9c7d3e68c392309d39b5ea507a29d6fd7e409d01
- Trigger Event: release

File details

Details for the file uk_break_clause_analyzer_mcp-0.1.1-py3-none-any.whl.

File metadata

Download URL: uk_break_clause_analyzer_mcp-0.1.1-py3-none-any.whl
Upload date: Jun 28, 2026
Size: 26.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for uk_break_clause_analyzer_mcp-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f2fda6c8f0f2fb6b9faf3d998b0268ffe893e449b8b109eb0308141290a5ea61`
MD5	`ce3bda95dc68184c871f02339f4d1370`
BLAKE2b-256	`c8ed25229adbd16a24e6b1cccd6bf3947a969d26f072877ff75236feb8877fcb`

See more details on using hashes here.

Provenance

The following attestation bundles were made for uk_break_clause_analyzer_mcp-0.1.1-py3-none-any.whl:

Publisher: publish.yml on Ankur-stockheads/rubo-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: uk_break_clause_analyzer_mcp-0.1.1-py3-none-any.whl
- Subject digest: f2fda6c8f0f2fb6b9faf3d998b0268ffe893e449b8b109eb0308141290a5ea61
- Sigstore transparency entry: 1989678845
- Sigstore integration time: Jun 28, 2026
Source repository:
- Permalink: Ankur-stockheads/rubo-mcp@9c7d3e68c392309d39b5ea507a29d6fd7e409d01
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/Ankur-stockheads
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9c7d3e68c392309d39b5ea507a29d6fd7e409d01
- Trigger Event: release

uk-break-clause-analyzer-mcp 0.1.1

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

UK Break Clause Analyzer (MCP)

The £2m full stop

The edge

The headline number

How it works

The four MCP tools (each does one thing)

Quickstart (clone to running in under 2 minutes)

Your Anthropic API key

Run the MCP server

Reproducible evals (cassettes)

Layout

Methodology & honesty

Scope (deliberately narrow)

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance