
Context-isolated verification harness for AI-generated code

Project description

CrossReview


English | 简体中文

Automated cross-review for AI coding — same model, clean session, independent second pass on your output.

What is Cross-Review?

In human code review, a change is typically inspected by someone who did not directly implement it, which reduces author bias. CrossReview applies the same principle to AI-generated code by separating generation and review into two isolated contexts.

An AI coding assistant (Claude, Copilot, Cursor, etc.) first produces the change in its original session. CrossReview then packages the diff, stated intent, focus areas, and optional context into a ReviewPack and hands it to a separate reviewer session for verification. That reviewer does not inherit the original conversation, reasoning trace, or tool history; it evaluates the change only from the minimum necessary inputs.

The key insight: you don't need a different model, just a different context. Same model, clean session, real findings.

Why It Works

The mechanism is not model diversity; it is input isolation.

The author session accumulates local assumptions, discarded alternatives, retries, and tool-side trial-and-error. If the review step reuses that context, the reviewer is likely to preserve the author's framing instead of independently re-deriving whether the change is correct.

CrossReview avoids that by constraining reviewer input to the review artifact itself:

Reviewer receives:

  • Diff / changed files
  • Stated intent
  • Focus areas
  • Optional context files

Reviewer does not receive:

  • Original conversation
  • Planning or reasoning trace
  • Tool call history
  • Retries, failed attempts, intermediate drafts

This separation has two practical effects:

  • It increases reviewer independence, because the second pass must justify findings from the artifact rather than from inherited session state.
  • It improves auditability, because reviewer claims can be checked against ReviewPack contents, emitted findings, and deterministic normalization rules.
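
The isolation boundary behaves like a filter over session state. A minimal sketch of that idea (the field names here are illustrative, not CrossReview's actual schema):

```python
# Illustrative only: field names are assumptions, not CrossReview's schema.
ALLOWED_FIELDS = {"diff", "intent", "focus_areas", "context_files"}

def build_review_input(session_state: dict) -> dict:
    """Keep only artifact fields; drop conversation, reasoning, tool history."""
    return {k: v for k, v in session_state.items() if k in ALLOWED_FIELDS}

author_session = {
    "diff": "--- a/src/auth.py\n+++ b/src/auth.py\n...",
    "intent": "fix auth token refresh",
    "focus_areas": ["auth"],
    "conversation": ["...many turns..."],      # never reaches the reviewer
    "tool_history": ["pytest run 1 failed"],   # never reaches the reviewer
}

reviewer_input = build_review_input(author_session)
```

Whatever the reviewer concludes, it must be derivable from `reviewer_input` alone, which is what makes its findings checkable after the fact.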

Eval Results

Full evaluation across 33 fixtures (claude-opus-4.6, external_only scope):

Metric Value Gate
Precision 0.885 ≥ 0.70 ✅
Recall 0.929 ≥ 0.80 ✅
Unclear rate 0.133 ≤ 0.150 ✅
Invalid findings / run 1 ≤ 2 ✅

All 9 release gate metrics pass (blocking_pass: true). See v0-scope.md §12 for the full gate definition.

Quick Start

pip install crossreview              # from PyPI (v0.1.0a2+)
pip install -e .                     # local dev (pack + verify commands)
pip install -e '.[anthropic]'        # + Anthropic standalone reviewer backend
pip install -e '.[dev]'              # dev dependencies (pytest + ruff)

# configure standalone verify via flags, crossreview.yaml, or env vars
# example:
#   export CROSSREVIEW_PROVIDER=anthropic
#   export CROSSREVIEW_MODEL=claude-sonnet-4-20250514
#   export CROSSREVIEW_API_KEY_ENV=ANTHROPIC_API_KEY
#   export ANTHROPIC_API_KEY=...

crossreview pack --diff HEAD~1 --intent "fix auth token refresh" > pack.json
crossreview pack --staged --intent "fix auth token refresh" > pack.json
crossreview verify --pack pack.json

Or in one step:

crossreview verify --diff HEAD~1 --intent "fix auth token refresh"
crossreview verify --staged --intent "fix auth token refresh"

crossreview verify --diff, --staged, and --unstaged output human-readable text by default. crossreview verify --pack outputs ReviewResult JSON (default), or human-readable text with --format human:

{
  "schema_version": "0.1-alpha",
  "artifact_fingerprint": "diff:abc123",
  "pack_fingerprint": "pack:def456",
  "review_status": "complete",
  "intent_coverage": "covered",
  "findings": [
    {
      "id": "f-001",
      "severity": "high",
      "summary": "Token refresh silently succeeds when refresh_token is expired",
      "detail": "The try/except on line 42 catches TokenExpiredError but returns the old token instead of raising.",
      "category": "logic_error",
      "locatability": "exact",
      "confidence": "plausible",
      "evidence_related_file": false,
      "actionable": true,
      "file": "src/auth.py",
      "line": 42
    }
  ],
  "advisory_verdict": {
    "verdict": "concerns",
    "rationale": "review found medium/high-severity issues"
  },
  "quality_metrics": {
    "pack_completeness": 0.85,
    "noise_count": 0,
    "raw_findings_count": 1,
    "emitted_findings_count": 1,
    "locatability_distribution": {
      "exact_pct": 1.0,
      "file_only_pct": 0.0,
      "none_pct": 0.0
    },
    "speculative_ratio": 0.0
  },
  "reviewer": {
    "type": "fresh_llm",
    "model": "claude-sonnet-4-20250514",
    "session_isolated": true,
    "failure_reason": null,
    "prompt_source": "product",
    "prompt_version": "v0.1"
  },
  "budget": {
    "status": "complete",
    "files_reviewed": 1,
    "files_total": 1,
    "chars_consumed": 842,
    "chars_limit": 12000
  }
}
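
Because a ReviewResult is plain JSON, downstream tooling can consume it directly. A small sketch that surfaces actionable medium/high findings, using field names from the example above:

```python
import json

# Minimal ReviewResult excerpt (fields as in the example above).
raw = """
{
  "review_status": "complete",
  "advisory_verdict": {"verdict": "concerns"},
  "findings": [
    {"id": "f-001", "severity": "high", "actionable": true,
     "file": "src/auth.py", "line": 42,
     "summary": "Token refresh silently succeeds when refresh_token is expired"}
  ]
}
"""

result = json.loads(raw)

# Keep only actionable findings at medium severity or above.
blocking = [
    f for f in result["findings"]
    if f["actionable"] and f["severity"] in ("medium", "high")
]
for f in blocking:
    print(f"{f['file']}:{f['line']} [{f['severity']}] {f['summary']}")
```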

Architecture

         git diff + intent + focus + context
                      │
                      ▼
              ┌────────────────┐
              │      Pack      │  Assemble ReviewPack
              └───────┬────────┘
                      │
                      ▼
              ┌────────────────┐
              │  Budget Gate   │  Focus-priority, size cap
              └───────┬────────┘
                      │
   ╔══════════════════╪═══════════════════════════╗
   ║                  ▼  Isolation Boundary       ║
   ║          ┌────────────────┐                  ║
   ║          │ Reviewer (LLM) │  Fresh session,  ║
   ║          │                │  zero shared ctx ║
   ║          └───────┬────────┘                  ║
   ╚══════════════════╪═══════════════════════════╝
                      │
                      ▼
              ┌────────────────┐
              │  Normalizer    │  Extract findings from text
              └───────┬────────┘
                      │
                      ▼
              ┌────────────────┐
              │  Adjudicator   │  Apply rules → verdict
              └───────┬────────┘
                      │
                      ▼
              ┌────────────────┐
              │ ReviewResult   │  Findings + verdict
              │ (JSON)         │  + quality metrics
              └────────────────┘

Only the Reviewer calls an LLM. Everything else is rule-based — no AI in the loop.
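
The rationale string in the sample output above ("review found medium/high-severity issues") hints at a severity-based rule. A hedged sketch of what a deterministic adjudicator of that shape might look like (the actual rule set is CrossReview's own and is not reproduced here):

```python
def adjudicate(findings: list[dict]) -> str:
    """Illustrative verdict rule: any medium/high finding -> 'concerns'.
    The real adjudicator applies CrossReview's full rule set."""
    if any(f.get("severity") in ("medium", "high") for f in findings):
        return "concerns"
    return "pass_candidate"
```

Because this step is pure rules over structured findings, the same inputs always yield the same verdict, which is what makes the pipeline auditable.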

Two reviewer backend modes:

  • Host-integrated (CLI implemented) — the host renders the reviewer prompt in an isolated context (fresh session / sub-agent), then feeds the raw analysis back to CrossReview's normalizer + adjudicator through the render-prompt and ingest CLI commands. Dependency: no extra SDK on the CrossReview side.
  • Standalone (implemented) — the CLI calls the LLM API directly. Dependency: crossreview[anthropic] + reviewer config + API key.

Host-integrated is the planned default product path. The host does NOT need to implement a Python ReviewerBackend; the integration path is render-prompt + ingest, with the host responsible for executing the canonical prompt in a fresh context and feeding raw analysis back.

Commands

crossreview pack

crossreview pack --diff HEAD~1 > pack.json
crossreview pack --diff main..feat --intent "add caching" --focus cache --context ./plan.md > pack.json

Flag Description
--diff REF Git ref (HEAD~1) or range (main..feat)
--intent TEXT Task intent (background claim, not ground truth)
--task FILE Full task description file
--focus TERM Focus review area (repeatable)
--context FILE Extra context file (repeatable)

crossreview verify

Two modes: --pack (verify a pre-built ReviewPack) or --diff (one-stop: pack + verify).

# one-stop: pack + verify, human output by default
crossreview verify --diff HEAD~1
crossreview verify --diff HEAD~1 --intent "fix auth" --focus auth

# verify a pre-built pack, JSON output by default
crossreview verify --pack pack.json
crossreview verify --pack pack.json --model claude-sonnet-4-20250514 --provider anthropic

crossreview verify requires reviewer configuration to resolve successfully:

  • --model / --provider / --api-key-env
  • or crossreview.yaml
  • or ~/.crossreview/config.yaml
  • or CROSSREVIEW_MODEL / CROSSREVIEW_PROVIDER / CROSSREVIEW_API_KEY_ENV

Flag Description
--diff REF Git ref for diff (e.g. HEAD~1, main..feat). Assembles ReviewPack inline. Mutually exclusive with --pack
--pack FILE Path to ReviewPack JSON. Mutually exclusive with --diff
--intent TEXT Task intent string (--diff mode)
--task FILE Task description file (--diff mode)
--focus TERM Focus area, repeatable (--diff mode)
--context FILE Extra context file, repeatable (--diff mode)
--format FORMAT Output format. Defaults to human with --diff, json with --pack
--model TEXT Override reviewer model
--provider TEXT Override provider (currently anthropic only)
--api-key-env VAR Override API key env variable name

crossreview render-prompt

crossreview render-prompt --pack pack.json > prompt.md
crossreview render-prompt --pack pack.json --template custom-template.md > prompt.md

Renders a ReviewPack into the full canonical reviewer prompt for the host to execute in an isolated context. No LLM call, no API key needed.

Flag Description
--pack FILE Path to ReviewPack JSON
--template FILE Custom prompt template (default: built-in product/v0.1)

crossreview ingest

crossreview ingest --raw-analysis raw.md --pack pack.json --model claude-sonnet-4-20250514
crossreview ingest --raw-analysis - --pack pack.json --model host_unknown --prompt-source product --prompt-version v0.1

Takes raw analysis text from a host-integrated review session and produces a standard ReviewResult via normalizer + adjudicator. No LLM call, no API key needed. Outputs JSON by default; use --format human for terminal-friendly output.

Flag Description
--raw-analysis FILE Raw analysis file path; - for stdin
--pack FILE Original ReviewPack JSON
--model TEXT Host model name (host_unknown if unknown)
--format FORMAT Output format: json (default) or human
--prompt-source TEXT Prompt source identifier (optional)
--prompt-version TEXT Prompt version identifier (optional)
--latency-sec FLOAT Host-measured LLM latency (optional)
--input-tokens INT Host-reported input token count (optional)
--output-tokens INT Host-reported output token count (optional)
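
The normalizer behind ingest can be illustrated with a small rule-based extraction sketch. This is not CrossReview's actual normalizer, and the `[severity] file:line summary` marker format is invented for illustration; it only shows the shape of deterministic finding extraction from free-form reviewer text:

```python
import re

# Invented marker format for illustration: "[severity] file:line summary".
FINDING_RE = re.compile(
    r"^\[(?P<severity>low|medium|high)\]\s+(?P<file>\S+):(?P<line>\d+)\s+(?P<summary>.+)$"
)

def extract_findings(raw_analysis: str) -> list[dict]:
    """Rule-based extraction: one finding per matching line, no LLM involved."""
    findings = []
    for line in raw_analysis.splitlines():
        m = FINDING_RE.match(line.strip())
        if m:
            d = m.groupdict()
            d["line"] = int(d["line"])
            findings.append(d)
    return findings

raw = """Overall the change looks reasonable.
[high] src/auth.py:42 Token refresh swallows TokenExpiredError
[low] src/auth.py:80 Minor naming nit
"""
findings = extract_findings(raw)
```

Prose that does not match a finding rule is simply dropped, which is where the `noise_count` and `raw_findings_count` vs `emitted_findings_count` metrics in the ReviewResult come from.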

Exit Codes

All commands return 0 when a ReviewResult is successfully produced, regardless of review_status or advisory_verdict. A non-zero exit code means the command failed to produce output (invalid input, missing API key, empty diff, etc.).

For automation, check review_status and advisory_verdict in the JSON output instead of relying on the exit code:

crossreview verify --diff HEAD~1 --format json | jq -e '.advisory_verdict.verdict == "pass_candidate"'
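
The same gate can be scripted without jq. A small Python sketch that maps a ReviewResult to a CI-style exit code (verdict and status values taken from this README):

```python
import json

def gate(review_result_json: str) -> int:
    """Return a CI-style exit code from ReviewResult JSON:
    0 only for a completed review with verdict pass_candidate."""
    result = json.loads(review_result_json)
    if result.get("review_status") != "complete":
        return 2  # review did not complete; treat as failure
    verdict = result.get("advisory_verdict", {}).get("verdict")
    return 0 if verdict == "pass_candidate" else 1

print(gate('{"review_status": "complete", '
           '"advisory_verdict": {"verdict": "concerns"}}'))
```

In CI, pipe the JSON output of crossreview verify into this function and use the return value as the step's exit code.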

Status

Component Status Notes
Schema ✅ Done ReviewPack / Finding / ReviewResult / Config
Pack CLI ✅ Done crossreview pack
Budget Gate ✅ Done Focus priority + soft/hard truncation
Reviewer ✅ Done ReviewerBackend protocol + Anthropic standalone
Normalizer ✅ Done Rule-based finding extraction
Adjudicator ✅ Done Rule-based advisory verdict
Verify CLI ✅ Done crossreview verify --pack
Render Prompt CLI ✅ Done crossreview render-prompt --pack (host-integrated front half)
Ingest CLI ✅ Done crossreview ingest --raw-analysis --pack --model (host-integrated back half)
Evidence Collector 🔜 Next ReviewPack.evidence path exists, empty evidence works
Eval Harness ✅ Done 33 fixtures, 9/9 gate metrics pass, blocking_pass: true
Human-readable Output ✅ Done --format human on verify/ingest
One-stop Verify ✅ Done crossreview verify --diff (pack + review in one step, default --format human)

v0 Scope

Supported: code_diff artifact only · advisory verdict · single fresh_llm reviewer · deterministic adjudicator and normalizer (no LLM fallback)

Out of scope (v0): Python SDK · MCP Server · CI/CD Action · Agent Skill runtime mode (advisory SKILL.md provided; runtime bridge deferred) · cross-model reviewer · verdict = block

Release gate: v0 must pass 9 blocking metrics (§12), including manual_recall ≥ 0.80, precision ≥ 0.70, fixture_count ≥ 20, invalid_findings_per_run ≤ 2, and 5 others. All 9 currently pass (blocking_pass: true).
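
The blocking gate is a per-metric threshold check. A sketch over the four thresholds named above (the other five gated metrics are not listed in this README and are omitted; the reported values assume the Recall figure in the eval table is manual_recall):

```python
# Thresholds copied from the gate description above; direction encodes >= vs <=.
GATES = {
    "manual_recall": (0.80, ">="),
    "precision": (0.70, ">="),
    "fixture_count": (20, ">="),
    "invalid_findings_per_run": (2, "<="),
}

def blocking_pass(metrics: dict) -> bool:
    """True only if every gated metric satisfies its threshold."""
    for name, (threshold, direction) in GATES.items():
        value = metrics[name]
        ok = value >= threshold if direction == ">=" else value <= threshold
        if not ok:
            return False
    return True

reported = {"manual_recall": 0.929, "precision": 0.885,
            "fixture_count": 33, "invalid_findings_per_run": 1}
```

A single failing metric flips the whole gate, which matches the all-or-nothing blocking_pass flag.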

License

MIT

Project details


Download files

Download the file for your platform.

Source Distribution

crossreview-0.1.0a2.tar.gz (66.7 kB)

Uploaded Source

Built Distribution


crossreview-0.1.0a2-py3-none-any.whl (40.4 kB)

Uploaded Python 3

File details

Details for the file crossreview-0.1.0a2.tar.gz.

File metadata

  • Download URL: crossreview-0.1.0a2.tar.gz
  • Upload date:
  • Size: 66.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for crossreview-0.1.0a2.tar.gz
Algorithm Hash digest
SHA256 33899bafc8bfbf950171fd1895af613279295eeee324b3acd1ace46ff4019593
MD5 85fa1105bd9f535d29c0be223b04252c
BLAKE2b-256 52557d32cfc7b500544cce782b98d781b214e3ae9818271ef48607d4edf98053


Provenance

The following attestation bundles were made for crossreview-0.1.0a2.tar.gz:

Publisher: publish.yml on evidentloop/cross-review

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file crossreview-0.1.0a2-py3-none-any.whl.

File metadata

  • Download URL: crossreview-0.1.0a2-py3-none-any.whl
  • Upload date:
  • Size: 40.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for crossreview-0.1.0a2-py3-none-any.whl
Algorithm Hash digest
SHA256 fd908d5170109545bc61060eb33b4c07070836afc8bd26dadda4e2a64600c772
MD5 3b726efea48355e157dd8f9d4bd64df4
BLAKE2b-256 0ff2f973fbed111686222d7d8990195a09120bfe5a7fab45d03c76869d0f8f0a


Provenance

The following attestation bundles were made for crossreview-0.1.0a2-py3-none-any.whl:

Publisher: publish.yml on evidentloop/cross-review

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
