Skip to main content

LLM-optimised test output for pytest -- structured JSONL, deduplicated warnings, adaptive token budgets, and failure clustering via Claude Code.

Project description

pytest-verdict

LLM-optimised test output for pytest. Structured JSONL, deduplicated warnings, adaptive token budgets, and automatic failure clustering via psclaude.

The idea

Test output is designed for humans scanning terminals. LLM agents consuming test results must reverse-parse decorative formatting back into structured data. This wastes tokens, introduces parsing errors, and conflates root causes with their symptoms across duplicated tracebacks.

pytest-verdict fixes this with a two-phase architecture:

  • Phase 1 (deterministic): a pytest plugin extracts structured JSONL from TestReport objects -- no formatting, no decoration, no information loss
  • Phase 2 (LLM clustering): sends failures to Claude Code via psclaude for semantic root-cause clustering -- grouping failures by cause, not by string similarity

The clustering runs with bundled extensions -- a domain-specific CLAUDE.md and three skills that teach Claude how to analyse test failures, cluster by root cause, and prioritise fixes.

Install

# Phase 1 only (structured JSONL output)
pip install jaymd96-pytest-verdict

# Phase 1 + Phase 2 (with psclaude for clustering)
pip install jaymd96-pytest-verdict[cluster]

Usage

One command -- full pipeline

# Run tests with automatic failure clustering
pytest --cluster

# With output files for CI integration
pytest --cluster --agent-output report.jsonl --cluster-output clusters.txt

If Claude Code is installed, you get a clustered failure report. If not, you get a clear message and the raw structured JSONL.

Phase 1 only -- structured JSONL

# Emit structured JSONL without clustering
pytest --agent-json --agent-output report.jsonl

# Control verbosity with token budget
pytest --agent-json --agent-output report.jsonl --token-budget 500

Standalone clustering

# Check if Claude Code is available
python -m pytest_verdict.cluster --check

# Cluster a previously-generated JSONL file
python -m pytest_verdict.cluster report.jsonl

In CLAUDE.md for Claude Code projects

When running tests, use:
  pytest --cluster --agent-output /tmp/test-report.jsonl --cluster-output /tmp/clusters.txt
Read /tmp/clusters.txt for failure analysis. If clustering fails, read /tmp/test-report.jsonl directly.

How it works

pytest run
    |
    v
Phase 1: pytest plugin hooks into TestReport objects
    |  - Extracts structured exception/assertion data
    |  - Deduplicates warnings with occurrence counts
    |  - Writes JSONL (verdict-first, one line per failure)
    |
    v
Phase 2: psclaude client (if --cluster and Claude Code is installed)
    |  - Creates isolated workspace with bundled extensions
    |  - Loads CLAUDE.md (test analysis domain instructions)
    |  - Loads skills: cluster-failures, diagnose-failure, prioritise-fixes
    |  - Sends structured failures as JSON (not terminal output)
    |  - Receives clustered root-cause analysis
    |  - Falls back to raw JSONL if unavailable
    |
    v
Output: compact cluster report + raw JSONL

Bundled extensions

The package ships with psclaude extensions that configure Claude for test failure analysis:

Extension Purpose
extensions/CLAUDE.md Domain instructions: input format, key fields for root-cause analysis, reasoning principles, output requirements
extensions/skills/cluster-failures.md Skill for grouping failures by root cause with structured JSON output
extensions/skills/diagnose-failure.md Skill for in-depth analysis of a single failure: test bug vs code bug, execution tracing, fix suggestion
extensions/skills/prioritise-fixes.md Skill for ranking fix order: impact-per-effort scoring, dependency detection, ordered fix plan

These are loaded automatically when --cluster is used. You can override them by passing custom paths to cluster_failures() programmatically.

Output schema

JSONL (Phase 1)

Every line is valid JSON. Line 1 is always the summary.

{"type": "summary", "verdict": "FAIL", "counts": {"passed": 42, "failed": 3, "skipped": 5}, "duration_s": 1.23}
{"type": "failure", "node_id": "tests/test_auth.py::test_validate_token", "exception": {"type": "AssertionError", "message": "..."}, "assertion": {"left": "'expired'", "comparator": "==", "right": "'valid'"}, "traceback": [...]}
{"type": "warnings", "unique_count": 1, "total_count": 12, "items": [{"category": "DeprecationWarning", "message": "...", "count": 12}]}

Cluster report (Phase 2)

VERDICT: FAIL | 6 passed, 4 failed, 1 skipped | 0.22s

3 failure cluster(s):

  [incorrect/localised] validate_token() returns "expired" for expired tokens
    fix -> examples/test_demo.py::test_expired_token_returns_valid
    tests (2): test_expired_token_returns_valid, test_token_status_is_not_expired
    evidence: Both call validate_token("expired-abc"), function is correct, tests are wrong

  [incorrect/one-liner] calculate_discount() missing "platinum" tier in rates dict
    fix -> examples/test_demo.py::calculate_discount
    tests (1): test_platinum_discount

  [incorrect/one-liner] parse_config() raises ValueError on empty input
    fix -> examples/test_demo.py::parse_config
    tests (1): test_parse_empty_config

Token budget tiers

Flag Includes ~Tokens (4 failures)
--token-budget 0 (default) Full output -- all tracebacks, all warnings ~840
--token-budget 2000 Truncated tracebacks (assertion + source-under-test only) ~500
--token-budget 500 Failure one-liners (node_id + exception) ~182
--token-budget 100 Verdict line only ~57

CLI flags

Flag Description
--agent-json Enable structured JSONL output
--agent-output PATH Write JSONL to file (default: stderr)
--cluster Cluster failures via psclaude (implies --agent-json)
--cluster-output PATH Write cluster report to file (default: stderr)
--cluster-timeout N Max seconds for Claude Code response (default: 120)
--token-budget N Control JSONL verbosity (0 = unlimited)

Requirements

  • Python >= 3.11
  • pytest >= 7.0
  • jaymd96-psclaude >= 0.1.0 (optional, for --cluster)
  • Claude Code CLI (optional, for --cluster): npm install -g @anthropic-ai/claude-code

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jaymd96_pytest_verdict-0.1.0.tar.gz (20.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

jaymd96_pytest_verdict-0.1.0-py3-none-any.whl (19.1 kB view details)

Uploaded Python 3

File details

Details for the file jaymd96_pytest_verdict-0.1.0.tar.gz.

File metadata

  • Download URL: jaymd96_pytest_verdict-0.1.0.tar.gz
  • Upload date:
  • Size: 20.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: Hatch/1.16.5 cpython/3.13.12 HTTPX/0.28.1

File hashes

Hashes for jaymd96_pytest_verdict-0.1.0.tar.gz
Algorithm Hash digest
SHA256 9151876cb6070f0c1bc28c0450b7469843411609ecb0fb447699b9bf90b297d4
MD5 aa668d35b63ef35663fd17f118c4ad3f
BLAKE2b-256 954f4dce9a145aafa4035aab02c576156f04f3c96ea086324a78351a2ca1c4e4

See more details on using hashes here.

File details

Details for the file jaymd96_pytest_verdict-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for jaymd96_pytest_verdict-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 27a0e74a8795a7c4dbf336bbfb3a12d264bc1b6fc33057635e0ef0639f8d80da
MD5 7e8ea35f363c982e5b7ba57f4cbb2fe2
BLAKE2b-256 68cb699e363925a2f391be663fc75ea8c4ca53faca29fb53cf0c83972ffc67e8

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page