Local-first failure diagnosis for AI browser automation, Playwright, crawler, and RPA runs.

Maintainers

tobylsd

Project description

Agent Failure Doctor

Chinese documentation

License: MIT Python 3.10+

Local-first failure diagnosis lifecycle tool for AI browser automation, Playwright, crawler, RPA, and business automation failures.

Current milestone: Agent Failure Doctor v3.2 Auto Collector P98 Gate.
Previous stable line: Agent Failure Doctor v3.1.0 P98 Master Gate.
Previous P95 stable line: Agent Failure Doctor v2.4.1 P95 Alignment & Missing Tracks Pack.

Input: trace.zip / error.log / console.txt / network.json / screenshot metadata / user_description.txt

Output: diagnosis, evidence, next action, repair suggestions, GitHub issue draft, Codex fix prompt.

Quickstart

git clone https://github.com/tobybgy-lsd/web-agent-runtime-bench.git
cd web-agent-runtime-bench
python -m pip install -e .
failure-doctor diagnose .\examples\failed_runs\proxy_network_error --out .\report
failure-doctor plan .\report --out .\fix_plan
failure-doctor collect --project . --preset auto --out .\failure_doctor_auto_report `
  --auto-diagnose --auto-handoff --auto-sanitize
failure-doctor agent-bootstrap --target all --project .

See validation/dashboard.md, docs/P98_LIMITS.md, docs/AGENT_FRONTEND_INVOCATION.md, and docs/safety_boundary.md.

P98 master gate passed with the auto collector pillar included.

Advanced commands include failure-doctor handoff, failure-doctor agent-bootstrap, failure-doctor propose-patch, and failure-doctor batch.

Core commands: collect / diagnose / plan / verify / run / watch / sanitize / adapt / handoff / agent-bootstrap / propose-patch / batch

Classic lifecycle: diagnose / plan / verify / run / sanitize / adapt -> diagnose -> plan -> AI handoff / patch proposal -> verify -> sanitize/share

P98 gate: knowledge base -> coverage matrix -> trace/cross-framework/training/composite/handoff/batch/sanitize/auto-collector -> master gate

Distribution & Feedback

v3.2.0 is the current stable technical baseline. The next phase is distribution and real user feedback, not more synthetic feature expansion.

PyPI release runbook: docs/PYPI_RELEASE.md
2-minute demo script: docs/DEMO_VIDEO_SCRIPT.md
Technical article draft: docs/TECH_ARTICLE_DRAFT.md
Real user feedback loop: docs/REAL_USER_FEEDBACK_LOOP.md

The package is published on PyPI. Install it with:

pip install agent-failure-doctor

For non-technical Windows users, double-click scripts/windows/Start-FailureDoctor-Diagnosis.bat or drag a failed project folder onto it.

Advanced v3.2 commands include failure-doctor collect and failure-doctor watch.

Agent frontend invocation:

failure-doctor agent-bootstrap --target all --project .

This writes .failure-doctor/AGENT_ENTRYPOINT.md plus Codex, Cursor, Claude Code, VS Code/Copilot, Antigravity, OpenCode, Qoder, Trae, WorkBuddy, OpenClaw, Hermes, and generic agent workflow instructions.

Agent Failure Doctor uses a deterministic evidence-based diagnostic engine. It does not claim to solve arbitrary failures, but it provides explainable classification, evidence, fix plans, and before/after verification for known automation failure patterns.

Applied scenario demos are local-only mock workflows for commerce automation, live monitoring, content publishing, GUI data bridge, and ERP sync failure diagnosis.

Spiderbuf-inspired challenge demos are local-only mock failure packs inspired by public crawler-training challenge categories; they validate diagnosis and safe next actions without accessing spiderbuf.cn or publishing private solution logic.

Integration commands: failure-doctor collect-playwright / failure-doctor pack-logs / failure-doctor adapt

What You Get

report/
|-- diagnosis.json
|-- diagnosis.md
|-- evidence.json
|-- input_summary.json
|-- issue_draft.md
|-- repair_suggestions.md
|-- codex_fix_prompt.md
`-- failure_doctor_report.zip

Agent Failure Doctor turns sanitized automation failure materials into a report that explains what likely failed, what evidence supports the diagnosis, what evidence is missing, and what to ask Codex or another coding assistant to change next.

One-Minute Start

Auto Capture:

failure-doctor run -- python crawler.py
failure-doctor run -- pytest tests/test_listing.py
failure-doctor run -- playwright test

This writes a local run folder under .failure-doctor/runs/<run_id>/:

.failure-doctor/runs/<run_id>/
|-- command.txt
|-- exit_code.txt
|-- stdout.log
|-- stderr.log
|-- environment.json
|-- detected_artifacts.json
|-- input_summary.json
|-- diagnosis/
|-- fix_plan/
|-- verification_hint.md
`-- shareable_failure_pack.zip

The generated safe_to_share.json defaults to safe_to_share=false; review and sanitize before sending a pack to anyone else.
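The capture step can be pictured with a short Python sketch. This is not the implementation behind failure-doctor run; it is a minimal illustration, assuming only the standard library, of writing four of the files listed above (command.txt, exit_code.txt, stdout.log, stderr.log). The function name capture_run is hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def capture_run(command: list[str], run_dir: Path) -> int:
    """Run a command and store its command line, output, and exit code in run_dir.

    Hypothetical helper for illustration; the real tool also records
    environment details, detected artifacts, and a diagnosis.
    """
    run_dir.mkdir(parents=True, exist_ok=True)
    # capture_output=True collects stdout and stderr instead of printing them.
    result = subprocess.run(command, capture_output=True, text=True)
    (run_dir / "command.txt").write_text(" ".join(command), encoding="utf-8")
    (run_dir / "exit_code.txt").write_text(str(result.returncode), encoding="utf-8")
    (run_dir / "stdout.log").write_text(result.stdout, encoding="utf-8")
    (run_dir / "stderr.log").write_text(result.stderr, encoding="utf-8")
    return result.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        code = capture_run([sys.executable, "-c", "print('hello')"], Path(tmp) / "run_001")
        print("exit code:", code)
```

Keeping the exit code and both output streams in separate files means a later diagnosis step can read them without re-running the command.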

Sanitize & Share Pack:

Sanitize a failed run before sharing it:

failure-doctor sanitize .\.failure-doctor\runs\<run_id> --out .\shareable_failure_pack

This writes redacted logs, redacted network summaries, trace metadata only, a redaction report, a review gate, and shareable_failure_pack.zip.

Raw trace.zip archives are not copied into the sanitized pack.
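To give a feel for what log redaction involves, here is a small Python sketch. The patterns, the [REDACTED] placeholder, and the function names are assumptions made for illustration; they are not the rules failure-doctor sanitize applies, and a real sanitizer needs a broader, reviewed rule set.

```python
import re

# Illustrative rules only. Each entry is (pattern, replacement).
REDACTION_RULES = [
    # "Authorization: Bearer abc123" -> "Authorization: [REDACTED]"
    (re.compile(r"(authorization\s*[:=]\s*).+", re.IGNORECASE), r"\1[REDACTED]"),
    # "Cookie: ..." and "Set-Cookie: ..." header lines
    (re.compile(r"((?:set-)?cookie\s*[:=]\s*).+", re.IGNORECASE), r"\1[REDACTED]"),
    # key=value secrets such as api_key=..., token=..., password=...
    (
        re.compile(r"\b(api[_-]?key|token|password)(\s*[:=]\s*)[^\s&]+", re.IGNORECASE),
        r"\1\2[REDACTED]",
    ),
]


def redact_line(line: str) -> tuple[str, int]:
    """Return the redacted line and the number of replacements made."""
    total = 0
    for pattern, replacement in REDACTION_RULES:
        line, count = pattern.subn(replacement, line)
        total += count
    return line, total


def redact_text(text: str) -> tuple[str, int]:
    """Redact every line of a log and count replacements for a simple report."""
    lines, total = [], 0
    for line in text.splitlines():
        clean, count = redact_line(line)
        lines.append(clean)
        total += count
    return "\n".join(lines), total
```

The replacement count is the kind of figure a redaction report could surface, so a reviewer can see how much was masked before deciding whether the pack is safe to share.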

Put a failed run in a folder:

my_failed_run/
|-- error.log
|-- console.txt
|-- network.json
|-- README.txt
`-- screenshot.png

Then run:

failure-doctor diagnose .\my_failed_run --out .\report

The tool inventories inputs and uses this evidence priority:

trace.zip > log > network.json > user description > screenshot metadata

When evidence is too thin, it should downgrade to insufficient_evidence instead of guessing.

Minimal Demos

Proxy/network failure:

failure-doctor diagnose .\examples\failed_runs\proxy_failed --out .\report_proxy

Strict mode locator conflict:

failure-doctor diagnose .\examples\failed_runs\strict_mode_locator --out .\report_locator

Low-evidence screenshot-only run:

failure-doctor diagnose .\examples\failed_runs\low_evidence_screenshot_only --out .\report_low_evidence

Native Playwright trace fixture:

trace-doctor diagnose .\examples\realistic_playwright_traces\02_login_redirect_302\trace.zip --out .\report_login_trace

Before / After Report

Report structure: conclusion / evidence / why / next action / Codex fix prompt

Before:

page.goto: net::ERR_PROXY_CONNECTION_FAILED while opening https://example.test

After:

Conclusion: network/proxy setup failed before the page loaded.
Evidence: Playwright reported net::ERR_PROXY_CONNECTION_FAILED.
Next action: check proxy settings, DNS, VPN, and CI network configuration.
Codex fix prompt: add trace/log capture and make proxy configuration explicit.
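A deterministic, evidence-based rule of this kind can be sketched as a lookup from a known error marker to a conclusion and next action. The Rule class, the RULES table, and the diagnose function below are hypothetical names; the single rule reuses the wording from the example above and does not reflect the project's real rule format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    marker: str       # substring to look for in the log text
    conclusion: str
    next_action: str


# One illustrative rule taken from the before/after example.
RULES = [
    Rule(
        marker="net::ERR_PROXY_CONNECTION_FAILED",
        conclusion="network/proxy setup failed before the page loaded",
        next_action="check proxy settings, DNS, VPN, and CI network configuration",
    ),
]


def diagnose(log_text: str) -> dict[str, str]:
    """Return a conclusion backed by the exact log line that triggered it."""
    for rule in RULES:
        if rule.marker in log_text:
            # Quote the matching line so the conclusion stays traceable.
            evidence = next(
                line.strip() for line in log_text.splitlines() if rule.marker in line
            )
            return {
                "conclusion": rule.conclusion,
                "evidence": evidence,
                "next_action": rule.next_action,
            }
    # No known pattern matched: do not guess.
    return {
        "conclusion": "insufficient_evidence",
        "evidence": "",
        "next_action": "collect a trace or a fuller error log",
    }
```

Because the same input always hits the same rule, the output is reproducible and each conclusion can be checked against the quoted evidence line.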

Verify a Fix

failure-doctor diagnose .\failed_run --out .\report
failure-doctor plan .\report --out .\fix_plan
failure-doctor verify --before .\failed_run --after .\rerun_after_fix --out .\verification_report

verify compares before/after evidence and reports whether the original failure is resolved, unchanged, changed into another failure, or insufficiently evidenced.

AI Handoff & Patch Proposal

Turn a report into task packs that Codex, Claude Code, or Cursor can execute:

failure-doctor handoff .\report --target codex --out .\ai_handoff
failure-doctor handoff .\report --target claude_code --out .\ai_handoff
failure-doctor handoff .\report --target cursor --out .\ai_handoff

This writes:

ai_handoff/
|-- ai_handoff.json
|-- ai_handoff.md
|-- codex_task.md
|-- claude_code_task.md
|-- cursor_task.md
|-- affected_files.json
|-- validation_commands.md
|-- forbidden_actions.md
|-- token_budget_report.json
`-- ai_handoff_pack.zip

Generate a dry-run patch proposal without modifying source code:

failure-doctor propose-patch --repo . --report .\report --out .\patch_plan

This writes:

patch_plan/
|-- patch_proposal.md
|-- proposed_changes.json
|-- affected_files.json
|-- validation_commands.md
`-- patch_risk_assessment.json

propose-patch is intentionally proposal-only. It does not edit files, apply patches, run tests, or open pull requests.

v2.5 validation writes validation/ai_handoff_validation.json:

20/20 Codex task files generated
20/20 Claude Code task files generated
20/20 Cursor task files generated
18/20 patch proposals generated
20/20 required sections present
20/20 concise token budget checks pass
0 forbidden outputs

Batch Diagnosis / Fleet Mode

Diagnose many failed runs and get a fleet-level summary:

failure-doctor batch .\runs --out .\batch_report

Input:

runs/
|-- run_001/
|-- run_002/
|-- run_003/
`-- ...

Output:

batch_report/
|-- summary.json
|-- summary.md
|-- failures_by_type.csv
|-- top_root_causes.md
|-- repeated_failures.md
|-- suggested_regression_cases.md
|-- repair_priority.md
`-- reports/

Fleet mode answers which failures repeat, which root causes dominate, which runs should become regression cases, and which fixes deserve priority.
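The fleet-level questions reduce to counting failure types across runs. The sketch below assumes each run has already been reduced to a single failure-type label; the function name, input shape, and output keys are hypothetical and do not describe the schema of summary.json.

```python
from collections import Counter


def summarize_fleet(diagnoses: dict[str, str]) -> dict:
    """Summarize a mapping of run id -> failure type.

    Returns the total run count, failure types ordered by frequency,
    and the run ids behind every type that occurs more than once.
    """
    counts = Counter(diagnoses.values())
    repeated = {
        failure_type: sorted(run for run, t in diagnoses.items() if t == failure_type)
        for failure_type, n in counts.items()
        if n > 1
    }
    return {
        "total_runs": len(diagnoses),
        "by_type": counts.most_common(),
        "repeated": repeated,
    }
```

Types that appear in "repeated" are natural candidates for regression cases, and the order of "by_type" gives a first cut at repair priority.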

P98 Controlled Maturity

v3.0 starts the P98 controlled maturity track. This is not an ecosystem score; it does not count stars, external PRs, external issues, PyPI downloads, or long-term community adoption.

Current P98 assets:

Knowledge-base commands:

python -m tools.knowledge_base.validate_patterns
python -m tools.knowledge_base.search_patterns --query selector_drift
python -m tools.validation.run_crawler_failure_coverage_matrix

Applied Scenario Demos

Local-only mock demos show how Agent Failure Doctor can diagnose failures in:

hot product collection
live commerce monitoring
ecommerce listing automation
authorized content publishing workflow
GUI / RPA data bridge
ERP-to-ecommerce sync

Run:

python -m tools.validation.run_applied_scenario_validation

Spiderbuf-Inspired Challenge Demos

examples/spiderbuf_inspired_challenges/ contains local-only mock failure packs inspired by public crawler-training challenge categories:

cookie/session required
iframe extraction
Ajax dynamic loading
random CSS selector drift
infinite scroll missing items
rate limit 429
API signature required
browser fingerprint risk
Selenium detection risk
challenge page detected

These cases are diagnosis-only. They do not access spiderbuf.cn, do not include private solutions, and do not include access-control defeat steps.

python -m tools.validation.run_spiderbuf_inspired_validation

Integrations

Collect Playwright test-results into a failure pack:

failure-doctor collect-playwright .\examples\mock_playwright_test_results --out .\tmp_failure_pack
failure-doctor diagnose .\tmp_failure_pack --out .\tmp_collected_report

Normalize a loose log folder:

failure-doctor pack-logs .\examples\mock_raw_logs --out .\tmp_log_pack
failure-doctor diagnose .\tmp_log_pack --out .\tmp_log_report

Normalize a Selenium, Puppeteer, Cypress, Scrapy, requests, or httpx failure log:

failure-doctor adapt .\examples\cross_framework_fixtures\selenium\no_such_element\raw --framework selenium --out .\tmp_selenium_pack
failure-doctor diagnose .\tmp_selenium_pack --out .\tmp_selenium_report
failure-doctor plan .\tmp_selenium_report --out .\tmp_selenium_fix_plan

Supported adapter frameworks:

selenium | puppeteer | cypress | scrapy | requests | httpx | auto

Playwright remains the deepest native trace backend. Cross-framework adapters normalize local logs and metadata into the same failure lifecycle; they do not run those frameworks or connect to external platforms.
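One way an auto mode could pick a framework is by counting framework-specific markers in the log text. The sketch below is an assumption about how such detection might work, not the adapter's actual method; the marker strings are common substrings from each framework's error output and the function name is hypothetical.

```python
# Hypothetical marker table: substrings that tend to appear in each
# framework's tracebacks or error messages.
FRAMEWORK_MARKERS = {
    "selenium": ("selenium.common.exceptions", "NoSuchElementException"),
    "puppeteer": ("puppeteer",),
    "cypress": ("CypressError", "cypress"),
    "scrapy": ("scrapy.",),
    "requests": ("requests.exceptions",),
    "httpx": ("httpx.",),
}


def detect_framework(log_text: str) -> str:
    """Return the framework whose markers occur most often, or "unknown"."""
    lowered = log_text.lower()
    scores = {
        name: sum(lowered.count(marker.lower()) for marker in markers)
        for name, markers in FRAMEWORK_MARKERS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

Returning "unknown" when nothing matches leaves the choice to the user, who can then pass --framework explicitly.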

See docs/INTEGRATIONS.md and docs/GITHUB_ACTION_USAGE.md.

Validation Status

Current milestone: Agent Failure Doctor v3.2 Auto Collector P98 Gate.

Previous stable line: Agent Failure Doctor v3.1.0 P98 Master Gate.

Previous P95 stable line: Agent Failure Doctor v2.4.1 P95 Alignment & Missing Tracks Pack.

131 source-ledger records with separated real_public_issue, official_doc_pattern, and public_inspired_sanitized labels
50 traceable real public issue records
100 Playwright Trace Doctor P95 fixtures
100/100 Playwright trace reasonable classifications
100/100 Playwright trace exact subtype matches
62 external public reference seeds
20 external public reference held-out records
20/20 external public reference reasonable classifications
20/20 external public reference actionable next actions
12 resolution validation cases
12/12 resolution statuses correct
18 applied scenario validation cases
18/18 applied scenario reasonable classifications
18/18 applied scenario valid fix plans
18/18 applied scenario verification statuses correct
Playwright collector, generic log packer, browser-use adapter, and GitHub Actions usage docs
v2.0 Auto Capture command wrapper: failure-doctor run -- <command>
Sanitize & Share command: failure-doctor sanitize <failed_run> --out <shareable_failure_pack>
Cross-framework adapter command: failure-doctor adapt <input> --framework <framework> --out <failure_pack>
100 cross-framework P95 fixtures across Selenium, Puppeteer, Cypress, Scrapy, requests, httpx, browser-use, and generic RPA
100/100 cross-framework P95 reasonable classifications
100/100 cross-framework P95 valid fix plans
0 forbidden outputs in cross-framework P95 validation
40 training challenge P95 local-only validation cases
40/40 training challenge reasonable classifications
40/40 training challenge valid fix plans
40/40 training challenge verification statuses correct
0 forbidden outputs and 0 private solution leaks in training challenge validation
160 composite P95 strict local-only validation cases
160/160 composite primary classifications correct
160/160 composite repair-order checks correct
160/160 composite evidence graphs generated
0 forbidden outputs in composite P95 strict validation
P95 Core Triad Gate: pass
3 composite showcase reports under sample_reports/composite_showcase/
10 external held-out public-source records
9/10 external held-out reasonable classifications
10/10 external held-out actionable next actions
0 forbidden outputs in generated reports/prompts
GitHub Actions green across Ubuntu, macOS, Windows, plus Windows benchmark/smoke/safety

See docs/VALIDATION_REPORT.md, docs/EXTERNAL_DATA_SOURCES.md, and validation/dashboard.md for validation metrics, limits, and boundaries.

Reproduce Validation

python -m tools.real_trace_generation.generate_real_trace_fixtures `
  --out .\examples\realistic_playwright_traces `
  --count 30 `
  --clean
python -m tools.validation.run_real_trace_validation
python -m tools.validation.run_playwright_trace_p95_validation
python -m tools.validation.run_external_public_reference_validation
python -m tools.validation.run_resolution_validation
python -m tools.validation.run_spiderbuf_inspired_validation
python -m tools.validation.run_training_challenge_validation
python -m tools.validation.run_cross_framework_p95_validation
python -m tools.validation.run_composite_diagnosis_p95_strict_validation
python -m tools.validation.run_p95_core_triage_gate
python scripts\validate_external_heldout.py

Safety Boundary

This project is for local, sanitized failure diagnosis.

It is not:

a challenge-solving tool
an access-control circumvention tool
a credential extractor
a real-platform scraper
a tool for unauthorized collection

For suspected platform risk cases, the intended output is identification, routing, and compliance-oriented next steps such as reducing request volume, using an official API, confirming authorization, contacting the platform, or stopping unauthorized collection.

Contributing Failure Cases

You do not need to write code. The most useful contribution is a sanitized failure case: log snippets, trace metadata, network summaries, screenshot metadata, and a short description of what happened.

Open an External failure case issue and remove secrets before posting:

passwords
API keys
cookies
tokens
authorization headers
private screenshots
private data
personal data

Accepted input types include sanitized error.log, trace.zip, console.txt, network.json, screenshot metadata, and user_description.txt.

If you allow it, a sanitized case may be assigned an EXT-YYYY-NNNN id, run once with the current released version before rule changes, and added to the external validation dashboard.

Templates and author-generated examples are not counted as external cases.

See CONTRIBUTING.md, docs/external_validation_protocol.md, docs/REAL_TRACE_CONTRIBUTION_GUIDE.md, and docs/REAL_DATA_SOURCES.md.

Commands

Run all tests:

python -m unittest discover -s tests -p "test_*.py"

Run smoke and safety checks:

scripts\smoke_test.ps1
scripts\local_safety_scan.ps1

Project details

Release history

This version

3.2.0

Jun 29, 2026

Download files

Source Distribution

agent_failure_doctor-3.2.0.tar.gz (213.1 kB)

Uploaded Jun 29, 2026 Source

Built Distribution

agent_failure_doctor-3.2.0-py3-none-any.whl (187.7 kB)

Uploaded Jun 29, 2026 Python 3

File details

Details for the file agent_failure_doctor-3.2.0.tar.gz.

File metadata

Download URL: agent_failure_doctor-3.2.0.tar.gz
Upload date: Jun 29, 2026
Size: 213.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agent_failure_doctor-3.2.0.tar.gz
Algorithm	Hash digest
SHA256	`f652126e76e95f42af4c404307613e4f821d352fad5f898ec73ea721009790b9`
MD5	`3b66418752c72ff85bd2d1fa8d67baee`
BLAKE2b-256	`34ce52c71452458fbc5536283a5443ff6f4f47133a595d5679e0c65c2dc98571`
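A downloaded copy of the source distribution can be checked against the SHA256 digest listed above using only the Python standard library. This is a minimal sketch; the function names and the chunk size are illustrative choices.

```python
import hashlib
from pathlib import Path

# SHA256 digest of agent_failure_doctor-3.2.0.tar.gz, copied from the table above.
EXPECTED_SHA256 = "f652126e76e95f42af4c404307613e4f821d352fad5f898ec73ea721009790b9"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, matches_expected(Path("agent_failure_doctor-3.2.0.tar.gz")) should return True for an unmodified download.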

Provenance

The following attestation bundles were made for agent_failure_doctor-3.2.0.tar.gz:

Publisher: publish-pypi.yml on tobybgy-lsd/web-agent-runtime-bench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agent_failure_doctor-3.2.0.tar.gz
- Subject digest: f652126e76e95f42af4c404307613e4f821d352fad5f898ec73ea721009790b9
- Sigstore transparency entry: 2010567042
- Sigstore integration time: Jun 29, 2026
Source repository:
- Permalink: tobybgy-lsd/web-agent-runtime-bench@c416d3fc8f3123b5c76f9de0f82623b632e1638c
- Branch / Tag: refs/heads/main
- Owner: https://github.com/tobybgy-lsd
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@c416d3fc8f3123b5c76f9de0f82623b632e1638c
- Trigger Event: workflow_dispatch

File details

Details for the file agent_failure_doctor-3.2.0-py3-none-any.whl.

File metadata

Download URL: agent_failure_doctor-3.2.0-py3-none-any.whl
Upload date: Jun 29, 2026
Size: 187.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agent_failure_doctor-3.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7075371b27cc7942267741cd37a7b877e062b1386fd25838dec097786d7854c6`
MD5	`ef1d1322c65c8b97b4820a9d80516e40`
BLAKE2b-256	`f297f58e4678fa0c67d93c3b2a8e21e260991007d931797735cb0609bf131acc`

Provenance

The following attestation bundles were made for agent_failure_doctor-3.2.0-py3-none-any.whl:

Publisher: publish-pypi.yml on tobybgy-lsd/web-agent-runtime-bench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agent_failure_doctor-3.2.0-py3-none-any.whl
- Subject digest: 7075371b27cc7942267741cd37a7b877e062b1386fd25838dec097786d7854c6
- Sigstore transparency entry: 2010567091
- Sigstore integration time: Jun 29, 2026
Source repository:
- Permalink: tobybgy-lsd/web-agent-runtime-bench@c416d3fc8f3123b5c76f9de0f82623b632e1638c
- Branch / Tag: refs/heads/main
- Owner: https://github.com/tobybgy-lsd
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@c416d3fc8f3123b5c76f9de0f82623b632e1638c
- Trigger Event: workflow_dispatch
