# chatbot-auditor
Detect the 7 ways AI chatbots silently fail customers — from existing conversation logs, using tools your ML team doesn't need to install.
Your chatbot dashboard says "95% of conversations resolved." The real number is usually 60–70%. The gap is customers who gave up, got stuck in loops, asked for a human and were refused, or silently walked away. Your chatbot records all of them as "resolved."
chatbot-auditor reads your conversation logs and tells you the truth.
## Table of contents

- Why this exists
- The 7 failure modes
- Quick start
- Features
- Architecture
- Where this fits
- Installation
- Documentation
- Examples
- Contributing
- Development
- License
## Why this exists
- 75% of consumers are frustrated by AI customer support
- 56% of unhappy customers leave without complaining
- 88% will not return after a negative chatbot interaction
- Chatbot platforms (Intercom, Zendesk, Drift) grade their own homework — their dashboards are designed to make the chatbot look good
- Air Canada was sued because its chatbot promised a non-existent refund policy
- DPD's chatbot swore at a customer and wrote a poem about how terrible the company was
- Klarna reversed course after firing 700 agents when customer satisfaction tanked
Companies don't need a better chatbot. They need to know when their current chatbot is failing. That's what this library does.
## The 7 failure modes
| # | Mode | What it catches | Detector |
|---|---|---|---|
| 1 | Death Loop | Bot gives the same answer 3+ times | DeathLoopDetector |
| 2 | Silent Churn | Customer left without saying anything | SilentChurnDetector |
| 3 | Escalation Burial | Bot refused to transfer to a human | EscalationBurialDetector |
| 4 | Sentiment Collapse | Customer got frustrated, bot didn't notice | SentimentCollapseDetector |
| 5 | Confident Lies | Bot promised something outside policy | ConfidentLiesDetector |
| 6 | Brand Damage | Bot said something embarrassing | BrandDamageDetector |
| 7 | Confident Misinformation | Bot stated wrong facts | ConfidentMisinformationDetector |
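As a rough illustration of the kind of heuristic behind mode 1, consecutive bot replies can be compared for near-duplication with the stdlib's `difflib`. This is a sketch, not the library's actual `DeathLoopDetector`; the real similarity measure and thresholds may differ:

```python
from difflib import SequenceMatcher

def looks_like_death_loop(bot_replies, threshold=0.9, repeats=3):
    """Return True if `repeats` consecutive bot replies are near-identical.

    Hypothetical sketch of a death-loop heuristic; parameter names and
    defaults are illustrative only.
    """
    run = 1
    for prev, curr in zip(bot_replies, bot_replies[1:]):
        if SequenceMatcher(None, prev, curr).ratio() >= threshold:
            run += 1
            if run >= repeats:
                return True
        else:
            run = 1
    return False

# Three identical FAQ deflections trip the check:
print(looks_like_death_loop(["Please check our FAQ."] * 3))  # True
print(looks_like_death_loop(["Hi!", "Sure.", "Done."]))      # False
```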
## Quick start

```bash
pip install chatbot-auditor
```

```python
from chatbot_auditor import audit, Conversation, Message, Role

conv = Conversation(
    id="demo",
    messages=[
        Message(role=Role.USER, content="I need a refund for my order"),
        Message(role=Role.BOT, content="Please check our FAQ at example.com/faq."),
        Message(role=Role.USER, content="I already did. Can someone process it?"),
        Message(role=Role.BOT, content="Please check our FAQ at example.com/faq."),
        Message(role=Role.USER, content="this is useless"),
        Message(role=Role.BOT, content="Please check our FAQ at example.com/faq."),
    ],
)

for d in audit([conv]):
    print(f"[{d.severity.value.upper()}] {d.detector}: {d.explanation}")
```

Output:

```text
[MEDIUM] death_loop: Bot gave 3 consecutive similar responses ...
[HIGH] silent_churn: Conversation of 6 messages ended without ...
```

Or from the command line:

```bash
chatbot-audit analyze conversations.json --format markdown --output audit.md
```
## Features
| Category | What you get |
|---|---|
| 7 detectors | Full framework for chatbot-specific failures. Works out of the box. |
| 4 adapters | JSON, CSV, Intercom, Zendesk. Point at any source, same pipeline. |
| Zero-dep defaults | Stdlib-only detection for the core five detectors. No API keys required. |
| Pluggable backends | Optional sentence-transformers semantic similarity, LLM moderation, etc. |
| Report generator | Markdown and self-contained HTML. Email-safe, Slack-compatible, XSS-escaped. |
| FastAPI server | Drop-in HTTP service with bearer-token auth, Docker-ready. |
| CLI | analyze, analyze-intercom, analyze-zendesk with --format and --output. |
| Knowledge bases | Optional PolicyBase / FactBase to cross-check bot claims. |
| Full type safety | mypy --strict passes on every public symbol. |
| Benchmarked | Precision / recall / F1 on a synthetic corpus. Reproducible. |
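The knowledge-base row can be pictured with a toy sketch. `ToyPolicyBase` below is a hypothetical stand-in for illustration only, not the library's actual `PolicyBase` API:

```python
# Toy sketch of cross-checking a bot claim against a set of allowed
# policy statements. The real PolicyBase may use very different matching.
class ToyPolicyBase:
    def __init__(self, allowed_claims):
        self.allowed = [c.lower() for c in allowed_claims]

    def supports(self, claim):
        """True if the claim overlaps an allowed policy statement."""
        claim = claim.lower()
        return any(a in claim or claim in a for a in self.allowed)

policies = ToyPolicyBase(["refunds within 30 days of purchase"])
print(policies.supports("Refunds within 30 days of purchase"))   # True
print(policies.supports("lifetime refunds, no questions asked")) # False
```

A claim the knowledge base cannot support is exactly the kind of statement the confident-lies and confident-misinformation detectors are meant to flag.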
## Architecture
Detectors are pure functions over Conversation objects. Adapters feed conversations in; reporters format detections out. Every layer is independently swappable.
```mermaid
flowchart LR
    subgraph Sources
        A[Intercom API]
        B[Zendesk API]
        C[JSON / JSONL files]
        D[CSV / TSV files]
        E[Your custom source]
    end

    A --> F
    B --> F
    C --> F
    D --> F
    E --> F

    subgraph Pipeline
        F[Adapter.fetch] --> G[Conversation]
        G --> H[DetectorRegistry.run]
        H --> I[Detection]
    end

    subgraph Outputs
        I --> J[Python API]
        I --> K[JSON]
        I --> L[Markdown]
        I --> M[HTML]
        I --> N[REST API]
    end

    classDef src fill:#e0f2fe,stroke:#0284c7
    classDef pipe fill:#fef3c7,stroke:#d97706
    classDef out fill:#dcfce7,stroke:#16a34a
    class A,B,C,D,E src
    class F,G,H,I pipe
    class J,K,L,M,N out
```
Full system design in ARCHITECTURE.md.
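To make the layering concrete, here is a toy, stdlib-only sketch of an adapter feeding a pure-function detector and a reporter. All names and shapes here are illustrative assumptions, not the library's real API:

```python
from dataclasses import dataclass

@dataclass
class ToyDetection:
    detector: str
    severity: str
    explanation: str

def json_adapter(records):
    """Adapter layer: normalize raw log dicts into (role, content) tuples."""
    return [[(m["role"], m["text"]) for m in r["messages"]] for r in records]

def silent_churn(messages):
    """Detector layer: a pure function over one conversation's messages."""
    if len(messages) >= 4 and messages[-1][0] == "bot":
        return ToyDetection("silent_churn", "high", "conversation ended on a bot turn")
    return None

def markdown_report(detections):
    """Reporter layer: format detections out."""
    return "\n".join(
        f"- **{d.severity}** `{d.detector}`: {d.explanation}" for d in detections
    )

raw = [{"messages": [
    {"role": "user", "text": "I need a refund"},
    {"role": "bot", "text": "Please check our FAQ."},
    {"role": "user", "text": "I already did"},
    {"role": "bot", "text": "Please check our FAQ."},
]}]
detections = [d for conv in json_adapter(raw) if (d := silent_churn(conv)) is not None]
print(markdown_report(detections))
```

Because each layer is a plain function over plain data, swapping in a new source, detector, or output format means replacing one function without touching the others.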
## Where this fits
The AI observability and conversational-quality space is crowded, and rightly so — there are several strong tools depending on who you are and what you're measuring. chatbot-auditor is deliberately narrow in scope:
- It's built for CX and support leaders, not ML engineers. The output uses language the VP of Support already thinks in — "silent churn", "escalation burial", "confident lies" — rather than ML metrics like "faithfulness" or "perplexity". That framing is the whole point.

- It operates on existing conversation logs, not during the LLM call. You don't need to instrument your chatbot, change the runtime, or add an SDK to your production path. You point the library at logs you already have and it tells you what's wrong.

- The 7-mode framework is the contribution, not the code. The detection algorithms are mostly simple statistics. What makes this project different is that each failure mode is named, defined, and detectable with clear false-positive characteristics — a vocabulary teams can use to discuss chatbot quality the way DevOps teams discuss SLIs and SLOs.

- It runs zero-cost by default. The five core detectors need no API keys, no model downloads, no cloud accounts — Python's stdlib is enough. Richer backends (embeddings, LLM moderation) are opt-in when the defaults aren't enough.

- It stops at detection. There's no chatbot to replace, no agent to train, no dashboard SaaS to sign up for. Detections leave as JSON / Markdown / HTML / HTTP responses, and what your team does with them is up to you.
If you're evaluating LLMs during invocation, tracing prompts, or running model-level evals, the LLM observability tools in the ecosystem will serve you better. If you're coaching human agents or running call-center QA, the voice-QA category is a different product. This library is specifically for "my AI chatbot is in production handling real customers — what's it actually doing, and where is it quietly failing?"
## Installation

```bash
# Minimum — just the core library and CLI
pip install chatbot-auditor

# With API adapters
pip install "chatbot-auditor[intercom,zendesk]"

# With semantic similarity for paraphrased loop detection
pip install "chatbot-auditor[llm]"

# With the HTTP server
pip install "chatbot-auditor[server]"

# Everything
pip install "chatbot-auditor[intercom,zendesk,llm,server]"
```

Verify:

```bash
chatbot-audit version
```
## Documentation
- Full docs site — tutorials, reference, guides
- Getting started — run your first audit in under a minute
- The 7 failure modes — what each detector catches and why
- Audit Intercom data — end-to-end with real data
- Write a custom detector — add your own failure modes
- Self-host the HTTP server — Docker, auth, deployment
- LLM & embedding backends — swap in richer scorers
- API reference — auto-generated from docstrings
- Architecture — system design, extension points, design decisions
## Examples

Runnable scripts in `examples/`:

- `01_audit_json_file.py` — smallest end-to-end audit
- `02_audit_csv_file.py` — CSV export from any platform
- `03_custom_detector.py` — write your own failure mode
- `04_knowledge_bases.py` — `PolicyBase` + `FactBase`
- `05_embeddings_backend.py` — semantic similarity for high-paraphrase loops
## Contributing
Issues and pull requests welcome. See CONTRIBUTING.md for dev setup, coding standards, and how to add a detector.
- Report a bug → GitHub issues
- Propose a feature → same, pick the feature-request template
- Report a security issue → see SECURITY.md — do not open a public issue
- Get help / ask a question → SUPPORT.md
## Development

```bash
git clone https://github.com/HemantBK/chatbot-auditor.git
cd chatbot-auditor

uv sync --all-extras
uv run pytest         # 203 tests
uv run mypy           # strict type check
uv run ruff check .   # lint
uv run ruff format .  # format
uv run mkdocs serve   # live docs
```
## License

Apache License 2.0 — see LICENSE and NOTICE.

Original author: BK. If you use, fork, or build on this project, preserve the NOTICE file as required by the license, and keep the `Copyright 2026 BK` header intact in every derivative source file.
## Acknowledgments
Framework and failure-mode research grounded in public incident reports: Air Canada (2024), DPD (2024), Klarna (2025), Cursor (2024), and industry analyses from Qualtrics, Gartner, and others cited in the failure modes docs.