Production AI pipeline monitoring — root cause detection, anomaly alerts, regression guard

Project description

Failure Forensics

Production AI pipeline monitoring — root cause detection, anomaly alerts, regression guard, and Gemini-powered recommendations.

Installation

pip install failure-forensics

Quick Start

from failure_forensics import trace

@trace(step="retrieval", version="v1")
def my_retrieval_function(query):
    # your code here
    pass
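To make the decorator's role concrete, here is a minimal sketch of what a tracing decorator of this shape could do: record the step, version, status, and duration of each call as one JSON line. It is not the library's implementation. The name `trace_sketch`, the `log_path` parameter, and the record fields are assumptions made for illustration.

```python
import functools
import json
import time
from datetime import datetime, timezone


def trace_sketch(step, version, log_path="requests.jsonl"):
    """Hypothetical stand-in for a tracing decorator (not failure_forensics.trace)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Field names below are assumptions, not the library's schema.
            record = {
                "step": step,
                "version": version,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                record["status"] = "ok"
                return result
            except Exception as exc:
                record["status"] = "failed"
                record["error"] = repr(exc)
                raise  # the caller still sees the original exception
            finally:
                # Runs on both success and failure: append one JSON line.
                record["duration_ms"] = round((time.perf_counter() - start) * 1000, 2)
                with open(log_path, "a", encoding="utf-8") as fh:
                    fh.write(json.dumps(record) + "\n")
        return wrapper
    return decorator
```

Appending one line per call keeps the log readable with any line-oriented tool and avoids rewriting the file on every step.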

Features 🔬


A self-hosted, zero-cost LLM pipeline observability tool that gives you root cause detection, anomaly alerts, A/B reporting, and a live terminal dashboard — without sending your data to any third-party service.

🆚 Why Not LangSmith or Braintrust?

Feature	Failure Forensics	LangSmith	Braintrust
Cost	Free	Paid tiers	Paid tiers
Data privacy	Stays on your machine	Sent to cloud	Sent to cloud
Customization	Full control	Limited	Limited
Slack alerts	Built-in	Premium only	Premium only
A/B reporting	Built-in	Basic	Basic
Circuit breaker / trend	Built-in	❌	❌

Failure Forensics is designed for teams who need production-grade observability without vendor lock-in.

✨ What It Does

Every pipeline run passes through a structured logging and analysis layer:

Pipeline Step  →  logger.py  →  requests.jsonl
                                     ↓
                    ┌────────────────┴────────────────┐
                    │                                 │
              forensics.py                       pattern.py
          (root cause detection)          (time series + anomaly)
                    │                                 │
              versioning.py                      baseline.py
           (v1 vs v2 comparison)            (7-day moving average)
                    │                                 │
               ab_report.py                      alerts.py
            (A/B comparison table)          (Slack / console alert)
                    └────────────────┬────────────────┘
                                     ↓
                              dashboard.py
                         (ASCII terminal dashboard)
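As a rough illustration of the first analysis stage in the diagram, the sketch below reads JSONL log lines and computes a failure rate per step. The record fields (`step`, `status`) and the function name are assumptions; the actual schema of requests.jsonl is not documented here.

```python
import json
from collections import Counter


def failure_rate_by_step(lines):
    """Return {step: failure_rate} from an iterable of JSONL lines.

    Assumes each record has a "step" name and a "status" of "ok" or "failed"
    (hypothetical field names).
    """
    totals, failures = Counter(), Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        totals[record["step"]] += 1
        if record["status"] == "failed":
            failures[record["step"]] += 1
    return {step: failures[step] / totals[step] for step in totals}


if __name__ == "__main__":
    sample = [
        '{"step": "retrieval", "status": "failed"}',
        '{"step": "retrieval", "status": "ok"}',
        '{"step": "generation", "status": "ok"}',
    ]
    print(failure_rate_by_step(sample))
```

Because the function takes any iterable of lines, it can be fed an open file handle directly, for example `failure_rate_by_step(open("data/logs/requests.jsonl"))`.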

📁 Project Structure

failure-forensics/
├── src/
│   ├── logger.py          # Logs every pipeline step to JSONL
│   ├── forensics.py       # Root cause detection (5 categories)
│   ├── pattern.py         # Time-series failure rate + anomaly detection
│   ├── baseline.py        # 7-day moving average + trend (IMPROVING/STABLE/DEGRADING)
│   ├── alerts.py          # Slack webhook + console alerts
│   ├── versioning.py      # Per-version failure rate stats
│   ├── ab_report.py       # A/B comparison report (table + JSON)
│   └── dashboard.py       # ASCII bar chart terminal dashboard
├── data/
│   └── logs/
│       └── requests.jsonl # All pipeline logs (gitignored)
├── tests/
│   └── test_forensics.py  # 8 unit tests
├── config.py              # Thresholds, Slack URL, step limits
├── main.py                # 5-scenario demo runner
├── simulate.py            # Realistic test data generator (100 runs, anomaly day)
└── requirements.txt

🚀 Getting Started

1. Clone & Install

git clone https://github.com/jasstt/failure-forensics.git
cd failure-forensics
pip install -r requirements.txt

2. (Optional) Configure Slack Alerts

Edit config.py:

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

If left empty, all alerts print to the console.
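The fallback described above can be sketched as follows. This is an illustration, not the code in alerts.py: the function names and the injectable `sender` argument are assumptions, and the standard library's `urllib` is used here so the sketch has no third-party dependency. Slack incoming webhooks accept a JSON body with a `text` field.

```python
import json
import urllib.request

SLACK_WEBHOOK_URL = ""  # same name as in config.py; empty means console output


def post_to_slack(url, message):
    """POST a plain-text message to a Slack incoming webhook."""
    payload = json.dumps({"text": message}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status == 200


def send_alert(message, webhook_url=SLACK_WEBHOOK_URL, sender=post_to_slack):
    """Send to Slack when a webhook URL is set, otherwise print to the console."""
    if not webhook_url:
        print(f"[ALERT] {message}")
        return "console"
    sender(webhook_url, message)
    return "slack"
```

Passing the sender as a parameter makes the Slack path testable without a network call, for example `send_alert("rate too high", webhook_url="https://example.invalid", sender=lambda url, msg: True)`.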

3. Run the Full Demo

python main.py

This runs 5 scenarios:

1. Simulation: generates 100 realistic pipeline runs (2 prompt versions, anomaly day)
2. Root cause analysis: detects the failing step and assigns a category
3. 7-day pattern report: failure rate per day + step breakdown + anomaly check
4. A/B report: prompt_v1 vs prompt_v2 with per-step improvement table
5. Terminal dashboard: live ASCII bar charts, trend, top 5 failed runs
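The A/B scenario boils down to comparing failure rates between two prompt versions. A minimal sketch, assuming each run is a dict with a `version` string and a boolean `failed` flag (hypothetical field names, not the schema used by ab_report.py):

```python
def failure_rate(runs, version):
    """Fraction of runs for `version` that failed; None if there are no runs."""
    subset = [run for run in runs if run["version"] == version]
    if not subset:
        return None
    return sum(1 for run in subset if run["failed"]) / len(subset)


def ab_compare(runs, baseline="prompt_v1", candidate="prompt_v2"):
    """Compare two versions and report the improvement in percentage points."""
    base = failure_rate(runs, baseline)
    cand = failure_rate(runs, candidate)
    if base is None or cand is None:
        raise ValueError("both versions need at least one run")
    return {
        "baseline_rate": base,
        "candidate_rate": cand,
        # Positive means the candidate fails less often than the baseline.
        "improvement_pp": round((base - cand) * 100, 1),
    }


if __name__ == "__main__":
    runs = (
        [{"version": "prompt_v1", "failed": i < 2} for i in range(4)]
        + [{"version": "prompt_v2", "failed": i < 1} for i in range(4)]
    )
    print(ab_compare(runs))
```

Reporting the difference in percentage points rather than as a relative change keeps it directly comparable with the failure rates themselves.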

4. Run Unit Tests

python tests/test_forensics.py
python tests/test_advanced.py

🚀 Advanced Features (New in v2)

Layer	Feature	Technology
1	Automatic recommendation engine	Rule-based
2	AI-assisted failure analysis	Gemini 2.5 Pro
3	Automatic eval set expansion	Frequency analysis
4	Prompt optimization explanation	Gemini 2.5 Pro
5	Regression guard	Baseline comparison

Scenario 6: Regression Guard. Runs an automatic regression check before a new prompt (v3) is deployed:

REGRESSION CHECK — v3
Baseline (v2): 11.0% failure rate
New (v3):      24.5% failure rate
Delta: +13.5pp → REGRESSION_DETECTED ❌
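The check above amounts to comparing the candidate's failure rate against the baseline's and mapping the delta to a status. A minimal sketch follows; the 5pp and 10pp cut-offs and the function name are illustrative assumptions, since the thresholds the project actually uses are not stated here.

```python
def regression_check(baseline_rate, candidate_rate, warn_pp=5.0, fail_pp=10.0):
    """Return (delta_pp, status) for a candidate version.

    warn_pp and fail_pp are assumed thresholds, chosen only for illustration.
    """
    delta_pp = round((candidate_rate - baseline_rate) * 100, 1)
    if delta_pp >= fail_pp:
        status = "REGRESSION_DETECTED"
    elif delta_pp >= warn_pp:
        status = "WARNING"
    else:
        status = "PASS"
    return delta_pp, status


if __name__ == "__main__":
    delta, status = regression_check(0.110, 0.245)
    print(f"Delta: {delta:+.1f}pp -> {status}")
```

With the rates from the sample output (11.0% baseline, 24.5% candidate) the delta is 13.5pp, which falls in the `REGRESSION_DETECTED` band under these assumed thresholds.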

Test Results

Layer	Test	Result
1: Recommender	Category → recommendation mapping	✅ PASS
2: LLM Analyzer	Gemini fallback	✅ PASS
3: Eval Collector	Duplicate prevention	✅ PASS
4: Prompt Optimizer	A/B explanation (v2: +10pp)	✅ PASS
5: Regression Guard	DETECTED + PASS scenarios	✅ PASS

Key Results

A/B: prompt_v2 improves on v1 by 10pp
Regression Guard: blocked the v3 deploy with a WARNING at a +6pp delta
Eval Collector: automatically collected 5 new eval candidates
LLM Analyzer: falls back cleanly to rule-based analysis when Gemini is unavailable

📊 Results

Feature	Result
Unit Tests	8/8 PASS ✅
Root cause categories	5 types (RETRIEVAL_QUALITY, RERANKER_FAILURE, LLM_HALLUCINATION, CITATION_MISS, API_ERROR)
Anomaly detection	20pp delta threshold: flags when today's rate exceeds the 7-day average by more than 20pp
A/B comparison	v2: 11.5pp improvement over v1 (22.5% → 11.0% failure rate)
Trend analysis	IMPROVING / STABLE / DEGRADING based on 7-day moving average
Slack integration	Webhook ready — fires on rate threshold, anomaly, or 3 consecutive failures
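One plausible way to derive the IMPROVING / STABLE / DEGRADING label from a 7-day moving average is to compare the most recent window with the one before it. The sketch below does that; the comparison rule and the 2pp tolerance are assumptions, not necessarily what baseline.py does.

```python
def mean(values):
    return sum(values) / len(values)


def classify_trend(daily_rates, window=7, tolerance=0.02):
    """Label the failure-rate trend from a list of daily rates (oldest first).

    Compares the mean of the last `window` days with the mean of the days
    immediately before them. `tolerance` (2pp here) is an assumed dead band.
    """
    current = daily_rates[-window:]
    previous = daily_rates[-2 * window:-window]
    if not current or not previous:
        return "STABLE"  # not enough history to compare
    delta = mean(current) - mean(previous)
    if delta > tolerance:
        return "DEGRADING"  # failure rate is rising
    if delta < -tolerance:
        return "IMPROVING"  # failure rate is falling
    return "STABLE"


if __name__ == "__main__":
    print(classify_trend([0.30] * 7 + [0.10] * 7))
```

The dead band keeps small day-to-day noise from flipping the label back and forth.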

⚙️ Configuration (`config.py`)

Parameter	Default	Description
`FAILURE_RATE_THRESHOLD`	`0.25`	Alert fires above this failure rate
`ANOMALY_THRESHOLD`	`0.20`	Flag if today exceeds 7-day avg by this delta
`SLACK_WEBHOOK_URL`	`""`	Empty = console output
`CONSECUTIVE_FAILURE_THRESHOLD`	`3`	Alert after N consecutive step failures
`STEP_THRESHOLDS`	see config	Per-step max acceptable failure rate
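To show how these parameters could combine, here is a sketch that evaluates the three alert conditions against the documented defaults. The function name, the returned reason labels, and the shape of `recent_step_results` (a list of booleans, True for success, oldest first) are assumptions for illustration.

```python
FAILURE_RATE_THRESHOLD = 0.25       # defaults as listed in the table above
ANOMALY_THRESHOLD = 0.20
CONSECUTIVE_FAILURE_THRESHOLD = 3


def check_alerts(today_rate, seven_day_avg, recent_step_results):
    """Return the list of alert reasons that currently apply."""
    reasons = []
    if today_rate > FAILURE_RATE_THRESHOLD:
        reasons.append("rate_threshold")
    if today_rate - seven_day_avg > ANOMALY_THRESHOLD:
        reasons.append("anomaly")
    # Look only at the most recent N results; all of them must be failures.
    tail = recent_step_results[-CONSECUTIVE_FAILURE_THRESHOLD:]
    if len(tail) == CONSECUTIVE_FAILURE_THRESHOLD and not any(tail):
        reasons.append("consecutive_failures")
    return reasons


if __name__ == "__main__":
    print(check_alerts(0.45, 0.16, [True, False, False, False]))
```

Returning every matching reason, rather than stopping at the first, lets a single alert message explain all the conditions that fired.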

🧪 Root Cause Categories

Category	Trigger
`RETRIEVAL_QUALITY`	Retrieval step fails — no results, low score
`RERANKER_FAILURE`	Reranker can't parse LLM response or times out
`LLM_HALLUCINATION`	Generation returns empty or uncited response
`CITATION_MISS`	Answer produced but no source citations found
`API_ERROR`	Timeout, 429 rate limit, 503 service unavailable
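The table can be read as a small rule-based classifier. The sketch below is one possible encoding: the record fields (`step`, `status_code`), the step names, and the precedence (HTTP 429/503 first, then the failing step, then a fallback) are assumptions, not the logic in forensics.py.

```python
API_ERROR_CODES = {429, 503}

# Assumed mapping from the failing pipeline step to a category.
STEP_TO_CATEGORY = {
    "retrieval": "RETRIEVAL_QUALITY",
    "reranking": "RERANKER_FAILURE",
    "generation": "LLM_HALLUCINATION",
    "citation": "CITATION_MISS",
}


def classify_failure(record):
    """Assign a root cause category to a failed-step record (a dict)."""
    # Rate limiting and service outages are treated as API errors
    # regardless of which step hit them (assumed precedence).
    if record.get("status_code") in API_ERROR_CODES:
        return "API_ERROR"
    category = STEP_TO_CATEGORY.get(record.get("step"))
    if category is not None:
        return category
    # Anything else, such as a timeout outside a known step, falls back here.
    return "API_ERROR"


if __name__ == "__main__":
    print(classify_failure({"step": "reranking", "error": "could not parse response"}))
```

Keeping the step-to-category mapping in a dict makes it easy to add a new step without touching the control flow.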

📈 Terminal Dashboard (Sample Output)

═════════════════════════════════════════════════════════════
  🔬  FAILURE FORENSICS — Terminal Dashboard
═════════════════════════════════════════════════════════════

  📅 FAILURE RATE OVER THE LAST 7 DAYS
  2026-06-03  [███░░░░░░░░░░░░░░░░░░░░░░░░░░░] 13.0%
  2026-06-07  [████████░░░░░░░░░░░░░░░░░░░░░░] 27.3% ⚠️
  2026-06-10  [███░░░░░░░░░░░░░░░░░░░░░░░░░░░] 12.0%

  🔍 FAILURE BREAKDOWN BY STEP
  retrieval     [███████░░░░░░░░░░░░░] 38.0%  (38/100 failed)
  reranking     [██░░░░░░░░░░░░░░░░░░] 13.0%  (13/100 failed)
  generation    [██░░░░░░░░░░░░░░░░░░] 10.0%  (10/100 failed)
  citation      [█░░░░░░░░░░░░░░░░░░░]  6.0%  (6/100 failed)

  ⚡ ANOMALY: ✅ Normal: Today (12.0%) ≈ 7d avg (16.2%)
  📊 TREND: ➡️  STABLE — Moving Avg: 16.0%
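Bars like the ones in the sample can be produced with a few lines of string arithmetic. The sketch below is an illustration rather than the code in dashboard.py, and the function name and parameters are assumptions. Truncating the filled width toward zero matches the bar lengths shown in the sample (for instance, 13.0% fills 3 of 30 cells).

```python
def render_bar(percent, width=30, fill="█", empty="░"):
    """Render a fixed-width ASCII bar for a percentage between 0 and 100."""
    # int() truncates, so 13.0% of 30 cells (3.9) fills 3 cells.
    filled = max(0, min(width, int(percent * width / 100)))
    return f"[{fill * filled}{empty * (width - filled)}] {percent:.1f}%"


if __name__ == "__main__":
    for day, rate in [("2026-06-03", 13.0), ("2026-06-07", 27.3), ("2026-06-10", 12.0)]:
        print(f"  {day}  {render_bar(rate)}")
```

Clamping `filled` to the range 0..width keeps the bar well-formed even if a caller passes a value outside 0 to 100.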

🛠 Technologies Used

Python standard library — json, collections, datetime, threading
requests — Slack webhook HTTP calls
python-dotenv — Environment variable management

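No heavy dependencies and no cloud service. No API keys are required for the core pipeline; the Gemini-backed features fall back to rule-based analysis when Gemini is unavailable.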

🔭 Roadmap

FastAPI REST endpoint for remote log ingestion
HTML report export
PostgreSQL backend for large-scale log storage
Multi-pipeline support (compare RAG vs fine-tuned model)
Email alerts as alternative to Slack

Project details


This version

0.1.1

Jun 10, 2026

