
Production AI pipeline monitoring — root cause detection, anomaly alerts, regression guard


Failure Forensics

Production AI pipeline monitoring — root cause detection, anomaly alerts, regression guard, and Gemini-powered recommendations.

Installation

pip install failure-forensics

Quick Start

from failure_forensics import trace

@trace(step="retrieval", version="v1")
def my_retrieval_function(query):
    # your code here
    pass
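How `trace` works internally is not documented here. As a rough mental model only, a decorator of this kind could time the wrapped call, note whether it raised, and append one JSON line per call to the log. The sketch below is a hypothetical stand-in: `trace_sketch`, its `log_path` parameter, and the record fields are assumptions, not the package's API.

```python
import functools
import json
import time
from datetime import datetime, timezone


def trace_sketch(step, version, log_path="requests.jsonl"):
    """Hypothetical tracing decorator: appends one JSON record per call to log_path."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            record = {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "step": step,
                "version": version,
            }
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                record["status"] = "success"
                return result
            except Exception as exc:
                # Record the failure, then re-raise so the caller still sees it.
                record["status"] = "failure"
                record["error"] = f"{type(exc).__name__}: {exc}"
                raise
            finally:
                # Runs on both paths: measure latency and write the JSONL line.
                record["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
                with open(log_path, "a", encoding="utf-8") as fh:
                    fh.write(json.dumps(record) + "\n")
        return wrapper
    return decorator
```

Re-raising after logging keeps the decorator transparent to callers; whether the real `trace` re-raises or swallows exceptions is not stated here.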

Features 🔬


A self-hosted, zero-cost LLM pipeline observability tool that gives you root cause detection, anomaly alerts, A/B reporting, and a live terminal dashboard — without sending your data to any third-party service.


🆚 Why Not LangSmith or Braintrust?

Feature                   Failure Forensics       LangSmith        Braintrust
Cost                      Free                    Paid tiers       Paid tiers
Data privacy              Stays on your machine   Sent to cloud    Sent to cloud
Customization             Full control            Limited          Limited
Slack alerts              Built-in                Premium only     Premium only
A/B reporting             Built-in                Basic            Basic
Circuit breaker / trend   Built-in

Failure Forensics is designed for teams who need production-grade observability without vendor lock-in.


✨ What It Does

Every pipeline run passes through a structured logging and analysis layer:

Pipeline Step  →  logger.py  →  requests.jsonl
                                     ↓
                    ┌────────────────┴────────────────┐
                    │                                 │
              forensics.py                       pattern.py
          (root cause detection)          (time series + anomaly)
                    │                                 │
              versioning.py                      baseline.py
           (v1 vs v2 comparison)            (7-day moving average)
                    │                                 │
               ab_report.py                      alerts.py
            (A/B comparison table)          (Slack / console alert)
                    └────────────────┬────────────────┘
                                     ↓
                              dashboard.py
                         (ASCII terminal dashboard)
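Everything in the fan-out above starts from the JSONL log. To illustrate the kind of aggregation pattern.py performs, here is a minimal sketch that turns log lines into a per-day failure-rate series. It assumes each record has an ISO-8601 `timestamp`, a `run_id`, and a `status` field, and that a run counts as failed if any of its steps failed; these field names and that rule are assumptions, not the project's documented schema.

```python
import json
from collections import defaultdict


def daily_failure_rates(lines):
    """Map each day (YYYY-MM-DD) to the share of runs with at least one failed step.

    Assumed (hypothetical) record fields: timestamp (ISO-8601), run_id, status.
    """
    runs = {}  # run_id -> (day of the run's first record, failed so far?)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        rec = json.loads(line)
        day = rec["timestamp"][:10]  # ISO-8601 timestamps start with YYYY-MM-DD
        first_day, failed = runs.get(rec["run_id"], (day, False))
        runs[rec["run_id"]] = (first_day, failed or rec["status"] == "failure")
    totals, failures = defaultdict(int), defaultdict(int)
    for day, failed in runs.values():
        totals[day] += 1
        failures[day] += failed  # a bool adds as 0 or 1
    return {day: failures[day] / totals[day] for day in sorted(totals)}
```

Because the function accepts any iterable of lines, an open file handle works directly, for example `daily_failure_rates(open("data/logs/requests.jsonl", encoding="utf-8"))`.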

📁 Project Structure

failure-forensics/
├── src/
│   ├── logger.py          # Logs every pipeline step to JSONL
│   ├── forensics.py       # Root cause detection (5 categories)
│   ├── pattern.py         # Time-series failure rate + anomaly detection
│   ├── baseline.py        # 7-day moving average + trend (IMPROVING/STABLE/DEGRADING)
│   ├── alerts.py          # Slack webhook + console alerts
│   ├── versioning.py      # Per-version failure rate stats
│   ├── ab_report.py       # A/B comparison report (table + JSON)
│   └── dashboard.py       # ASCII bar chart terminal dashboard
├── data/
│   └── logs/
│       └── requests.jsonl # All pipeline logs (gitignored)
├── tests/
│   ├── test_forensics.py  # 8 unit tests
│   └── test_advanced.py   # Advanced (v2) feature tests
├── config.py              # Thresholds, Slack URL, step limits
├── main.py                # 5-scenario demo runner
├── simulate.py            # Realistic test data generator (100 runs, anomaly day)
└── requirements.txt

🚀 Getting Started

1. Clone & Install

git clone https://github.com/jasstt/failure-forensics.git
cd failure-forensics
pip install -r requirements.txt

2. (Optional) Configure Slack Alerts

Edit config.py:

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

If left empty, all alerts print to the console.
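A Slack incoming webhook accepts an HTTP POST whose JSON body carries the message, for example `{"text": "..."}`. The sketch below shows one way the empty-URL fallback could work. `send_alert` is a hypothetical name, and the sketch uses the standard library's `urllib.request` instead of `requests` (which the project lists) so it runs without extra dependencies.

```python
import json
import urllib.request


def send_alert(message, webhook_url=""):
    """Hypothetical helper: post to a Slack incoming webhook, or print if no URL is set."""
    if not webhook_url:
        # Mirrors SLACK_WEBHOOK_URL = "" meaning "print alerts to the console".
        print(f"[ALERT] {message}")
        return "console"
    payload = json.dumps({"text": message}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on a non-2xx response.
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()
    return "slack"
```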

3. Run the Full Demo

python main.py

This runs 5 scenarios:

  1. Simulation: generates 100 realistic pipeline runs (2 prompt versions, one anomaly day)
  2. Root cause analysis: detects the failing step and assigns a category
  3. 7-day pattern report: failure rate per day, step breakdown, and anomaly check
  4. A/B report: prompt_v1 vs prompt_v2, with a per-step improvement table
  5. Terminal dashboard: live ASCII bar charts, trend, and the top 5 failed runs
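Scenario 4's A/B report comes down to comparing failure rates per prompt version and per step. A minimal sketch of that arithmetic, assuming records with `version`, `step`, and `status` fields (hypothetical names) and reporting improvement in percentage points (pp) as baseline rate minus candidate rate:

```python
from collections import defaultdict


def failure_rates_by_step(records, version):
    """Return {step: failure_rate} for one prompt version (rates are fractions)."""
    totals, failures = defaultdict(int), defaultdict(int)
    for rec in records:
        if rec["version"] != version:
            continue
        totals[rec["step"]] += 1
        failures[rec["step"]] += rec["status"] == "failure"
    return {step: failures[step] / totals[step] for step in totals}


def ab_table(records, baseline, candidate):
    """Rows of (step, baseline %, candidate %, improvement in pp); positive = better."""
    base = failure_rates_by_step(records, baseline)
    cand = failure_rates_by_step(records, candidate)
    rows = []
    for step in sorted(set(base) & set(cand)):  # only steps seen in both versions
        b, c = base[step] * 100, cand[step] * 100
        rows.append((step, round(b, 1), round(c, 1), round(b - c, 1)))
    return rows
```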

4. Run Unit Tests

python tests/test_forensics.py
python tests/test_advanced.py

🚀 Advanced Features (New in v2)

Layer  Feature                              Technology
1      Automatic recommendation engine      Rule-based
2      AI-assisted failure analysis         Gemini 2.5 Pro
3      Automatic eval-set expansion         Frequency analysis
4      Prompt optimization explanation      Gemini 2.5 Pro
5      Regression guard                     Baseline comparison

Scenario 6: Regression Guard
Runs an automatic regression check before a new prompt (v3) is deployed:

REGRESSION CHECK — v3
Baseline (v2): 11.0% failure rate
New (v3):      24.5% failure rate
Delta: +13.5pp → REGRESSION_DETECTED ❌
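The check compares the candidate's failure rate with the baseline's and turns the delta, in percentage points, into a verdict. The thresholds separating PASS, WARNING, and REGRESSION_DETECTED are not stated here, so the cut-offs in this sketch (`warn_pp=5.0`, `fail_pp=10.0`) are hypothetical placeholders, and `regression_check` is an illustrative name, not the project's API.

```python
def regression_check(baseline_rate, candidate_rate, warn_pp=5.0, fail_pp=10.0):
    """Return (delta_pp, verdict) for a candidate prompt vs. its baseline.

    Rates are fractions (0.11 == 11%). warn_pp and fail_pp are hypothetical cut-offs.
    """
    delta_pp = round((candidate_rate - baseline_rate) * 100, 1)
    if delta_pp > fail_pp:
        verdict = "REGRESSION_DETECTED"
    elif delta_pp > warn_pp:
        verdict = "WARNING"
    else:
        verdict = "PASS"  # no change, an improvement, or a small increase
    return delta_pp, verdict
```

With these cut-offs, `regression_check(0.110, 0.245)` returns `(13.5, "REGRESSION_DETECTED")`, matching the sample output, and a +6pp delta lands in WARNING.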

Test Results

Layer  Component         Test                                  Result
1      Recommender       Category → recommendation mapping     ✅ PASS
2      LLM Analyzer      Gemini fallback                       ✅ PASS
3      Eval Collector    Duplicate prevention                  ✅ PASS
4      Prompt Optimizer  A/B explanation (v2: +10pp)           ✅ PASS
5      Regression Guard  DETECTED and PASS scenarios           ✅ PASS

Key Results

  • A/B: prompt_v2 improves on prompt_v1 by 10pp
  • Regression Guard: blocked the v3 deploy with a WARNING at a +6pp delta
  • Eval Collector: automatically collected 5 new eval candidates
  • LLM Analyzer: falls back cleanly to rule-based analysis when Gemini is disabled
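Layer 3 grows the eval set through frequency analysis while preventing duplicates. A minimal sketch of that idea, assuming failed queries are available as plain strings and that queries are normalized by lowercasing and collapsing whitespace; `collect_eval_candidates`, `min_count`, `limit`, and the normalization rule are all hypothetical.

```python
from collections import Counter


def normalize(query):
    """Lowercase and collapse whitespace so trivial variants count as one query."""
    return " ".join(query.lower().split())


def collect_eval_candidates(failed_queries, existing_evals, min_count=2, limit=5):
    """Return the most frequent failing queries that are not already in the eval set."""
    seen = {normalize(q) for q in existing_evals}
    counts = Counter(normalize(q) for q in failed_queries)
    candidates = []
    for query, count in counts.most_common():
        if count < min_count:
            break  # most_common() is sorted by count, so the rest are rarer
        if query not in seen:  # duplicate prevention
            candidates.append(query)
        if len(candidates) == limit:
            break
    return candidates
```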

📊 Results

Feature                 Result
Unit tests              8/8 PASS
Root cause categories   5 types (RETRIEVAL_QUALITY, RERANKER_FAILURE, LLM_HALLUCINATION, CITATION_MISS, API_ERROR)
Anomaly detection       Flags a day when its failure rate exceeds the 7-day average by more than 20pp (ANOMALY_THRESHOLD = 0.20)
A/B comparison          v2: 11.5pp improvement over v1 (22.5% → 11.0% failure rate)
Trend analysis          IMPROVING / STABLE / DEGRADING, based on the 7-day moving average
Slack integration       Webhook ready; fires on the failure-rate threshold, an anomaly, or 3 consecutive failures
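The trend label is derived from a 7-day moving average of the daily failure rate. The exact comparison baseline.py makes is not documented here; this sketch compares the latest moving average with the one `window` days earlier and treats changes within a hypothetical ±2pp dead band (`tolerance_pp`) as STABLE. The function names are illustrative.

```python
def moving_average(rates, window=7):
    """Trailing moving average; the first window-1 points average all data so far."""
    out = []
    for i in range(len(rates)):
        chunk = rates[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def trend(rates, window=7, tolerance_pp=2.0):
    """Classify a daily failure-rate series (fractions) as IMPROVING, STABLE, or DEGRADING.

    tolerance_pp is a hypothetical dead band, in percentage points.
    """
    ma = moving_average(rates, window)
    if len(ma) <= window:
        return "STABLE"  # not enough history to compare two windows
    change_pp = (ma[-1] - ma[-1 - window]) * 100
    if change_pp < -tolerance_pp:
        return "IMPROVING"  # failure rate is falling
    if change_pp > tolerance_pp:
        return "DEGRADING"  # failure rate is rising
    return "STABLE"
```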

⚙️ Configuration (config.py)

Parameter                       Default           Description
FAILURE_RATE_THRESHOLD          0.25              Alert fires above this failure rate
ANOMALY_THRESHOLD               0.20              Flag if today exceeds the 7-day average by more than this delta
SLACK_WEBHOOK_URL               ""                Empty = console output
CONSECUTIVE_FAILURE_THRESHOLD   3                 Alert after N consecutive step failures
STEP_THRESHOLDS                 (see config.py)   Per-step maximum acceptable failure rate
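The three alert triggers named under Results (failure-rate threshold, anomaly, consecutive failures) map directly onto these parameters. A minimal sketch, with the defaults above copied into module constants; `alert_reasons` and its inputs are hypothetical, and "today exceeds the 7-day average by more than ANOMALY_THRESHOLD" is read literally as an absolute difference between rates.

```python
# Defaults copied from the configuration table (config.py may differ).
FAILURE_RATE_THRESHOLD = 0.25
ANOMALY_THRESHOLD = 0.20
CONSECUTIVE_FAILURE_THRESHOLD = 3


def alert_reasons(today_rate, seven_day_avg, recent_statuses):
    """Return the triggered alert reasons (an empty list means no alert).

    recent_statuses: step outcomes in chronological order, e.g. ["success", "failure"].
    """
    reasons = []
    if today_rate > FAILURE_RATE_THRESHOLD:
        reasons.append("RATE_THRESHOLD")
    if today_rate - seven_day_avg > ANOMALY_THRESHOLD:
        reasons.append("ANOMALY")
    # Length of the failure streak at the end of the sequence.
    streak = 0
    for status in reversed(recent_statuses):
        if status != "failure":
            break
        streak += 1
    if streak >= CONSECUTIVE_FAILURE_THRESHOLD:
        reasons.append("CONSECUTIVE_FAILURES")
    return reasons
```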

🧪 Root Cause Categories

Category            Trigger
RETRIEVAL_QUALITY   Retrieval step fails: no results, low score
RERANKER_FAILURE    Reranker can't parse the LLM response, or times out
LLM_HALLUCINATION   Generation returns an empty or uncited response
CITATION_MISS       Answer produced, but no source citations found
API_ERROR           Timeout, 429 rate limit, or 503 service unavailable
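A rule-based classifier over a failed step's log record could map these triggers to categories. This sketch assumes hypothetical record fields (`step`, `error`, `http_status`) and uses the step names from the dashboard sample. Because the table lists timeouts under both RERANKER_FAILURE and API_ERROR, the sketch arbitrarily lets the reranking step take precedence; the project's actual rules may differ, and the `UNKNOWN` fallback is an assumption.

```python
STEP_CATEGORIES = {
    "retrieval": "RETRIEVAL_QUALITY",
    "reranking": "RERANKER_FAILURE",
    "generation": "LLM_HALLUCINATION",
    "citation": "CITATION_MISS",
}


def classify_failure(rec):
    """Map one failed step record to a root-cause category (rule order matters)."""
    step = rec.get("step", "")
    if step == "reranking":
        return "RERANKER_FAILURE"  # covers parse errors and reranker timeouts
    error = (rec.get("error") or "").lower()
    if rec.get("http_status") in (429, 503) or "timeout" in error:
        return "API_ERROR"  # transport failures in any other step
    return STEP_CATEGORIES.get(step, "UNKNOWN")
```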

📈 Terminal Dashboard (Sample Output)

═════════════════════════════════════════════════════════════
  🔬  FAILURE FORENSICS — Terminal Dashboard
═════════════════════════════════════════════════════════════

  📅 FAILURE RATE, LAST 7 DAYS
  2026-06-03  [███░░░░░░░░░░░░░░░░░░░░░░░░░░░] 13.0%
  2026-06-07  [████████░░░░░░░░░░░░░░░░░░░░░░] 27.3% ⚠️
  2026-06-10  [███░░░░░░░░░░░░░░░░░░░░░░░░░░░] 12.0%

  🔍 FAILURE BREAKDOWN BY STEP
  retrieval     [███████░░░░░░░░░░░░░] 38.0%  (38/100 failed)
  reranking     [██░░░░░░░░░░░░░░░░░░] 13.0%  (13/100 failed)
  generation    [██░░░░░░░░░░░░░░░░░░] 10.0%  (10/100 failed)
  citation      [█░░░░░░░░░░░░░░░░░░░]  6.0%  (6/100 failed)

  ⚡ ANOMALY: ✅ Normal: today (12.0%) ≈ 7-day avg (16.2%)
  📊 TREND: ➡️  STABLE (moving avg: 16.0%)
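The bars in the sample output are consistent with a simple rule: the filled length is the failure rate times the bar width, rounded down, using 30-character bars for the daily chart and 20-character bars for the step breakdown. `render_bar` is a hypothetical helper that reproduces the bar-and-percentage part of those lines; it is not dashboard.py's API.

```python
def render_bar(rate, width=20):
    """Render a failure rate (0.0 to 1.0) as a fixed-width bar plus a percentage."""
    rate = min(max(rate, 0.0), 1.0)  # clamp so the bar never overflows
    filled = int(rate * width)  # round down, matching the sample output
    return f"[{'█' * filled}{'░' * (width - filled)}] {rate * 100:4.1f}%"
```

For example, `render_bar(0.38)` yields the retrieval bar (7 of 20 cells filled, then `38.0%`), and `render_bar(0.273, width=30)` yields the 2026-06-07 bar (8 of 30 cells filled).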

🛠 Technologies Used

  • Python standard library: json, collections, datetime, threading
  • requests — Slack webhook HTTP calls
  • python-dotenv — Environment variable management

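No heavy dependencies. The core features need no cloud service and no API keys; only the optional Gemini-powered analysis (v2) calls an external API, and it falls back to rule-based analysis when Gemini is unavailable.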


🔭 Roadmap

  • FastAPI REST endpoint for remote log ingestion
  • HTML report export
  • PostgreSQL backend for large-scale log storage
  • Multi-pipeline support (compare RAG vs fine-tuned model)
  • Email alerts as alternative to Slack



Download files


Source Distribution

failure_forensics-0.1.1.tar.gz (29.8 kB)

Uploaded Source

Built Distribution


failure_forensics-0.1.1-py3-none-any.whl (28.4 kB)

Uploaded Python 3

File details

Details for the file failure_forensics-0.1.1.tar.gz.

File metadata

  • Download URL: failure_forensics-0.1.1.tar.gz
  • Size: 29.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for failure_forensics-0.1.1.tar.gz
Algorithm Hash digest
SHA256 818e96923dd14e7a137f700c255318aca1ef167bf256b3e5d2a3b69134c4609b
MD5 0b4f973a5f93f1e6645076d2f257a3a8
BLAKE2b-256 af70a44b631a0d665f43de1037fc8ed474114d837ca8bdb4cfd727a3c5e42741


File details

Details for the file failure_forensics-0.1.1-py3-none-any.whl.


File hashes

Hashes for failure_forensics-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b0cbf3854252b2e9979a53253e86d04e24a766a7eda943500ea32253c3f11bd7
MD5 34626445df9d8a6c4705810eac6a6bf1
BLAKE2b-256 9351a55a5c00ededdf4610d7ef3c1d2dabc900d71ed638ad6ab332517f809766

