
Monitor, diagnose, and auto-correct LLM failures — with XGBoost failure classification, question-type routing, auto-calibrating thresholds, Wikidata/Serper ground truth, and production analytics


Failure Intelligence Engine (FIE)

Real-time LLM failure detection, diagnosis, and automatic correction.

FIE sits between your LLM and your users. When the model gives a wrong answer, FIE catches it, finds the correct answer from a trusted source, and returns the correction — before the user ever sees the mistake.



Quickstart — Use the SDK

pip install fie-sdk

from fie import monitor

@monitor(
    fie_url="https://failure-intelligence-system-800748790940.asia-south1.run.app",
    api_key="your-api-key",
    mode="correct",   # or "monitor"
)
def ask_ai(prompt: str) -> str:
    return your_llm_call(prompt)

response = ask_ai("Who invented the telephone?")
# Returns corrected answer if LLM was wrong, original answer if correct

SDK Modes

| Mode | Behavior |
| --- | --- |
| local | No server needed: rule-based heuristics run instantly on your machine |
| monitor | Non-blocking: FIE checks in the background; the original answer is returned immediately |
| correct | Synchronous: FIE verifies the answer and returns a corrected one if a failure is detected |

# Try it instantly with no server or API key
@monitor(mode="local")
def ask_ai(prompt: str) -> str:
    return your_llm(prompt)
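
To make the difference between the monitor and correct modes concrete, here is a minimal sketch of a decorator with the same two behaviors. It is not the fie-sdk implementation: `monitor_sketch` and the `verify` callback are hypothetical names, and the verifier here is a local stand-in for whatever check the server performs.

```python
import functools
import threading
from typing import Callable, Optional

# Hypothetical verifier: returns a corrected answer, or None if the
# original answer looks fine. A stand-in for the real server-side check.
Verifier = Callable[[str, str], Optional[str]]


def monitor_sketch(mode: str, verify: Verifier):
    """Illustrative decorator only; NOT how fie-sdk is implemented."""
    if mode not in {"monitor", "correct"}:
        raise ValueError(f"unsupported mode in this sketch: {mode!r}")

    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        @functools.wraps(fn)
        def wrapper(prompt: str) -> str:
            answer = fn(prompt)
            if mode == "monitor":
                # Non-blocking: run the check in a background thread and
                # hand the original answer back immediately.
                threading.Thread(
                    target=verify, args=(prompt, answer), daemon=True
                ).start()
                return answer
            # "correct": wait for the check and substitute the correction
            # when the verifier supplies one.
            correction = verify(prompt, answer)
            return correction if correction is not None else answer

        return wrapper

    return decorator


def fake_verify(prompt: str, answer: str) -> Optional[str]:
    # Toy rule for demonstration purposes only.
    if "Edison" in answer:
        return "Alexander Graham Bell invented the telephone."
    return None


@monitor_sketch("correct", fake_verify)
def ask_corrected(prompt: str) -> str:
    return "Thomas Edison invented the telephone."


@monitor_sketch("monitor", fake_verify)
def ask_monitored(prompt: str) -> str:
    return "Thomas Edison invented the telephone."
```

In this sketch `ask_corrected` returns the verifier's replacement, while `ask_monitored` returns the original text and lets the check finish on its own.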

Get an API Key

  1. Sign in at https://failure-intelligence-system.pages.dev
  2. Your API key is shown in the dashboard after login

How It Works

Your LLM answer → FIE
                   ├── Shadow ensemble (3 independent models cross-check)
                   ├── Failure Signal Vector (agreement, entropy, outlier detection)
                   ├── Diagnostic Jury (3 agents vote on root cause)
                   ├── Ground Truth Pipeline (Wikidata → Google Search → consensus)
                   └── Fix Engine (returns corrected answer or escalates)
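
The Failure Signal Vector step can be illustrated with a small sketch that computes agreement, entropy, and an outlier flag from a primary answer and its shadow answers. The function name, the exact-match comparison after normalization, and the output keys are all assumptions made for illustration; the real pipeline may compare answers quite differently (for example, semantically).

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    # Crude normalization: lowercase and collapse whitespace.
    return " ".join(answer.lower().split())


def failure_signals(primary: str, shadows: list[str]) -> dict[str, float]:
    """Toy signal vector over exact-match answers (illustrative only)."""
    if not shadows:
        raise ValueError("at least one shadow answer is required")
    p = normalize(primary)
    normalized_shadows = [normalize(s) for s in shadows]
    answers = [p, *normalized_shadows]
    counts = Counter(answers)
    total = len(answers)

    # Agreement: fraction of shadow answers that match the primary answer.
    agreement = sum(1 for s in normalized_shadows if s == p) / len(shadows)

    # Shannon entropy (bits) of the distribution over distinct answers.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

    # Outlier: the primary answer is less common than the most common answer.
    is_outlier = counts[p] < max(counts.values())

    return {
        "agreement": agreement,
        "entropy": entropy,
        "primary_is_outlier": float(is_outlier),
    }


signals = failure_signals(
    "Thomas Edison",
    ["Alexander Graham Bell", "Alexander Graham Bell", "alexander graham bell"],
)
```

Low agreement combined with an outlier flag is the kind of pattern a downstream classifier could treat as evidence of a failure.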

Classifier: XGBoost v3 (AUC 0.728) backed by a 5-type question router. Factual questions go through full external verification; code/opinion questions skip it to avoid false positives.
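
As a rough illustration of question-type routing, the sketch below uses keyword rules to decide whether a prompt should go through external verification. The regexes and the "temporal" and "other" labels are assumptions; the README only names factual, code, and opinion questions, and the actual router is not shown here.

```python
import re

# Assumption: factual and temporal questions get external verification;
# code and opinion questions skip it, as described above.
EXTERNALLY_VERIFIED = {"factual", "temporal"}

_RULES = [
    ("code", re.compile(r"```|\bdef\b|\bfunction\b|\bcode\b|\bbug\b")),
    ("opinion", re.compile(r"\b(should|best|favorite|opinion|do you think)\b")),
    ("temporal", re.compile(r"\b(today|latest|current|currently|this year)\b")),
    ("factual", re.compile(r"^(who|what|when|where|which|how many)\b")),
]


def classify_question(prompt: str) -> str:
    """Return the first matching question type, or 'other'."""
    text = prompt.strip().lower()
    for label, pattern in _RULES:
        if pattern.search(text):
            return label
    return "other"


def needs_external_verification(prompt: str) -> bool:
    return classify_question(prompt) in EXTERNALLY_VERIFIED
```

Rule order matters in this sketch: a prompt such as "What is the best editor?" is classed as opinion before the factual rule is tried, which is one simple way to avoid fact-checking subjective answers.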


Self-Hosting

Requirements

  • Python 3.11+
  • MongoDB Atlas (free tier works)
  • Groq API key — free at console.groq.com
  • Node.js 18+ (dashboard only)

1. Clone & Install

git clone https://github.com/AyushSingh110/Failure_Intelligence_System.git
cd Failure_Intelligence_System
python -m venv .venv
source .venv/bin/activate        # macOS/Linux
# .venv\Scripts\activate         # Windows
pip install -r requirements.txt

2. Environment Variables

Create .env in the project root:

MONGODB_URI=mongodb+srv://user:pass@cluster.mongodb.net/?retryWrites=true&w=majority
MONGODB_DB_NAME=fie_database

GROQ_API_KEY=gsk_your_groq_key
GROQ_ENABLED=true
GROQ_MODELS=["llama-3.3-70b-versatile","deepseek-r1-distill-llama-70b","qwen-qwq-32b"]

SERPER_API_KEY=your_serper_key     # optional — needed for temporal questions
SERPER_ENABLED=true

OLLAMA_ENABLED=false

GOOGLE_CLIENT_ID=your-google-oauth-client-id.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=your-google-oauth-client-secret
GOOGLE_REDIRECT_URI=http://localhost:5173

JWT_SECRET_KEY=replace-with-a-long-random-secret-minimum-32-chars
JWT_ALGORITHM=HS256
JWT_EXPIRE_HOURS=24
ADMIN_EMAIL=your@email.com
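
Before starting the server it can help to sanity-check a few of these values. The following is a minimal sketch, assuming the variables are already present in the process environment (environment values are always strings, so `GROQ_MODELS` is parsed as JSON here). `load_settings` and `as_bool` are hypothetical helpers, not part of the project, which may read its configuration differently.

```python
import json
import os
from typing import Mapping


def as_bool(value: str) -> bool:
    # Treat common truthy spellings as True; everything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Parse and validate a subset of the variables listed above."""
    models = json.loads(env.get("GROQ_MODELS", "[]"))
    if not isinstance(models, list) or not all(isinstance(m, str) for m in models):
        raise ValueError("GROQ_MODELS must be a JSON list of strings")

    secret = env.get("JWT_SECRET_KEY", "")
    if len(secret) < 32:
        raise ValueError("JWT_SECRET_KEY should be at least 32 characters")

    return {
        "groq_enabled": as_bool(env.get("GROQ_ENABLED", "false")),
        "groq_models": models,
        "serper_enabled": as_bool(env.get("SERPER_ENABLED", "false")),
        "jwt_expire_hours": int(env.get("JWT_EXPIRE_HOURS", "24")),
    }


example_env = {
    "GROQ_ENABLED": "true",
    "GROQ_MODELS": '["llama-3.3-70b-versatile","qwen-qwq-32b"]',
    "JWT_SECRET_KEY": "x" * 32,
    "JWT_EXPIRE_HOURS": "24",
}
settings = load_settings(example_env)
```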

3. Start Server

uvicorn app.main:app --reload
# Backend: http://localhost:8000
# API docs: http://localhost:8000/docs

4. Dashboard (optional)

cd Frontend
npm install
npm run dev
# Dashboard: http://localhost:5173

API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/monitor | Main endpoint: full detection and correction pipeline |
| POST | /api/v1/diagnose | Run the diagnostic jury only |
| POST | /api/v1/analyze | Signal extraction only (no jury, no ground-truth lookup) |
| POST | /api/v1/feedback/{id} | Submit human feedback on an inference |
| GET | /api/v1/monitor/model-info | Active model version, thresholds, AUC |
| GET | /api/v1/analytics/usage | Request volume, failure rate, daily breakdown |
| GET | /api/v1/analytics/model-performance | XGBoost accuracy, per-question-type stats |
| GET | /api/v1/analytics/calibration | Confidence calibration curves and ECE score |
| GET | /api/v1/analytics/question-breakdown | Failure, fix, and escalation rate per question type |
| GET | /api/v1/analytics/paper-metrics | All benchmark metrics in one call |
| GET | /api/v1/analytics/sdk-telemetry | Usage data from opted-in SDK users |
| GET | /health | Health check |
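
The calibration endpoint reports an ECE (Expected Calibration Error) score. For readers new to the metric, here is a sketch of the standard equal-width-bin definition: the weighted average, over confidence bins, of the gap between mean confidence and observed accuracy. The choice of 10 bins is an assumption; how FIE bins its own confidences is not documented here.

```python
def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """Standard ECE with equal-width bins over [0, 1]."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A score of 0 means stated confidence matches observed accuracy in every bin; larger values indicate over- or under-confidence.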

Example Request

curl -X POST http://localhost:8000/api/v1/monitor \
  -H "Content-Type: application/json" \
  -H "X-API-Key: fie-your-key" \
  -d '{
    "prompt": "Who invented the telephone?",
    "primary_output": "Thomas Edison invented the telephone.",
    "primary_model_name": "gpt-4",
    "run_full_jury": true
  }'
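
The same request can be built from Python with only the standard library. This sketch mirrors the curl call above; `build_monitor_request` is a hypothetical helper name, and the response schema is not documented in this README, so the example stops at constructing the request.

```python
import json
import urllib.request


def build_monitor_request(
    base_url: str,
    api_key: str,
    prompt: str,
    primary_output: str,
    primary_model_name: str,
    run_full_jury: bool = True,
) -> urllib.request.Request:
    """Build a POST request for /api/v1/monitor (fields as in the curl example)."""
    payload = {
        "prompt": prompt,
        "primary_output": primary_output,
        "primary_model_name": primary_model_name,
        "run_full_jury": run_full_jury,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/monitor",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


req = build_monitor_request(
    "http://localhost:8000",
    "fie-your-key",
    "Who invented the telephone?",
    "Thomas Edison invented the telephone.",
    "gpt-4",
)

# To send it, a server must be running locally:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     result = json.load(resp)
```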

Running Tests

# Offline unit tests — no server, no API key needed (28 tests)
pytest tests/test_core.py -v

# Covers: question classifier, XGBoost fallback, per-type thresholds,
#         SDK local predictor, entropy detector, SDK config

Opt-In Telemetry (SDK Users)

To share anonymized usage data (no prompts, no API keys):

FIE_TELEMETRY=true python your_app.py

This sends: SDK version, question type, failure detection rate, mode. Nothing else.
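
One way to enforce a "nothing else" guarantee is an allow-list applied just before sending. The sketch below shows the idea; the field names and helper functions are hypothetical and are not taken from the SDK source.

```python
import os
from typing import Mapping

# Hypothetical field names matching the four items listed above.
ALLOWED_FIELDS = ("sdk_version", "question_type", "failure_detection_rate", "mode")


def telemetry_enabled(env: Mapping[str, str] = os.environ) -> bool:
    # Opt-in only: anything other than an explicit "true" disables telemetry.
    return env.get("FIE_TELEMETRY", "").strip().lower() == "true"


def build_telemetry_payload(stats: Mapping[str, object]) -> dict:
    # Copy only allow-listed keys, so prompts or API keys that happen to be
    # present in `stats` can never end up in the payload.
    return {key: stats[key] for key in ALLOWED_FIELDS if key in stats}


payload = build_telemetry_payload(
    {
        "sdk_version": "1.2.0",
        "mode": "local",
        "question_type": "factual",
        "failure_detection_rate": 0.12,
        "prompt": "Who invented the telephone?",  # dropped by the allow-list
    }
)
```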


Benchmark Results

Evaluated on TruthfulQA (817 adversarial questions).

| Method | Recall | FPR | F1 | AUC-ROC |
| --- | --- | --- | --- | --- |
| POET rule-based (baseline) | 56.4% | 38.7% | 58.7% |  |
| XGBoost v2 | 71.6% | 53.9% | 63.5% | 0.728 |
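
For reference, the threshold-based columns in this table follow from a confusion matrix as sketched below, assuming "failure" is the positive class. The counts in the example are made up to show the arithmetic and are not the benchmark's actual counts.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Recall, false-positive rate, precision, and F1 from raw counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"recall": recall, "fpr": fpr, "precision": precision, "f1": f1}


# Hypothetical counts: 40 real failures (30 caught), 60 correct answers
# (20 wrongly flagged).
example = detection_metrics(tp=30, fp=20, tn=40, fn=10)
```

Recall and FPR pull in opposite directions as the detection threshold moves, so the table's higher recall for XGBoost comes with a higher false-positive rate.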

Required Services

| Service | Required | Free Tier |
| --- | --- | --- |
| Groq | Yes | 14,400 req/day |
| MongoDB Atlas | Yes | 512 MB |
| Wikidata | Yes | No key needed |
| Serper.dev | Optional | 2,500 searches/month |

License

Apache-2.0
