Monitor, diagnose, and auto-correct LLM failures — with XGBoost failure classification, question-type routing, auto-calibrating thresholds, Wikidata/Serper ground truth, and production analytics
Failure Intelligence Engine (FIE)
Real-time LLM failure detection, diagnosis, and automatic correction.
FIE sits between your LLM and your users. When the model gives a wrong answer, FIE catches it, finds the correct answer from a trusted source, and returns the correction — before the user ever sees the mistake.
Quickstart — Use the SDK
pip install fie-sdk
from fie import monitor

@monitor(
    fie_url="https://failure-intelligence-system-800748790940.asia-south1.run.app",
    api_key="your-api-key",
    mode="correct",  # or "monitor"
)
def ask_ai(prompt: str) -> str:
    return your_llm_call(prompt)

response = ask_ai("Who invented the telephone?")
# Returns the corrected answer if the LLM was wrong, the original answer otherwise
SDK Modes
| Mode | Behavior |
|---|---|
| local | No server needed: rule-based heuristics run on your machine instantly |
| monitor | Non-blocking: FIE checks in the background and the original answer is returned immediately |
| correct | Synchronous: FIE verifies the answer and returns a corrected one if a failure is detected |
# Try it instantly with no server or API key
@monitor(mode="local")
def ask_ai(prompt: str) -> str:
    return your_llm(prompt)
Get an API Key
- Sign in at https://failure-intelligence-system.pages.dev
- Your API key is shown in the dashboard after login
How It Works
Your LLM answer → FIE
├── Shadow ensemble (3 independent models cross-check)
├── Failure Signal Vector (agreement, entropy, outlier detection)
├── Diagnostic Jury (3 agents vote on root cause)
├── Ground Truth Pipeline (Wikidata → Google Search → consensus)
└── Fix Engine (returns corrected answer or escalates)
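
To make the Failure Signal Vector step more concrete, here is a minimal sketch of how agreement, entropy, and an outlier flag could be derived from a primary answer and its shadow answers. The function names, the exact-match comparison, and the output keys are assumptions for illustration; FIE's actual signal extraction is not shown in this README and may well compare answers semantically rather than by string.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(answer.lower().split())


def failure_signals(primary: str, shadows: list[str]) -> dict[str, float]:
    """Illustrative signals only (hypothetical helper, not FIE's real API)."""
    if not shadows:
        raise ValueError("at least one shadow answer is required")
    primary_norm = normalize(primary)
    shadow_norms = [normalize(s) for s in shadows]
    counts = Counter([primary_norm, *shadow_norms])
    total = sum(counts.values())

    # Agreement: fraction of shadow answers identical to the primary answer.
    agreement = sum(s == primary_norm for s in shadow_norms) / len(shadow_norms)
    # Entropy (in bits) of the distribution over distinct answers.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Outlier: the primary answer is not the most common answer.
    majority_answer, _ = counts.most_common(1)[0]
    outlier = 1.0 if primary_norm != majority_answer else 0.0
    return {"agreement": agreement, "entropy": entropy, "outlier": outlier}


signals = failure_signals(
    "Thomas Edison",
    ["Alexander Graham Bell", "alexander graham bell", "Alexander  Graham Bell"],
)
print(signals)
```

In this example the three shadow answers agree with each other and disagree with the primary answer, so agreement is zero and the outlier flag is set, which is the kind of pattern a downstream classifier could treat as a likely failure.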
Classifier: XGBoost v3 (AUC 0.728) backed by a 5-type question router. Factual questions go through full external verification; code/opinion questions skip it to avoid false positives.
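
The routing idea can be sketched as follows. This is a toy keyword router under stated assumptions, not FIE's classifier: the README names factual, code, and opinion questions (and mentions temporal questions in the environment notes) but does not list all five types, so `OTHER` is a placeholder, and the keyword lists are invented for illustration.

```python
from enum import Enum


class QuestionType(Enum):
    FACTUAL = "factual"
    TEMPORAL = "temporal"
    CODE = "code"
    OPINION = "opinion"
    OTHER = "other"  # placeholder: the fifth type is not named in this README


def classify(prompt: str) -> QuestionType:
    """Toy keyword-based stand-in for the real question router."""
    text = prompt.lower()
    if any(tok in text for tok in ("def ", "function", "code", "compile")):
        return QuestionType.CODE
    if any(tok in text for tok in ("should i", "best", "do you think", "opinion")):
        return QuestionType.OPINION
    if any(tok in text for tok in ("latest", "current", "today", "this year")):
        return QuestionType.TEMPORAL
    if text.startswith(("who", "what", "when", "where", "which")):
        return QuestionType.FACTUAL
    return QuestionType.OTHER


def needs_external_verification(qtype: QuestionType) -> bool:
    """Assumption: only factual and temporal questions get ground-truth lookup."""
    return qtype in {QuestionType.FACTUAL, QuestionType.TEMPORAL}


for prompt in (
    "Who invented the telephone?",
    "Write a Python function to reverse a list",
    "What is the best programming language?",
):
    qtype = classify(prompt)
    print(prompt, "->", qtype.value, needs_external_verification(qtype))
```

Skipping external lookup for code and opinion prompts matters because there is no single trusted source to compare against, so any mismatch would be reported as a failure even when the answer is acceptable.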
Self-Hosting
Requirements
- Python 3.11+
- MongoDB Atlas (free tier works)
- Groq API key — free at console.groq.com
- Node.js 18+ (dashboard only)
1. Clone & Install
git clone https://github.com/AyushSingh110/Failure_Intelligence_System.git
cd Failure_Intelligence_System
python -m venv .venv
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\activate # Windows
pip install -r requirements.txt
2. Environment Variables
Create .env in the project root:
MONGODB_URI=mongodb+srv://user:pass@cluster.mongodb.net/?retryWrites=true&w=majority
MONGODB_DB_NAME=fie_database
GROQ_API_KEY=gsk_your_groq_key
GROQ_ENABLED=true
GROQ_MODELS=["llama-3.3-70b-versatile","deepseek-r1-distill-llama-70b","qwen-qwq-32b"]
SERPER_API_KEY=your_serper_key # optional — needed for temporal questions
SERPER_ENABLED=true
OLLAMA_ENABLED=false
GOOGLE_CLIENT_ID=your-google-oauth-client-id.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=your-google-oauth-client-secret
GOOGLE_REDIRECT_URI=http://localhost:5173
JWT_SECRET_KEY=replace-with-a-long-random-secret-minimum-32-chars
JWT_ALGORITHM=HS256
JWT_EXPIRE_HOURS=24
ADMIN_EMAIL=your@email.com
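
Two of these values are easy to get wrong: `GROQ_MODELS` is written as a JSON list, and `JWT_SECRET_KEY` should be at least 32 characters. The sketch below shows one way to sanity-check them before starting the server. The helper names are hypothetical, and how the backend itself parses these variables is not shown in this README, so treat this as a pre-flight check rather than a description of the server's behavior.

```python
import json
import os
from collections.abc import Mapping


def load_groq_models(env: Mapping[str, str] = os.environ) -> list[str]:
    """Parse GROQ_MODELS, assuming it holds a JSON array of model names."""
    raw = env.get("GROQ_MODELS", "[]")
    models = json.loads(raw)
    if not isinstance(models, list) or not all(isinstance(m, str) for m in models):
        raise ValueError("GROQ_MODELS must be a JSON list of strings")
    return models


def check_jwt_secret(env: Mapping[str, str] = os.environ) -> None:
    """Reject secrets shorter than the 32-character minimum noted above."""
    secret = env.get("JWT_SECRET_KEY", "")
    if len(secret) < 32:
        raise ValueError("JWT_SECRET_KEY must be at least 32 characters long")


# Example with an in-memory mapping instead of the real environment.
sample_env = {
    "GROQ_MODELS": '["llama-3.3-70b-versatile","deepseek-r1-distill-llama-70b","qwen-qwq-32b"]',
    "JWT_SECRET_KEY": "x" * 40,
}
print(load_groq_models(sample_env))
check_jwt_secret(sample_env)
```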
3. Start Server
uvicorn app.main:app --reload
# Backend: http://localhost:8000
# API docs: http://localhost:8000/docs
4. Dashboard (optional)
cd Frontend
npm install
npm run dev
# Dashboard: http://localhost:5173
API Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/monitor | Main endpoint: full detection + correction pipeline |
| POST | /api/v1/diagnose | Run diagnostic jury only |
| POST | /api/v1/analyze | Signal extraction only (no jury, no GT) |
| POST | /api/v1/feedback/{id} | Submit human feedback on an inference |
| GET | /api/v1/monitor/model-info | Active model version, thresholds, AUC |
| GET | /api/v1/analytics/usage | Request volume, failure rate, daily breakdown |
| GET | /api/v1/analytics/model-performance | XGBoost accuracy, per-question-type stats |
| GET | /api/v1/analytics/calibration | Confidence calibration curves + ECE score |
| GET | /api/v1/analytics/question-breakdown | Failure/fix/escalation rate per question type |
| GET | /api/v1/analytics/paper-metrics | All benchmark metrics in one call |
| GET | /api/v1/analytics/sdk-telemetry | Usage data from opted-in SDK users |
| GET | /health | Health check |
Example Request
curl -X POST http://localhost:8000/api/v1/monitor \
  -H "Content-Type: application/json" \
  -H "X-API-Key: fie-your-key" \
  -d '{
    "prompt": "Who invented the telephone?",
    "primary_output": "Thomas Edison invented the telephone.",
    "primary_model_name": "gpt-4",
    "run_full_jury": true
  }'
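
The same request can be built from Python with only the standard library. This is a sketch that mirrors the curl call above: the helper name `build_monitor_request` is hypothetical, and the response schema is not documented in this README, so the example stops at constructing the request and leaves sending it commented out.

```python
import json
import urllib.request


def build_monitor_request(
    base_url: str,
    api_key: str,
    prompt: str,
    primary_output: str,
    primary_model_name: str,
    run_full_jury: bool = True,
) -> urllib.request.Request:
    """Build a POST request for /api/v1/monitor with the fields shown above."""
    payload = {
        "prompt": prompt,
        "primary_output": primary_output,
        "primary_model_name": primary_model_name,
        "run_full_jury": run_full_jury,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/monitor",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


req = build_monitor_request(
    base_url="http://localhost:8000",
    api_key="fie-your-key",
    prompt="Who invented the telephone?",
    primary_output="Thomas Edison invented the telephone.",
    primary_model_name="gpt-4",
)

# To send it against a running server, uncomment:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.load(resp))
```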
Running Tests
# Offline unit tests — no server, no API key needed (28 tests)
pytest tests/test_core.py -v
# Covers: question classifier, XGBoost fallback, per-type thresholds,
# SDK local predictor, entropy detector, SDK config
Opt-In Telemetry (SDK Users)
To share anonymized usage data (no prompts, no API keys):
FIE_TELEMETRY=true python your_app.py
This sends: SDK version, question type, failure detection rate, mode. Nothing else.
Benchmark Results
Evaluated on TruthfulQA (817 adversarial questions).
| Method | Recall | FPR | F1 | AUC-ROC |
|---|---|---|---|---|
| POET rule-based (baseline) | 56.4% | 38.7% | 58.7% | — |
| XGBoost v2 | 71.6% | 53.9% | 63.5% | 0.728 |
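
For readers comparing the rows, the sketch below shows how Recall, FPR, and F1 follow from a confusion matrix, and how precision can be backed out of a reported recall and F1. The confusion-matrix counts are made-up illustration values, not the TruthfulQA results; only the recall and F1 passed to `implied_precision` come from the table.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard detection metrics, treating 'failure' as the positive class."""
    recall = tp / (tp + fn)        # share of real failures that were caught
    fpr = fp / (fp + tn)           # share of correct answers wrongly flagged
    precision = tp / (tp + fp)     # share of flagged answers that were failures
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "fpr": fpr, "precision": precision, "f1": f1}


def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given recall R and F1."""
    return f1 * recall / (2 * recall - f1)


# Hypothetical counts, chosen only to exercise the formulas.
print(classification_metrics(tp=80, fp=30, tn=70, fn=20))

# Precision implied by the XGBoost row (recall 71.6%, F1 63.5%): roughly 0.57.
print(round(implied_precision(0.716, 0.635), 3))
```

Read this way, the XGBoost row trades a higher false-positive rate for higher recall relative to the rule-based baseline.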
Required Services
| Service | Required | Free Tier |
|---|---|---|
| Groq | Yes | 14,400 req/day |
| MongoDB Atlas | Yes | 512 MB |
| Wikidata | Yes | No key needed |
| Serper.dev | Optional | 2,500 searches/month |
License
Apache-2.0