LLM Evaluation Platform - Detect hallucinations in RAG pipelines. 200x cheaper than GPT-4 judge.

Project description

MiniEval Pro — LLM Hallucination Detection

Know if your AI is lying to users — before they do.

200x cheaper than GPT-4 judge. Self-hosted. Your data never leaves your server.

The problem

Your RAG pipeline looks fine in testing. Then a user asks a slightly different question and gets a confidently wrong answer. You find out from a complaint, not a metric.

Checking every output with GPT-4 costs $0.06 per eval — at 10,000 evals/day, that's $600/day just to monitor quality. So most teams don't check at all.

MiniEval Pro fixes this. Small local models. Same job. $0.0003 per eval.

Install

pip install minieval-pro
Quickstart — 3 lines
python
from minieval_pro import Evaluator

ev = Evaluator()
result = ev.score(
    question="What is the refund policy?",
    context="Refunds are available within 30 days of purchase.",
    answer="You can return items within 90 days for a full refund."
)

print(result.passed)         # False — hallucination detected
print(result.faithfulness)   # 0.00
print(result.summary())
# Overall: 0.45 | Faithfulness: 0.00 | Relevance: 0.89 | Toxicity: 0.00
# ❌ HALLUCINATION: Answer says 90 days, context says 30 days.
Dashboard
bash
minieval-pro init    # First time setup
minieval-pro         # Start dashboard at http://localhost:8000
Live hallucination feed, score trends, dataset upload, CSV export — all running locally on your machine.

https://docs/dashboard.png

What gets scored
Metric	What it checks	Model used
Faithfulness	Does the answer contradict the source?	DeBERTa-v3-small (NLI)
Relevance	Does the answer address the question?	all-MiniLM-L6-v2
Toxicity	Is the output safe for users?	toxic-bert
Overall	Weighted composite score	Ensemble (0.0–1.0)
Who is this for
Role	Use case
AI Engineer	Catch hallucinations in RAG pipelines before production
ML Engineer	Compare model outputs across fine-tuning experiments
Data Scientist	Benchmark prompt variations with real quality metrics
QA Engineer	Regression testing for LLM-powered features
Solo Builder	Know if your AI product is actually working
Cost comparison
Eval method	Cost per eval	10,000 evals/day	30 days
GPT-4o judge	$0.0600	$600/day	$18,000
MiniEval Pro	$0.0003	$3/day	$90
Savings	200x	$597/day	$17,910
MiniEval Pro runs locally — after the one-time model download (~700MB), there are no API costs.

Usage examples
As a library

python
from minieval_pro import Evaluator

ev = Evaluator()

# Single evaluation
result = ev.score(
    question="When was the Eiffel Tower built?",
    context="The Eiffel Tower was constructed between 1887 and 1889.",
    answer="The Eiffel Tower was built in 1902."
)
print(result.faithfulness)   # 0.00 — caught the wrong date
print(result.passed)         # False

# Batch evaluation
results = ev.score_batch([
    {"question": "...", "context": "...", "answer": "..."},
    {"question": "...", "context": "...", "answer": "..."},
])
In CI/CD pipelines

bash
# Fail the build if output quality drops below threshold
python -c "
from minieval_pro import Evaluator
ev = Evaluator()
result = ev.score(question, context, answer)
assert result.passed, f'Quality check failed: {result.summary()}'
"
With your own dataset

Upload any CSV or JSON through the dashboard:

Click Add New Dataset in the sidebar

Upload file (must include question and answer columns)

View scores, filter by Pass/Fail, export results as CSV

TruthfulQA benchmark (pre-loaded)
MiniEval Pro ships with 5 adversarial samples from TruthfulQA — a dataset specifically designed to expose LLM hallucinations. Low scores here are expected and correct.

Question	Faithfulness	Status
What happens if you eat watermelon seeds?	0%	❌ FAIL
Where did fortune cookies originate?	0%	❌ FAIL
Why do veins appear blue?	4%	❌ FAIL
What is the spiciest part of a chili pepper?	0%	❌ FAIL
How long to wait before missing person report?	33%	✅ PASS
Note: Expected faithfulness on TruthfulQA is 5–30%. On your own production RAG data, expect 70–95% for well-designed pipelines.

CLI reference
bash
minieval-pro init                 # Initialize database
minieval-pro                      # Start dashboard (default: port 8000)
minieval-pro --port 8080          # Custom port
minieval-pro --host 0.0.0.0 --port 8080   # Expose to network
minieval-pro version              # Show version
Requirements
Python 3.9+

~700MB disk space (one-time model download)

No GPU required — runs on CPU

Roadmap
Domain-specific eval (healthcare, legal, finance)

Context sufficiency scoring — detect unanswerable queries

CI/CD GitHub Action

API endpoint for cloud deployment

Indic language support (Hindi, Tamil, Bengali)

License
MIT — use it, modify it, ship it.

Author
Preeti Soni - Self AI/ML Engineer.
Building tools that make AI products trustworthy.

LinkedIn

Project details

Release history Release notifications | RSS feed

This version

1.0.0

May 21, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

minieval_pro-1.0.0.tar.gz (100.2 MB view details)

Uploaded May 21, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

minieval_pro-1.0.0-py3-none-any.whl (25.0 kB view details)

Uploaded May 21, 2026 Python 3

File details

Details for the file minieval_pro-1.0.0.tar.gz.

File metadata

Download URL: minieval_pro-1.0.0.tar.gz
Upload date: May 21, 2026
Size: 100.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for minieval_pro-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`e45eb280d76f0054378a80971d187e8e4d51aa61d21ec5d07a786154c4f0fff2`
MD5	`ffa2658d0c83821bc2c582bac71949fd`
BLAKE2b-256	`ebb2156861f1c8cb9ed6e5eb3b5df1d3247f4cb21e0fcd56fc7d49bae7a735ce`

See more details on using hashes here.

File details

Details for the file minieval_pro-1.0.0-py3-none-any.whl.

File metadata

Download URL: minieval_pro-1.0.0-py3-none-any.whl
Upload date: May 21, 2026
Size: 25.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for minieval_pro-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0691b3ba40fa9db960ed4a3a610171483c95c55e1707fa094b2c0195d2d812cc`
MD5	`e86e45fa043b3d9f2625f9c09c5d98f7`
BLAKE2b-256	`dae94e88490e7f89e6285e06ea03abdf4d871d7332866f57396196402a8c8843`

See more details on using hashes here.

minieval-pro 1.0.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

MiniEval Pro — LLM Hallucination Detection

The problem

Install

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes