rageval-ai
Standalone RAG evaluation library — run LLM-as-judge evaluation locally with your own API key.
Evaluate your RAG pipelines with 9+ metrics including hallucination detection, answer relevancy, faithfulness, and more. No server needed — everything runs locally.
Installation
```bash
pip install rageval-ai
```
Quick Start
```python
import os

from rageval_sdk import evaluate

result = evaluate(
    question="What is the capital of France?",
    answer="The capital of France is Paris.",
    contexts=["Paris is the capital and largest city of France."],
    ground_truth="Paris",
    api_key=os.environ["OPENAI_API_KEY"],
)

print(f"Overall Score: {result['overall_score']}")
print(f"Hallucination: {result['hallucination_score']}")
print(f"Faithfulness: {result['faithfulness']}")
print(f"Relevancy: {result['answer_relevancy']}")
print(f"Cost: ${result['cost_usd']}")
```
Environment Variables
```bash
export OPENAI_API_KEY="sk-your-openai-key"
```
Custom Configuration
```python
import os

from rageval_sdk import evaluate, EvalConfig

config = EvalConfig(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://api.openai.com/v1",  # or OpenRouter, Azure, etc.
    stage_1_model="gpt-4o",                # reasoning model
    stage_2_model="gpt-4o-mini",           # JSON conversion model
    rag_metrics_model="gpt-4o-mini",       # RAG metrics model
)

result = evaluate(
    question="What is RAG?",
    answer="RAG is Retrieval-Augmented Generation.",
    config=config,
)
```
Async Usage
```python
import asyncio

from rageval_sdk import evaluate_trace, EvalConfig

async def main():
    config = EvalConfig(api_key="sk-...")
    result = await evaluate_trace(
        question="What is RAG?",
        answer="RAG is Retrieval-Augmented Generation.",
        contexts=["RAG combines retrieval with generation."],
        config=config,
    )
    print(result["overall_score"])

asyncio.run(main())
```
Background Evaluation (Non-blocking)
The RagEvaluator runs evaluations in background threads so your RAG pipeline is never blocked:
```python
import os

from rageval_sdk import RagEvaluator

evaluator = RagEvaluator(api_key=os.environ["OPENAI_API_KEY"], max_workers=4)

# Your RAG pipeline runs normally — evaluation happens in the background
for query in user_queries:
    answer, contexts = my_rag_pipeline(query)  # your existing code
    # Non-blocking: submits and returns immediately
    evaluator.submit(
        question=query,
        answer=answer,
        contexts=contexts,
    )

# Check how many are done
print(f"Completed: {evaluator.completed_count}, Pending: {evaluator.pending_count}")

# When ready, collect all results
results = evaluator.wait()
for r in results:
    print(f"Score: {r['overall_score']}, Hallucination: {r['hallucination_score']}")

evaluator.shutdown()
```
Batch Evaluation
Evaluate multiple traces at once:
```python
import os

from rageval_sdk import RagEvaluator

with RagEvaluator(api_key=os.environ["OPENAI_API_KEY"]) as evaluator:
    results = evaluator.evaluate_batch([
        {
            "question": "What is RAG?",
            "answer": "RAG is Retrieval-Augmented Generation.",
            "contexts": ["RAG combines retrieval with generation."],
        },
        {
            "question": "What is Python?",
            "answer": "Python is a programming language.",
            "contexts": ["Python was created by Guido van Rossum."],
        },
    ])

for r in results:
    print(f"Score: {r['overall_score']}")
```
Features
- Standalone — No server needed, runs entirely locally
- Background Evaluation — Non-blocking evaluation with `RagEvaluator`
- Batch Support — Evaluate multiple traces concurrently
- 9+ Metrics — Hallucination, relevancy, faithfulness, completeness, and more
- Parallel Pipeline — Stage 1 + Stage 2 + RAG metrics run concurrently
- OpenAI Compatible — Works with OpenAI, OpenRouter, Azure, or any compatible API
- Retry & Circuit Breaker — Production-grade reliability
- Typed — Full type hints with `py.typed` marker
- Lightweight — Only `httpx` as a required dependency
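Because the client speaks the OpenAI-compatible API, switching providers is mostly a matter of changing `base_url` (and model names) in `EvalConfig`. A minimal sketch, assuming the standard OpenRouter endpoint; no SDK import is needed just to build the keyword arguments:

```python
# OpenAI-compatible endpoints. The OpenRouter URL is its documented
# default; an Azure endpoint varies per resource, so it is omitted here.
BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "openrouter": "https://openrouter.ai/api/v1",
}

def config_kwargs(provider: str, api_key: str) -> dict:
    """Build keyword arguments suitable for EvalConfig(**kwargs)."""
    return {
        "api_key": api_key,
        "base_url": BASE_URLS[provider],
    }

kwargs = config_kwargs("openrouter", "sk-or-...")
# Then: config = EvalConfig(**kwargs, stage_1_model=..., ...)
```

The resulting dict can be splatted into `EvalConfig` together with the model fields shown in the Custom Configuration section.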
Evaluation Metrics
| Metric | Description |
|---|---|
| `overall_score` | Weighted combination of all metrics |
| `hallucination_score` | Detects fabricated information (claim-level) |
| `faithfulness` | Ensures the answer is grounded in the context |
| `answer_relevancy` | Measures answer relevance to the question |
| `completeness` | Key-point coverage verification |
| `context_precision` | Evaluates quality of retrieved contexts |
| `context_recall` | Checks whether all needed facts were retrieved |
| `citation_check` | Validates source citations against contexts |
| `clarity` | Answer clarity and readability |
| `coherence` | Logical flow and consistency |
| `helpfulness` | How actionable and useful the answer is |
| `is_off_topic` | Off-topic detection |
| `is_deflection` | Deflection detection ("I don't know") |
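As the table notes, `overall_score` is a weighted combination of the individual metrics. Purely as an illustration of what such a combination looks like (the weights below are made up, not the library's actual values):

```python
# Hypothetical weights over a subset of metrics; the library's real
# weighting is internal and may differ.
WEIGHTS = {
    "faithfulness": 0.4,
    "answer_relevancy": 0.3,
    "completeness": 0.2,
    "clarity": 0.1,
}

def weighted_overall(scores: dict[str, float]) -> float:
    """Weighted average of the per-metric scores."""
    total = sum(WEIGHTS[m] * scores[m] for m in WEIGHTS)
    return round(total, 3)

print(weighted_overall({
    "faithfulness": 0.9,
    "answer_relevancy": 0.8,
    "completeness": 0.7,
    "clarity": 1.0,
}))
```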
API Reference
evaluate()
```python
result = evaluate(
    question,             # str — the user question
    answer,               # str — the LLM answer
    contexts=None,        # list[str] — retrieved context passages
    ground_truth=None,    # str — expected correct answer
    api_key=None,         # str — your OpenAI API key
    config=None,          # EvalConfig — full configuration
    **config_overrides,   # additional EvalConfig fields
)
```
EvalConfig
```python
config = EvalConfig(
    api_key="sk-...",                      # Required
    base_url="https://api.openai.com/v1",  # LLM endpoint
    stage_1_model="gpt-4o",                # Reasoning model
    stage_2_model="gpt-4o-mini",           # JSON model
    rag_metrics_model="gpt-4o-mini",       # RAG metrics model
    timeout_seconds=120.0,                 # Request timeout
)
```
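The feature list mentions retry logic and a circuit breaker. As a generic sketch of that reliability pattern (my own illustration, not the library's internal implementation): a breaker that opens after a few consecutive failures and retries with exponential backoff could look like this.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: opens after `threshold` consecutive failures."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.threshold

    def call(self, fn, *, retries: int = 2, backoff: float = 0.01):
        if self.open:
            raise RuntimeError("circuit open: refusing call")
        for attempt in range(retries + 1):
            try:
                result = fn()
                self.failures = 0  # success resets the breaker
                return result
            except Exception:
                self.failures += 1
                if attempt == retries or self.open:
                    raise
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
```

A production breaker would also add a cool-down period after which the circuit half-opens and probes the upstream again; this sketch only shows the fail-fast core.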
License
MIT — see LICENSE for details.
File details
Details for the file rageval_ai-0.1.2.tar.gz.
File metadata
- Download URL: rageval_ai-0.1.2.tar.gz
- Upload date:
- Size: 26.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `3f92d6b47bf729dbed217ded58aa4f482f58388f2ead5fc4de25822a59e860c8` |
| MD5 | `8edf95dbc3cdbca274152c19a989e1d5` |
| BLAKE2b-256 | `eaeda18c7fd1b54f42b76164ad17360b33904c71d995d34f192eb9f72abea104` |
File details
Details for the file rageval_ai-0.1.2-py3-none-any.whl.
File metadata
- Download URL: rageval_ai-0.1.2-py3-none-any.whl
- Upload date:
- Size: 32.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `79e9e3587974937a3aaeaab05612aad7124c2b3a29d3988c145bc3ac3ed1b473` |
| MD5 | `13cc0b238c2a7b837a44da4f69fe0256` |
| BLAKE2b-256 | `edf3f4fdd2b78538574acc75ca1d31f2c6b7b7be5d0b4ab99ace6446f2237b1a` |