RAG evaluation library — LLM-as-judge for hallucination detection, faithfulness, answer relevancy. Local or self-hosted. Alternative to ragas.

Project description

rageval-ai

Standalone RAG evaluation library — evaluate your RAG/LLM outputs with LLM-as-judge methodology.

Two modes:

  • Local mode — No server needed, runs evaluation locally with your API key
  • Server mode — Deploy FastAPI server, send traces from anywhere

Installation

pip install rageval-ai

Mode 1: Local Evaluation (No Server)

Set your API key once:

export OPENAI_API_KEY="sk-your-key"

Single Trace (3 lines)

from rageval_sdk import evaluate

result = evaluate("What is the capital of France?", "Paris.", ["Paris is the capital of France."])
print(result["overall_score"])  # 0.95

Batch Evaluation

from rageval_sdk import evaluate_batch

results = evaluate_batch([
    {"question": "What is RAG?", "answer": "Retrieval-Augmented Generation.", "contexts": ["RAG combines retrieval with generation."]},
    {"question": "What is Python?", "answer": "A programming language.", "contexts": ["Python was created by Guido van Rossum."]},
])

for r in results:
    print(f"Score: {r['overall_score']}")

Custom Provider (OpenRouter, Azure, etc.)

from rageval_sdk import evaluate, EvalConfig

config = EvalConfig(
    api_key="sk-or-...",
    base_url="https://openrouter.ai/api/v1",
    stage_1_model="qwen/qwen3-235b-a22b-2507",
    stage_2_model="qwen/qwen3-32b",
)

result = evaluate("Question?", "Answer.", ["Context."], config=config)
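
The same EvalConfig works with any OpenAI-compatible endpoint, so you can also point it at a self-hosted inference server. A minimal sketch, assuming a local endpoint and model names of your own choosing (none of these values are library defaults):

from rageval_sdk import evaluate, EvalConfig

# Hypothetical self-hosted OpenAI-compatible server (e.g. vLLM or Ollama);
# substitute your own base_url, key, and model names.
config = EvalConfig(
    api_key="not-needed-locally",
    base_url="http://localhost:8001/v1",
    stage_1_model="llama-3.1-70b-instruct",
    stage_2_model="llama-3.1-8b-instruct",
)

result = evaluate("Question?", "Answer.", ["Context."], config=config)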

Mode 2: Self-Hosted Server

Deploy the FastAPI evaluation server on your own infrastructure, then send traces from any client.

1. Deploy Server

git clone https://github.com/CYBki/llm-evaluation.git
cd llm-evaluation

# Configure
cp .env.example .env
nano .env  # set your OPENAI_API_KEY and other settings

# Start
docker compose up -d

# Verify
curl http://localhost:8000/health
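
The authoritative list of settings lives in .env.example; at minimum the server needs the judge model's API key. An illustrative .env (only OPENAI_API_KEY is confirmed above, everything else depends on what .env.example defines):

# .env (illustrative; check .env.example for the real variable names)
OPENAI_API_KEY=sk-your-key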

2. Send Traces via SDK

from rageval_sdk import RagEvalClient

client = RagEvalClient(
    api_url="http://your-server:8000",
    api_key="your-api-key",
)

# Submit trace for evaluation
result = client.ingest(
    question="What is the capital of France?",
    answer="The capital of France is Paris.",
    contexts=["Paris is the capital and largest city of France."],
)

# Get evaluation results
trace = client.get_trace(result["id"])
print(trace["evaluation"]["overall_score"])
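
If the server evaluates in the background, the trace may not contain results immediately after ingest returns. A minimal polling sketch, assuming the evaluation field is simply absent or empty until scoring finishes:

import time

trace = client.get_trace(result["id"])
while not trace.get("evaluation"):      # assumption: empty until the judge finishes
    time.sleep(2)
    trace = client.get_trace(result["id"])

print(trace["evaluation"]["overall_score"])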

3. Auto-Evaluate in Your RAG Pipeline

from rageval_sdk import RagEvalClient

client = RagEvalClient(api_url="http://your-server:8000", api_key="key")

def handle_query(query):
    answer, contexts = my_rag_pipeline(query)  # your existing code

    # Non-blocking: sends to server for background evaluation
    client.ingest(question=query, answer=answer, contexts=contexts)

    return answer  # user gets answer immediately

4. Webhook Notifications

client.ingest(
    question="Q",
    answer="A",
    contexts=["C"],
    webhook_url="https://your-app.com/webhook",  # results POSTed here when ready
)
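
On the receiving end you just need an endpoint that accepts the POSTed JSON. A minimal FastAPI sketch (the route path is arbitrary and the payload schema is whatever your server sends; here it is only logged):

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhook")
async def receive_results(request: Request):
    payload = await request.json()   # evaluation results POSTed by the server
    print(payload)                   # e.g. store in your own database instead
    return {"ok": True}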

Background Evaluation (Local, Non-blocking)

from rageval_sdk import RagEvaluator

evaluator = RagEvaluator(api_key="sk-...", max_workers=4)

for query in user_queries:
    answer, contexts = my_rag_pipeline(query)
    evaluator.submit(question=query, answer=answer, contexts=contexts)

results = evaluator.wait()
evaluator.shutdown()
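
wait() blocks until every submitted trace has been scored. Assuming it returns the same result dicts that evaluate() produces, you can then inspect them:

for r in results:
    print(r["overall_score"])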

Evaluation Metrics

Metric               Description
overall_score        Weighted composite score (0-1)
hallucination_score  Detects fabricated information
faithfulness         Answer grounded in context
answer_relevancy     Answer relevance to question
context_precision    Quality of retrieved context
context_recall       Coverage of necessary information
clarity              Answer clarity
coherence            Answer coherence
helpfulness          Answer helpfulness
completeness         Answer completeness
citation_check       Source citation validation
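
A quick way to see every metric returned for a trace is to dump the result dict. This assumes individual metric scores come back alongside overall_score; the exact key names may differ from the table above:

from rageval_sdk import evaluate

result = evaluate("What is RAG?", "Retrieval-Augmented Generation.", ["RAG combines retrieval with generation."])
for metric, score in result.items():
    print(f"{metric}: {score}")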

API Reference

Local Mode

Function                                              Description
evaluate(question, answer, contexts)                  Evaluate single trace (sync)
evaluate_batch(traces)                                Evaluate multiple traces in parallel
evaluate_trace(question, answer, contexts, config=)   Async version
RagEvaluator(max_workers=4)                           Background evaluator
EvalConfig(api_key=, base_url=, ...)                  Custom configuration
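
A minimal sketch of the async variant, assuming evaluate_trace is a coroutine that takes the same arguments as evaluate:

import asyncio
from rageval_sdk import evaluate_trace

async def main():
    result = await evaluate_trace(
        "What is the capital of France?",
        "Paris.",
        ["Paris is the capital of France."],
    )
    print(result["overall_score"])

asyncio.run(main())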

Server Mode

Method                                      Description
RagEvalClient(api_url, api_key)             Connect to server
client.ingest(question, answer, contexts)   Submit trace
client.get_trace(trace_id)                  Get results
client.list_traces(limit, offset)           List all traces
client.health()                             Check server health
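
For example, a short sketch that checks server health and pages through recent traces (assuming list_traces returns trace dicts that carry an id and, once scored, an evaluation):

from rageval_sdk import RagEvalClient

client = RagEvalClient(api_url="http://your-server:8000", api_key="your-api-key")

print(client.health())   # server health check

for trace in client.list_traces(limit=20, offset=0):
    evaluation = trace.get("evaluation") or {}
    print(trace.get("id"), evaluation.get("overall_score"))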

License

MIT

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rageval_ai-0.2.2.tar.gz (27.4 kB)

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

rageval_ai-0.2.2-py3-none-any.whl (31.0 kB)

File details

Details for the file rageval_ai-0.2.2.tar.gz.

File metadata

  • Download URL: rageval_ai-0.2.2.tar.gz
  • Upload date:
  • Size: 27.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for rageval_ai-0.2.2.tar.gz
Algorithm Hash digest
SHA256 57201b5a6a135ed698e2f1270040fb0d5a4d544e8998ef9de0246ebbb96bd400
MD5 d5a64be30922c6b4d1fa658780b1671e
BLAKE2b-256 76a1b442d4326ad9b8136218a8e152f2d392d728ec093f11dbd9c33e139f90cd

Provenance

The following attestation bundles were made for rageval_ai-0.2.2.tar.gz:

Publisher: publish.yml on CYBki/llm-evaluation

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rageval_ai-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: rageval_ai-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 31.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for rageval_ai-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 ae93b30aab514b11a6ea858c02518cf087c04f9efff5026f8cb5848dfb297474
MD5 0d6a332e0ceb485ea8feeceb901ef095
BLAKE2b-256 0d3191e06b14b6c340f6dbd2c4cb0a6f2a6467a218945aab101d5184354a689a

Provenance

The following attestation bundles were made for rageval_ai-0.2.2-py3-none-any.whl:

Publisher: publish.yml on CYBki/llm-evaluation

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
