
rageval-ai

Standalone RAG evaluation library — run LLM-as-judge evaluation locally with your own API key.

Evaluate your RAG pipelines with 9+ metrics including hallucination detection, answer relevancy, faithfulness, and more. No server needed — everything runs locally.


Installation

pip install rageval-ai

Quick Start

import os
from rageval_sdk import evaluate

result = evaluate(
    question="What is the capital of France?",
    answer="The capital of France is Paris.",
    contexts=["Paris is the capital and largest city of France."],
    ground_truth="Paris",
    api_key=os.environ["OPENAI_API_KEY"],
)

print(f"Overall Score: {result['overall_score']}")
print(f"Hallucination: {result['hallucination_score']}")
print(f"Faithfulness:  {result['faithfulness']}")
print(f"Relevancy:     {result['answer_relevancy']}")
print(f"Cost:          ${result['cost_usd']}")

Environment Variables

export OPENAI_API_KEY="sk-your-openai-key"

Custom Configuration

import os
from rageval_sdk import evaluate, EvalConfig

config = EvalConfig(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://api.openai.com/v1",     # or OpenRouter, Azure, etc.
    stage_1_model="gpt-4o",                   # reasoning model
    stage_2_model="gpt-4o-mini",              # JSON conversion model
    rag_metrics_model="gpt-4o-mini",          # RAG metrics model
)

result = evaluate(
    question="What is RAG?",
    answer="RAG is Retrieval-Augmented Generation.",
    config=config,
)

Async Usage

import asyncio
from rageval_sdk import evaluate_trace, EvalConfig

async def main():
    config = EvalConfig(api_key="sk-...")
    result = await evaluate_trace(
        question="What is RAG?",
        answer="RAG is Retrieval-Augmented Generation.",
        contexts=["RAG combines retrieval with generation."],
        config=config,
    )
    print(result["overall_score"])

asyncio.run(main())

Background Evaluation (Non-blocking)

The RagEvaluator runs evaluations in background threads so your RAG pipeline is never blocked:

import os
from rageval_sdk import RagEvaluator

evaluator = RagEvaluator(api_key=os.environ["OPENAI_API_KEY"], max_workers=4)

# Your RAG pipeline runs normally — evaluation happens in background
for query in user_queries:
    answer, contexts = my_rag_pipeline(query)  # your existing code

    # Non-blocking: submits and returns immediately
    evaluator.submit(
        question=query,
        answer=answer,
        contexts=contexts,
    )

# Check how many are done
print(f"Completed: {evaluator.completed_count}, Pending: {evaluator.pending_count}")

# When ready, collect all results
results = evaluator.wait()
for r in results:
    print(f"Score: {r['overall_score']}, Hallucination: {r['hallucination_score']}")

evaluator.shutdown()
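The submit/collect flow above maps naturally onto a thread pool. As a rough illustration of the pattern (not RagEvaluator's actual internals — `judge` here is a stub standing in for a real LLM-as-judge call), the same non-blocking behavior can be sketched with only the standard library:

```python
from concurrent.futures import ThreadPoolExecutor

def judge(question, answer):
    # Stub: a real implementation would call an LLM-as-judge API here.
    return {"question": question, "overall_score": 0.9}

executor = ThreadPoolExecutor(max_workers=4)

# submit() returns a Future immediately; the judge call runs on a worker thread
futures = [executor.submit(judge, "What is RAG?", "Retrieval-Augmented Generation.")]

# ... the pipeline keeps serving requests here, unblocked ...

# Collecting results is the only blocking step
results = [f.result() for f in futures]
executor.shutdown()
print(results[0]["overall_score"])
```

The trade-off is the same as with RagEvaluator: submission is cheap, and latency is only paid when you gather results.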

Batch Evaluation

Evaluate multiple traces at once:

import os
from rageval_sdk import RagEvaluator

with RagEvaluator(api_key=os.environ["OPENAI_API_KEY"]) as evaluator:
    results = evaluator.evaluate_batch([
        {
            "question": "What is RAG?",
            "answer": "RAG is Retrieval-Augmented Generation.",
            "contexts": ["RAG combines retrieval with generation."],
        },
        {
            "question": "What is Python?",
            "answer": "Python is a programming language.",
            "contexts": ["Python was created by Guido van Rossum."],
        },
    ])

    for r in results:
        print(f"Score: {r['overall_score']}")

Features

  • Standalone — No server needed, runs entirely locally
  • Background Evaluation — Non-blocking evaluation with RagEvaluator
  • Batch Support — Evaluate multiple traces concurrently
  • 9+ Metrics — Hallucination, relevancy, faithfulness, completeness, and more
  • Parallel Pipeline — Stage 1 + Stage 2 + RAG metrics run concurrently
  • OpenAI Compatible — Works with OpenAI, OpenRouter, Azure, or any compatible API
  • Retry & Circuit Breaker — Production-grade reliability
  • Typed — Full type hints with py.typed marker
  • Lightweight — Only httpx as required dependency
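The Retry & Circuit Breaker behavior mentioned above is internal to the library. For intuition only, the retry half of that idea can be sketched as a minimal exponential-backoff wrapper (a hand-rolled illustration, not the library's actual implementation):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)

# Simulated flaky API call: fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = with_retries(flaky)
```

A circuit breaker adds one more layer on top: after repeated failures it stops calling the endpoint entirely for a cooldown period instead of retrying.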

Evaluation Metrics

Metric               Description
overall_score        Weighted combination of all metrics
hallucination_score  Detects fabricated information (claim-level)
faithfulness         Ensures answer is grounded in context
answer_relevancy     Measures answer relevance to the question
completeness         Key-point coverage verification
context_precision    Evaluates quality of retrieved contexts
context_recall       Checks if all needed facts are retrieved
citation_check       Validates source citations against contexts
clarity              Answer clarity and readability
coherence            Logical flow and consistency
helpfulness          How actionable/useful the answer is
is_off_topic         Off-topic detection
is_deflection        Deflection detection ("I don't know")
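Because each metric is a plain key in the result dictionary (as in the Quick Start above), downstream filtering is ordinary dict work. A sketch — the result dicts below are hand-made samples for illustration, not real library output:

```python
# Hand-made sample results using the metric keys from the table above.
results = [
    {"question": "What is RAG?", "overall_score": 0.92, "hallucination_score": 0.05},
    {"question": "Capital of France?", "overall_score": 0.41, "hallucination_score": 0.70},
]

# Flag any trace whose hallucination score exceeds a chosen threshold.
THRESHOLD = 0.5
flagged = [r for r in results if r["hallucination_score"] > THRESHOLD]

for r in flagged:
    print(f"Review: {r['question']} (hallucination={r['hallucination_score']})")
```

The same pattern works for any metric in the table — pick a key, pick a threshold, and route failing traces to review.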

API Reference

evaluate()

result = evaluate(
    question,                    # str — the user question
    answer,                      # str — the LLM answer
    contexts=None,               # list[str] — retrieved context passages
    ground_truth=None,           # str — expected correct answer
    api_key=None,                # str — your OpenAI API key
    config=None,                 # EvalConfig — full configuration
    **config_overrides,          # additional EvalConfig fields
)

EvalConfig

config = EvalConfig(
    api_key="sk-...",                           # Required
    base_url="https://api.openai.com/v1",       # LLM endpoint
    stage_1_model="gpt-4o",                     # Reasoning model
    stage_2_model="gpt-4o-mini",                # JSON model
    rag_metrics_model="gpt-4o-mini",            # RAG metrics model
    timeout_seconds=120.0,                      # Request timeout
)

License

MIT — see LICENSE for details.
