Project description

rageval-ai

Standalone RAG evaluation library — run LLM-as-judge evaluation locally with your own API key.

Installation

pip install rageval-ai

Quick Start

Set your API key:

export OPENAI_API_KEY="sk-your-key"

Single Trace (3 lines)

from rageval_sdk import evaluate

result = evaluate("What is the capital of France?", "Paris.", ["Paris is the capital of France."])
print(result["overall_score"])  # 0.95

Batch Evaluation

from rageval_sdk import evaluate_batch

results = evaluate_batch([
    {"question": "What is RAG?", "answer": "Retrieval-Augmented Generation.", "contexts": ["RAG combines retrieval with generation."]},
    {"question": "What is Python?", "answer": "A programming language.", "contexts": ["Python was created by Guido van Rossum."]},
])

for r in results:
    print(f"Score: {r['overall_score']}")

Custom Configuration

from rageval_sdk import evaluate, EvalConfig

config = EvalConfig(
    api_key="sk-...",
    base_url="https://openrouter.ai/api/v1",  # OpenAI, Azure, OpenRouter, etc.
    stage_1_model="gpt-4o",
    stage_2_model="gpt-4o-mini",
)

result = evaluate("Question?", "Answer.", ["Context."], config=config)
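
The config above routes two judge stages to different models: a stronger model for free-form reasoning and a cheaper one for structured scoring. The flow below is a hypothetical sketch of such a two-stage LLM-as-judge pipeline; the function names, prompts, and the `fake_llm` stub are illustrations, not the library's internals:

```python
import json

def run_reasoning_stage(question, answer, contexts, call_llm):
    # Stage 1: a stronger model writes free-form analysis of the answer.
    prompt = (
        f"Question: {question}\nAnswer: {answer}\nContexts: {contexts}\n"
        "Analyze whether the answer is faithful to the contexts."
    )
    return call_llm("stage-1-model", prompt)

def run_scoring_stage(reasoning, call_llm):
    # Stage 2: a cheaper model converts the analysis into strict JSON scores.
    prompt = f'Analysis: {reasoning}\nReturn JSON like {{"faithfulness": 0.0}}'
    return json.loads(call_llm("stage-2-model", prompt))

def fake_llm(model, prompt):
    # Offline stub standing in for real chat-completion calls.
    if model == "stage-1-model":
        return "The answer is fully grounded in the provided context."
    return '{"faithfulness": 0.95}'

reasoning = run_reasoning_stage("Q?", "A.", ["C."], fake_llm)
scores = run_scoring_stage(reasoning, fake_llm)
print(scores["faithfulness"])  # 0.95
```

Splitting reasoning and scoring this way lets the expensive model do the open-ended judging while a small model handles the easy job of emitting valid JSON.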

Background Evaluation (Non-blocking)

from rageval_sdk import RagEvaluator

evaluator = RagEvaluator(api_key="sk-...", max_workers=4)

for query in user_queries:
    answer, contexts = my_rag_pipeline(query)
    evaluator.submit(question=query, answer=answer, contexts=contexts)

results = evaluator.wait()
evaluator.shutdown()
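
Here `submit()` returns immediately and scoring happens on worker threads. A minimal sketch of this pattern on top of `concurrent.futures` (an illustration, not the library's implementation; `score_fn` stands in for the real LLM-as-judge call):

```python
from concurrent.futures import ThreadPoolExecutor

class BackgroundEvaluator:
    """Queue evaluations on worker threads; collect the results later."""

    def __init__(self, max_workers=4, score_fn=None):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # score_fn is a stand-in for the real LLM-as-judge call.
        self._score_fn = score_fn or (lambda q, a, c: {"overall_score": 0.0})
        self._futures = []

    def submit(self, question, answer, contexts):
        # Returns immediately; scoring runs on a worker thread.
        self._futures.append(
            self._pool.submit(self._score_fn, question, answer, contexts)
        )

    def wait(self):
        # Block until every submitted trace has been scored.
        return [f.result() for f in self._futures]

    def shutdown(self):
        self._pool.shutdown(wait=True)

ev = BackgroundEvaluator(max_workers=2,
                         score_fn=lambda q, a, c: {"overall_score": 0.9})
ev.submit("Q1?", "A1.", ["C1."])
ev.submit("Q2?", "A2.", ["C2."])
results = ev.wait()
ev.shutdown()
print(len(results))  # 2
```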

Evaluation Metrics

Metric               Description
overall_score        Weighted composite score (0-1)
hallucination_score  Detects fabricated information
faithfulness         Answer grounded in context
answer_relevancy     Answer relevance to question
context_precision    Quality of retrieved context
context_recall       Coverage of necessary information
clarity              Answer clarity
coherence            Answer coherence
helpfulness          Answer helpfulness
citation_check       Source citation validation
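
The overall_score is described as a weighted composite of the per-metric scores. A minimal sketch of such a composite, with purely illustrative weights (the library's actual weighting is not documented here):

```python
def weighted_overall(scores, weights):
    """Weighted average of metric scores, normalized by total weight."""
    total = sum(weights.values())
    return sum(scores[m] * w for m, w in weights.items()) / total

scores = {"faithfulness": 0.9, "answer_relevancy": 0.8, "hallucination_score": 1.0}
weights = {"faithfulness": 0.5, "answer_relevancy": 0.3, "hallucination_score": 0.2}
print(round(weighted_overall(scores, weights), 2))  # 0.89
```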

API Reference

evaluate(question, answer, contexts, ground_truth, *, api_key, config)

Evaluate a single trace. Returns a dict containing all metrics.

  • question (str): User question
  • answer (str): LLM answer
  • contexts (list[str], optional): Retrieved context passages
  • ground_truth (str, optional): Expected answer
  • api_key (str, optional): Auto-detected from the OPENAI_API_KEY environment variable
  • config (EvalConfig, optional): Custom configuration

evaluate_batch(traces, *, api_key, config, max_concurrency)

Evaluate multiple traces in parallel. Returns a list of result dicts.

  • traces (list[dict]): List of {"question", "answer", "contexts", "ground_truth"}
  • max_concurrency (int): Max parallel evaluations (default 4)
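
One common way to implement a max_concurrency cap like this is a thread pool sized to the limit. The sketch below, with a stubbed scoring function, is an illustration of that pattern rather than the library's code:

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(traces, score_fn, max_concurrency=4):
    """Score traces in parallel, preserving input order in the results."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # map() preserves input order even if workers finish out of order.
        return list(pool.map(score_fn, traces))

fake_score = lambda t: {"question": t["question"], "overall_score": 0.9}
traces = [{"question": "What is RAG?"}, {"question": "What is Python?"}]
results = run_batch(traces, fake_score, max_concurrency=2)
print([r["question"] for r in results])
```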

EvalConfig

Configuration object for custom LLM providers:

  • api_key: API key
  • base_url: API endpoint (default: OpenAI)
  • stage_1_model: Reasoning model (default: gpt-4o)
  • stage_2_model: JSON scoring model (default: gpt-4o-mini)
  • rag_metrics_model: RAG metrics model (default: gpt-4o-mini)
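
The documented fields and defaults can be pictured as a simple dataclass. This is an illustrative sketch, not the library's actual class; the base_url default shown is the standard OpenAI endpoint, assumed here:

```python
from dataclasses import dataclass

@dataclass
class EvalConfigSketch:
    # Field names and model defaults mirror the documented EvalConfig options.
    api_key: str = ""
    base_url: str = "https://api.openai.com/v1"  # assumed OpenAI default
    stage_1_model: str = "gpt-4o"
    stage_2_model: str = "gpt-4o-mini"
    rag_metrics_model: str = "gpt-4o-mini"

cfg = EvalConfigSketch(api_key="sk-...", base_url="https://openrouter.ai/api/v1")
print(cfg.stage_1_model)  # gpt-4o
```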

License

MIT

Project details


Download files

Download the file for your platform.

Source Distribution

rageval_ai-0.2.0.tar.gz (26.8 kB)

Uploaded Source

Built Distribution


rageval_ai-0.2.0-py3-none-any.whl (30.5 kB)

Uploaded Python 3

File details

Details for the file rageval_ai-0.2.0.tar.gz.

File metadata

  • Download URL: rageval_ai-0.2.0.tar.gz
  • Upload date:
  • Size: 26.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for rageval_ai-0.2.0.tar.gz

Algorithm    Hash digest
SHA256       79fd7f35adad8eb544ef481976c72fa54a0bef2d04a46888b4746e740e5c400d
MD5          321335bb68c0292c30b547c44fbff3ce
BLAKE2b-256  ce2fea5e7d944f0cfaa8b70d3fdac83999ff1b52c136cf6549dab40349315e54


Provenance

The following attestation bundles were made for rageval_ai-0.2.0.tar.gz:

Publisher: publish.yml on CYBki/llm-evaluation

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rageval_ai-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: rageval_ai-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 30.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for rageval_ai-0.2.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       fe439179c46b4b72d2fa29821a8222c70e723839debfa1b72dcfc98033a7c245
MD5          8bc35aea0d079b3c9fb0e820db2861a3
BLAKE2b-256  3ce2d53e0e7ec6a1063e9bed7f6b68960d93a987c04cc4175a1a1c9ef30f7867


Provenance

The following attestation bundles were made for rageval_ai-0.2.0-py3-none-any.whl:

Publisher: publish.yml on CYBki/llm-evaluation

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
