A comprehensive suite for evaluating RAG pipelines.
🏆 RAG Evaluation Suite
A comprehensive, asynchronous, and framework-agnostic library for evaluating Retrieval-Augmented Generation (RAG) pipelines.
This tool provides an end-to-end workflow for RAG evaluation: it can synthesize a high-quality test set automatically from your own documents and then score your pipeline with a suite of LLM-judged diagnostic metrics.
✨ Key Features
- Comprehensive "RAG Triad" Metrics: Evaluate Context Relevance, Faithfulness, and Answer Relevance.
- Advanced Diagnostics: Includes Answer Completeness to pinpoint why an answer is strong or weak.
- Automated Test Case Generation: Use the built-in Data Synthesizer to create test cases from any document.
- High-Performance Async Pipeline: Powered by asyncio for fast, concurrent execution.
- Framework-Agnostic: Works with any RAG system, whether built on LangChain, LlamaIndex, or plain Python.
- Flexible Judge Model: Supports GPT-4o, Claude 3 Opus, or local models (e.g., via Ollama).
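Because the evaluator only needs the retrieved text and the final answer, wiring in an existing pipeline mostly means converting its outputs into the shapes used in the Quick Start section: a list of context strings for retrieved_context and a string for final_answer. The sketch below is a hypothetical adapter, not part of rag-eval-suite. The function names are illustrative, and it assumes context chunks arrive as plain strings, as objects with a page_content attribute (as LangChain Document objects have), or as objects with a text attribute (an assumption meant to cover other frameworks' node types; check your framework's classes).

```python
from typing import Any, Iterable


def context_to_strings(chunks: Iterable[Any]) -> list[str]:
    """Normalize retrieved chunks from different frameworks into plain strings.

    Hypothetical helper: accepts str, objects with `page_content`
    (e.g. LangChain Documents), or objects with a `text` attribute.
    """
    texts: list[str] = []
    for chunk in chunks:
        if isinstance(chunk, str):
            texts.append(chunk)
        elif hasattr(chunk, "page_content"):
            texts.append(chunk.page_content)
        elif hasattr(chunk, "text"):
            texts.append(chunk.text)
        else:
            raise TypeError(f"Unsupported context chunk type: {type(chunk).__name__}")
    return texts


def rag_result_kwargs(final_answer: str, chunks: Iterable[Any]) -> dict[str, Any]:
    """Build keyword arguments named after the RAGResult fields in the Quick Start."""
    return {
        "retrieved_context": context_to_strings(chunks),
        "final_answer": final_answer,
    }
```

Assuming RAGResult accepts these fields as keyword arguments (as the Quick Start suggests), you would then construct it with RAGResult(**rag_result_kwargs(answer, docs)).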
🚀 Quick Start
1. Installation
pip install rag-eval-suite
2. Run Your First Evaluation
Create a Python script using the RAGEvaluator:
import asyncio
from rag_eval_suite import RAGEvaluator
from rag_eval_suite.data_models import TestCase, RAGResult
# 1. Instantiate the evaluator (uses local Ollama model by default)
evaluator = RAGEvaluator(judge_model="ollama/llama3")
# 2. Define test case and RAG system output
test_case = TestCase(
question="What are the notable features of the stadium's pitch and roof?",
ground_truth_context=["The stadium features a fully retractable roof... and a hybrid pitch..."],
ground_truth_answer="The stadium has a fully retractable roof and a hybrid pitch."
)
rag_result = RAGResult(
retrieved_context=["The stadium features a fully retractable roof... and a hybrid pitch..."],
final_answer="The stadium has a fully retractable roof." # Intentionally incomplete
)
# 3. Run the evaluation
async def main():
    evaluation_result = await evaluator.aevaluate(test_case, rag_result)
    print(evaluation_result.scores)

if __name__ == "__main__":
    asyncio.run(main())
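Since aevaluate is a coroutine, a larger test set can be scored concurrently. The following is a minimal sketch under stated assumptions: evaluate_many is a hypothetical helper (not part of the library), and it relies only on the evaluator exposing aevaluate(test_case, rag_result) as used above. An asyncio.Semaphore caps how many judge calls are in flight at once, which can help with API rate limits or a single local Ollama server.

```python
import asyncio
from typing import Any, Iterable, Protocol


class SupportsAEvaluate(Protocol):
    """Anything with an async aevaluate(test_case, rag_result) method."""

    async def aevaluate(self, test_case: Any, rag_result: Any) -> Any: ...


async def evaluate_many(
    evaluator: SupportsAEvaluate,
    pairs: Iterable[tuple[Any, Any]],
    max_concurrency: int = 4,
) -> list[Any]:
    """Evaluate (test_case, rag_result) pairs concurrently (hypothetical helper).

    Results are returned in the same order as `pairs`.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(test_case: Any, rag_result: Any) -> Any:
        # Only `max_concurrency` coroutines get past this point at a time.
        async with semaphore:
            return await evaluator.aevaluate(test_case, rag_result)

    # asyncio.gather returns results in the order the awaitables were passed.
    return await asyncio.gather(*(run_one(tc, rr) for tc, rr in pairs))
```

With the Quick Start objects, this would be called inside main() as results = await evaluate_many(evaluator, [(test_case, rag_result)], max_concurrency=4).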
🔧 Configuration
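To use a hosted judge model such as OpenAI's GPT-4o, provide the provider's API key through an environment variable (set from Python below; exporting OPENAI_API_KEY in your shell works as well) and pass the model name:
import os
from rag_eval_suite import RAGEvaluator
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder: substitute your real key
evaluator = RAGEvaluator(judge_model="gpt-4o")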
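If the same script should run both locally and against a hosted model, the judge can be chosen from the environment. This is a hypothetical helper, not a library function: it only picks between the two model strings used in this README, and it assumes that the presence of an OpenAI key is the signal for using GPT-4o.

```python
import os
from typing import Mapping, Optional


def pick_judge_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a judge_model string based on available credentials (hypothetical).

    Uses "gpt-4o" when OPENAI_API_KEY is set and non-empty; otherwise falls
    back to the local "ollama/llama3" model.
    """
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "gpt-4o"
    return "ollama/llama3"
```

Usage would then be evaluator = RAGEvaluator(judge_model=pick_judge_model()).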
🤝 Contributing
Contributions are welcome! Feel free to open an issue or submit a pull request.
📄 License
This project is licensed under the MIT License.
Download files
Source Distribution
Built Distribution
File details
Details for the file ragscope-0.1.0.tar.gz.
File metadata
- Download URL: ragscope-0.1.0.tar.gz
- Upload date:
- Size: 25.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 93e9fab856972c82f01b1eb3dfd8af4a1c7f400841d85a35d3e4b10475723b21 |
| MD5 | 36aebfe2590f8c3a3fa262a1855fb701 |
| BLAKE2b-256 | 3cd443c4bd7ec1b9001d456df777657510be17dd87759d97d2abfa6bbcd90366 |
File details
Details for the file ragscope-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ragscope-0.1.0-py3-none-any.whl
- Upload date:
- Size: 28.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aa9493aad77569915393abc23525da9e0a4074e8f0ca8e5131a626f5f6e6d951 |
| MD5 | 2c8c23f0a9508d2fc616b332ac54c27b |
| BLAKE2b-256 | 533d8ecd2c9bc8591e4f10b9fdf2dc363a0d5e2bb55a6d3bc72004fd8dddbf65 |
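To check a downloaded file against the published SHA256 digests, hash it locally and compare. The following is a minimal sketch using only the standard library; the function names are illustrative. It reads the file in chunks so large archives do not need to fit in memory.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 65536) -> str:
    """Return the lowercase hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sha256(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare a file's SHA256 to a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published_sha256("ragscope-0.1.0-py3-none-any.whl", "aa9493aad77569915393abc23525da9e0a4074e8f0ca8e5131a626f5f6e6d951") should return True for an intact copy of the wheel. pip can also enforce digests at install time through its hash-checking mode (--require-hashes).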