🎯 Evret
A focused, lightweight retriever evaluation framework with standard Information Retrieval metrics
Evret brings standard Information Retrieval metrics to your RAG and search systems. Evaluate retrievers with Hit Rate, Recall, Precision, MRR, NDCG, and Average Precision in just a few lines of code. Built for simplicity, extensibility, and seamless integration with vector databases and AI frameworks.
🌟 Overview
Evret is a modern Python framework designed for evaluating retrieval systems in RAG (Retrieval-Augmented Generation) pipelines and search applications. It provides:
- Standard IR Metrics: Hit Rate, Recall, Precision, MRR, NDCG, and Average Precision
- Vector Database Support: Native adapters for Qdrant and other vector databases
- Framework Integration: Seamless adapters for LangChain
- Production-Ready: Type-safe, well-tested, and optimized for real-world use cases
Whether you're building a semantic search engine, evaluating RAG systems, or benchmarking retrieval models, Evret gives you the tools to measure what matters.
🚀 Quick Start
Installation
```bash
pip install evret
```
For optional integrations:
```bash
# Install all optional integrations
pip install "evret[all]"
```
5-Minute Evaluation
```python
from evret import EvaluationDataset, Evaluator, HitRate, MRR, NDCG

# Load your evaluation dataset
dataset = EvaluationDataset.from_json("eval_data.json")

# Evaluate your retriever
evaluator = Evaluator(
    retriever=my_retriever,
    metrics=[HitRate(k=5), MRR(k=10), NDCG(k=10)],
)
results = evaluator.evaluate(dataset)
print(results.summary())

# Export results
results.to_json("results.json")
results.to_csv("results.csv")
```
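At minimum, an evaluation dataset pairs each query with the IDs of its relevant documents. The layout below is an assumption for illustration (the field names `query` and `relevant_doc_ids` are hypothetical); check the docs for the exact schema `EvaluationDataset.from_json` expects:

```python
import json

# Hypothetical eval_data.json layout: one record per query, with the
# query text and the IDs of documents judged relevant for it.
eval_data = [
    {"query": "what is RAG?", "relevant_doc_ids": ["doc_1", "doc_7"]},
    {"query": "how do vector databases work?", "relevant_doc_ids": ["doc_3"]},
]

with open("eval_data.json", "w") as f:
    json.dump(eval_data, f, indent=2)
```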
Minimal Metrics Example
```python
from evret import HitRate, Recall, Precision, MRR, NDCG, AveragePrecision

retrieved = [
    ["doc_1", "doc_5", "doc_2", "doc_9"],
    ["doc_8", "doc_7", "doc_6", "doc_3"],
]
relevant = [
    {"doc_1", "doc_2"},
    {"doc_3"},
]

metrics = [HitRate(k=3), Recall(k=3), Precision(k=3), MRR(k=3), NDCG(k=3), AveragePrecision(k=3)]
for metric in metrics:
    score = metric.score(
        retrieved_by_query=retrieved,
        relevant_by_query=relevant,
    )
    print(f"{metric.name}: {score:.4f}")
```
🛠 Local Development Setup
Use uv for local development:
```bash
uv venv
source .venv/bin/activate
uv pip install -e .
```
Install all optional integrations:
```bash
uv pip install -e ".[all]"
```
Run tests:
```bash
pytest
```
📚 Documentation
Evret's documentation is built with MkDocs using the Material theme.
Install docs dependencies:
```bash
uv pip install -e ".[docs]"
```
Run docs locally:
```bash
mkdocs serve
```
Build static docs:
```bash
mkdocs build
```
📊 Metrics
Evret implements the standard Information Retrieval metrics:
| Metric | Description | Use Case |
|---|---|---|
| Hit Rate@k | % of queries with at least one relevant doc in top-k | Binary relevance, recall-focused |
| Recall@k | % of relevant docs found in top-k | Comprehensive retrieval |
| Precision@k | % of top-k results that are relevant | Precision-focused systems |
| MRR@k | Mean Reciprocal Rank of first relevant doc | Single-answer retrieval |
| NDCG@k | Normalized Discounted Cumulative Gain | Rank-aware relevance quality |
| Average Precision@k | Area under precision-recall curve | Overall ranking quality |
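For reference, here is a minimal sketch of how MRR@k and binary-relevance NDCG@k are conventionally computed. It mirrors the textbook definitions and is not a claim about Evret's internals:

```python
import math

def reciprocal_rank_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """1 / rank of the first relevant doc within the top k, else 0."""
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(retrieved[:k], start=1)
        if doc_id in relevant
    )
    ideal_dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank in range(1, min(len(relevant), k) + 1)
    )
    return dcg / ideal_dcg if ideal_dcg > 0 else 0.0
```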
Usage Example
```python
from evret import HitRate, Recall, Precision, MRR, NDCG, AveragePrecision

metrics = [
    HitRate(k=5),
    Recall(k=10),
    Precision(k=5),
    MRR(k=10),
    NDCG(k=10),
    AveragePrecision(k=10),
]

# Score a single query
retrieved = ["doc_1", "doc_5", "doc_2"]
relevant = {"doc_1", "doc_2"}

for metric in metrics:
    score = metric.score(
        retrieved_by_query=[retrieved],
        relevant_by_query=[relevant],
    )
    print(f"{metric.name}: {score:.4f}")
```
🔌 Integrations
Qdrant Vector Database
```python
from evret.retrievers import QdrantRetriever

retriever = QdrantRetriever(
    collection_name="docs",
    query_encoder=embed_query,
    url="http://localhost:6333",
    id_field="doc_id",
)
```
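The `query_encoder` can be any callable that maps a query string to a vector compatible with the collection's embeddings. A hypothetical sketch using sentence-transformers (an assumption for illustration; any embedding function works):

```python
from sentence_transformers import SentenceTransformer

# Assumes the Qdrant collection was indexed with this same model.
model = SentenceTransformer("all-MiniLM-L6-v2")

def embed_query(query: str) -> list[float]:
    # encode() returns a numpy array; convert to a plain list of floats.
    return model.encode(query).tolist()
```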
LangChain Integration
```python
from evret.integrations import LangChainRetrieverAdapter

# Wrap any Evret retriever for use in LangChain
lc_retriever = LangChainRetrieverAdapter(evret_retriever=retriever, k=5)
docs = lc_retriever.invoke("what is RAG?")
```
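Assuming the adapter follows LangChain's `BaseRetriever` contract, `invoke` returns LangChain `Document` objects, so the result slots into existing LangChain pipelines:

```python
for doc in docs:
    print(doc.page_content)
```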
📁 Examples
Basic Evaluation Pipeline
```python
from evret import EvaluationDataset, Evaluator, HitRate, MRR

dataset = EvaluationDataset.from_json("eval_data.json")
evaluator = Evaluator(retriever=my_retriever, metrics=[HitRate(k=5), MRR(k=10)])
results = evaluator.evaluate(dataset)
results.to_json("results.json")
results.to_csv("results.csv")
print(results.summary())
```
Custom Retriever
```python
from evret.retrievers import BaseRetriever
from evret import RetrievalResult

class MyCustomRetriever(BaseRetriever):
    def retrieve(self, query: str, k: int) -> list[RetrievalResult]:
        self._validate_k(k)
        # Your retrieval logic here
        return [
            RetrievalResult(doc_id="doc_1", score=0.95, metadata={"text": "..."}),
            RetrievalResult(doc_id="doc_2", score=0.87),
        ]
```
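A custom retriever plugs straight into the evaluator; for example, assuming the dataset file from the Quick Start:

```python
from evret import EvaluationDataset, Evaluator, HitRate

dataset = EvaluationDataset.from_json("eval_data.json")
evaluator = Evaluator(retriever=MyCustomRetriever(), metrics=[HitRate(k=5)])
print(evaluator.evaluate(dataset).summary())
```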
Run Examples Locally
```bash
# Basic evaluation example
python examples/evaluate_retriever.py

# Jupyter notebook quickstart
jupyter notebook examples/langchain_rag_evaluation.ipynb
```
🧪 Testing
Run Unit Tests
```bash
pytest
```
Run Integration Tests
Integration tests require Docker to be running:
```bash
# Start Docker Desktop/daemon first, then run:
EVRET_RUN_INTEGRATION=1 pytest -m integration
```
This will spin up Docker containers for Qdrant and Chroma to run end-to-end tests.
📄 License
MIT License - see LICENSE for details.
🤝 Contributing
Contributions are welcome! Please read our Contributing Guide for details on our code of conduct and the process for submitting pull requests.
📚 Citation
If you use Evret in your research, please cite:
```bibtex
@software{evret2026,
  title={Evret: A Focused Retriever Evaluation Framework},
  author={lucifertrj},
  year={2026},
}
```
Built with ❤️ for the RAG and IR community
GitHub • Issues • Discussions