A focused retriever evaluation framework with standard ranking metrics.
🎯 Evret
A focused, lightweight retriever evaluation framework with standard Information Retrieval metrics
Evret brings standard Information Retrieval metrics to your recommendation, RAG, and search systems. Evaluate retrievers with Hit Rate, Recall, Precision, MRR, NDCG, ERR, RBP, and Average Precision in just a few lines of code. Built for simplicity, extensibility, and seamless integration with vector databases and AI frameworks.
🌟 Overview
Evret is a modern Python framework designed for evaluating retrieval systems in Information Retrieval pipelines and search applications. It provides:
- Standard IR Metrics: Hit Rate, Recall, Precision, MRR, NDCG, ERR, RBP, and Average Precision
- Judge-Based Matching: Token overlap, semantic, and LLM judges for text relevance
- Vector Database Support: Native adapters for Qdrant and other vector databases
- Framework Integration: Adapters for LangChain and LlamaIndex
🚀 Quick Start
Installation
pip install evret
To install every optional integration (quote the extras so shells such as zsh do not expand the brackets):
pip install "evret[all]"
# Install specific integrations
pip install "evret[qdrant]"
pip install "evret[langchain]"
pip install "evret[semantic]"
5-Minute Evaluation
from evret import EvaluationDataset, Evaluator, HitRate, MRR, NDCG, TokenOverlapJudge
dataset = EvaluationDataset.from_json("eval_data.json")
# my_retriever is a retriever you have already constructed (see Integrations)
evaluator = Evaluator(
    retriever=my_retriever,
    metrics=[HitRate(k=4), MRR(k=4), NDCG(k=4)],
    judge=TokenOverlapJudge(min_tokens=15, overlap_ratio=0.6),
)
results = evaluator.evaluate(dataset)
print(results.summary())
# Optional: Export results
results.to_json("results.json")
results.to_csv("results.csv")
🛠 Local Development Setup
Use uv for local development:
uv venv
source .venv/bin/activate
uv pip install -e .
Install all optional integrations:
uv pip install -e ".[all]"
Run the default test suite:
pytest
See tests/README.md for coverage areas, optional integration tests, and test setup notes.
📊 Metrics
Evret supports the following standard Information Retrieval metrics:
| Metric | Description | Use Case |
|---|---|---|
| Hit Rate@k | % of queries with at least one relevant doc in top-k | Binary relevance, recall-focused |
| Recall@k | % of relevant docs found in top-k | Comprehensive retrieval |
| Precision@k | % of top-k results that are relevant | Precision-focused systems |
| MRR@k | Mean Reciprocal Rank of first relevant doc | Single-answer retrieval |
| NDCG@k | Normalized Discounted Cumulative Gain | Rank-aware binary relevance quality |
| ERR@k | Expected Reciprocal Rank with cascade satisfaction | Graded relevance and user satisfaction |
| RBP@k | Rank-Biased Precision with tunable persistence | User patience and position-weighted quality |
| Average Precision@k | Mean of the precision values at each relevant rank in top-k (approximates the area under the precision-recall curve) | Overall ranking quality |
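The definitions in the table can be made concrete with a short sketch for a single query under binary relevance. This is not Evret's implementation: the function names, the default persistence `p=0.8`, the satisfaction probability `p_satisfy=0.5`, and the normalization choices (for example, dividing Average Precision by `min(k, number of relevant documents)`) are assumptions chosen for illustration, and Evret's own conventions may differ.

```python
import math

def hit_at_k(ranked, relevant, k):
    """1.0 if any of the top-k IDs is relevant, else 0.0."""
    return 1.0 if any(d in relevant for d in ranked[:k]) else 0.0

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant IDs that appear in the top-k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)

def precision_at_k(ranked, relevant, k):
    """Fraction of the k result slots filled by relevant IDs."""
    return sum(d in relevant for d in ranked[:k]) / k

def reciprocal_rank_at_k(ranked, relevant, k):
    """1 / rank of the first relevant ID in the top-k, or 0.0 if none."""
    for rank, d in enumerate(ranked[:k], start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """Binary-gain DCG divided by the DCG of an ideal ordering."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, d in enumerate(ranked[:k], start=1) if d in relevant)
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

def average_precision_at_k(ranked, relevant, k):
    """Mean of precision@rank over the ranks that hold a relevant ID."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked[:k], start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    denom = min(k, len(relevant))  # assumed normalization
    return total / denom if denom else 0.0

def rbp_at_k(ranked, relevant, k, p=0.8):
    """Rank-Biased Precision: rank i is weighted by (1 - p) * p**(i - 1)."""
    return (1 - p) * sum(p ** (rank - 1)
                         for rank, d in enumerate(ranked[:k], start=1)
                         if d in relevant)

def err_at_k(ranked, relevant, k, p_satisfy=0.5):
    """Cascade model: the user stops at a relevant doc with prob. p_satisfy."""
    err, p_continue = 0.0, 1.0
    for rank, d in enumerate(ranked[:k], start=1):
        r = p_satisfy if d in relevant else 0.0
        err += p_continue * r / rank
        p_continue *= 1 - r
    return err

ranked = ["d3", "d1", "d7", "d2"]   # retriever output, best first
relevant = {"d1", "d2"}             # ground-truth relevant IDs
for fn in (hit_at_k, recall_at_k, precision_at_k, reciprocal_rank_at_k,
           ndcg_at_k, average_precision_at_k, rbp_at_k, err_at_k):
    print(fn.__name__, round(fn(ranked, relevant, 4), 4))
```

Each function scores one query. Dataset-level figures such as Hit Rate or MRR are then the mean of the per-query values across all queries.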
📋 Evaluation Datasets
Create evaluation datasets with queries and expected answers for judge-based evaluation:
from evret import EvaluationDataset, QueryExample, DocumentExample
dataset = EvaluationDataset(
    documents=[
        DocumentExample(doc_id="doc_1", text="Python uses pip for packages."),
        DocumentExample(doc_id="doc_2", text="Virtual environments isolate dependencies."),
    ],
    queries=[
        QueryExample(
            query_id="q1",
            query_text="How to install Python packages?",
            expected_answers=["pip install"],  # Judge matches this against retrieved text
        )
    ],
)
Load datasets from JSON or CSV files:
dataset = EvaluationDataset.from_json("eval_data.json")
dataset = EvaluationDataset.from_csv("eval_data.csv")
For detailed dataset format documentation and more examples, see the Dataset Format Guide.
⚖️ Judges
Judges decide whether retrieved text matches the expected text in your evaluation dataset. Evaluator uses TokenOverlapJudge() by default, and you can pass judge= when you want explicit matching behavior.
from evret import Evaluator, HitRate, Recall
from evret.judges import TokenOverlapJudge
evaluator = Evaluator(
    retriever=my_retriever,
    metrics=[HitRate(k=4), Recall(k=4)],
    judge=TokenOverlapJudge(min_tokens=15, overlap_ratio=0.6),
)
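To illustrate what token-overlap matching means in general, here is a minimal sketch. It is not Evret's `TokenOverlapJudge`: the tokenizer, the function name `token_overlap_match`, and the `min_ratio` parameter are hypothetical, and they do not necessarily correspond to how `min_tokens` and `overlap_ratio` behave in the library.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def token_overlap_match(expected, retrieved, min_ratio=0.6):
    """True if at least min_ratio of the expected tokens occur in retrieved."""
    expected_tokens = set(tokenize(expected))
    if not expected_tokens:
        return False  # nothing to match against
    retrieved_tokens = set(tokenize(retrieved))
    ratio = len(expected_tokens & retrieved_tokens) / len(expected_tokens)
    return ratio >= min_ratio

# Only "pip" (one of two expected tokens) appears in the retrieved text.
text = "Python uses pip for packages."
print(token_overlap_match("pip install", text, min_ratio=0.6))
print(token_overlap_match("pip install", text, min_ratio=0.5))
```

Under this reading, a stricter ratio rejects partial matches, so the threshold is worth tuning against a handful of hand-checked examples.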
Use SemanticJudge to match by embedding similarity and LLMJudge to delegate the relevance decision to an LLM provider.
from evret.judges import LLMJudge, SemanticJudge
semantic_judge = SemanticJudge(threshold=0.75)
llm_judge = LLMJudge(provider="openai", model="gpt-4o-mini")
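A similarity threshold such as `threshold=0.75` is commonly applied to the cosine similarity between two embedding vectors. The sketch below shows that comparison with toy vectors. Whether `SemanticJudge` uses cosine similarity specifically is an assumption here, and the helper names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def is_semantic_match(expected_vec, retrieved_vec, threshold=0.75):
    """True if the two embeddings are at least `threshold` similar."""
    return cosine_similarity(expected_vec, retrieved_vec) >= threshold

# Toy 2-D "embeddings"; real ones come from an embedding model.
print(is_semantic_match([1.0, 0.0], [0.8, 0.6]))  # similarity is about 0.8
print(is_semantic_match([1.0, 0.0], [0.6, 0.8]))  # similarity is about 0.6
```

Raising the threshold trades recall of paraphrased answers for fewer false matches.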
🔌 Integrations
Qdrant Vector Database
from evret.retrievers import QdrantRetriever
retriever = QdrantRetriever(
    collection_name="docs",
    query_encoder=embed_query,
    url="http://localhost:6333",
    id_field="doc_id",
)
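The example above passes `embed_query` without defining it. As a rough sketch, assuming `query_encoder` is a callable that maps a query string to a list of floats (the exact signature Evret expects is not confirmed here), a toy stand-in could look like this. It produces deterministic hash-based vectors only to show the shape of such a function. A real setup would call the same embedding model that populated the collection, with a matching vector dimension.

```python
import hashlib
import math

def embed_query(text, dim=8):
    """Toy stand-in: map text to a deterministic vector of length `dim`."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = digest[0] % dim                      # which coordinate to touch
        sign = 1.0 if digest[1] % 2 == 0 else -1.0    # pseudo-random sign
        vec[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    # Normalize to unit length; leave an all-zero vector unchanged.
    return [x / norm for x in vec] if norm else vec

print(embed_query("what is information retrieval?"))
```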
LangChain Integration
from evret.integrations import LangChainRetrieverAdapter
# Wrap any Evret retriever for use in LangChain
lc_retriever = LangChainRetrieverAdapter(evret_retriever=retriever, k=5)
docs = lc_retriever.invoke("what is information retrieval?")
⚙️ Configuration
Logging
Evret uses standard Python logging. Set the log level via environment variable:
export EVRET_LOG_LEVEL=INFO # INFO, DEBUG, WARNING, ERROR
Or configure programmatically:
import logging
logging.basicConfig(level=logging.INFO, format="%(levelname)s [%(name)s] %(message)s")
The default level is WARNING, which keeps production logs quiet.
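If you prefer to raise verbosity for one library without changing the root logger, standard `logging` lets you set the level on a named logger. The sketch below reads the level name from the environment and applies it that way. The logger name `"evret"` is an assumption (libraries conventionally name loggers after their package), and the helper `level_from_env` is hypothetical. It is not how Evret itself parses `EVRET_LOG_LEVEL`.

```python
import logging
import os

def level_from_env(var_name="EVRET_LOG_LEVEL", default="WARNING"):
    """Translate a level name from the environment into a logging constant."""
    name = os.environ.get(var_name, default).strip().upper()
    level = getattr(logging, name, None)
    # Fall back to the default when the name is not a real logging level.
    return level if isinstance(level, int) else getattr(logging, default)

# Root logger keeps its default level; only the named logger is adjusted.
logging.basicConfig(format="%(levelname)s [%(name)s] %(message)s")
logging.getLogger("evret").setLevel(level_from_env())  # "evret" is assumed
```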
🧪 Testing
Run the default suite:
pytest
Run Docker-backed integration tests:
EVRET_RUN_INTEGRATION=1 pytest -m integration
More details are in tests/README.md.
📚 Documentation
The Evret docs are built with MkDocs and the Material theme.
Install docs dependencies:
uv pip install -e ".[docs]"
Run docs locally:
mkdocs serve
Build static docs:
mkdocs build
📄 License
MIT License - see LICENSE for details.
🤝 Contributing
Contributions are welcome! Please read our Contributing Guide for details on our code of conduct and the process for submitting pull requests.
📚 Citation
If you use Evret in your research, please cite:
@software{evret2026,
  title={Evret: A Focused Retriever Evaluation Framework},
  author={lucifertrj},
  year={2026},
}
File details
Details for the file evret-0.0.3.tar.gz.
File metadata
- Download URL: evret-0.0.3.tar.gz
- Upload date:
- Size: 45.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.4 CPython/3.11.13 Darwin/25.0.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0786e5412627fce23be31ab51a13e534e481398409fc17868a69d9358585b392 |
| MD5 | 2e3e7e8b72e7fbb3b8508cde39ccb744 |
| BLAKE2b-256 | 6525579e9c8651b164108aad8f0daf817a8176f5528be54282dcd1f734115456 |
File details
Details for the file evret-0.0.3-py3-none-any.whl.
File metadata
- Download URL: evret-0.0.3-py3-none-any.whl
- Upload date:
- Size: 62.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.4 CPython/3.11.13 Darwin/25.0.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 64df591a5f49ebb9ab268e1a9efc82189bcecd0000e6fab153327baa0470d00f |
| MD5 | a5104ef8ff318e9143e896326dabc457 |
| BLAKE2b-256 | 1baab2cb6380cfc119a68023fa0847a6161980c3e4fae9afb3f82d95310000b4 |