Counterfactual evaluation toolkit for RAG systems
🧪 EvalKit
A simple evaluation tool to measure how much of your retrieved context is actually needed to answer a question.
🚀 Why EvalKit?
RAG systems often retrieve too much irrelevant context.
EvalKit helps you answer:
Which retrieved chunks were actually useful for generating the answer?
📊 What it does
Given:
- a user query
- retrieved contexts
- a final answer
EvalKit computes:
✔ Retrieval Efficiency
How much of the retrieved context was actually necessary, expressed as the fraction of retrieved chunks needed to support the answer.
✔ Required vs Redundant Contexts
Which chunks supported the answer and which were unnecessary.
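EvalKit describes itself as a counterfactual toolkit. One common counterfactual test for whether a chunk is required is a leave-one-out ablation: drop each chunk in turn and ask a judge whether the answer is still supported. The sketch below assumes exactly that procedure. `leave_one_out` and `make_keyword_judge` are hypothetical names, and the keyword judge is a toy substring check standing in for the LLM judge EvalKit uses, so treat this as an illustration under stated assumptions rather than EvalKit's implementation.

```python
from typing import Callable, Sequence

# Hypothetical sketch, not EvalKit's actual implementation.
# Judge signature (assumed): True if `answer` is still supported by `contexts`.
Judge = Callable[[str, Sequence[str]], bool]


def leave_one_out(answer: str, contexts: Sequence[str], judge: Judge) -> dict:
    """Mark chunk i as required if removing it makes the answer unsupported."""
    required, redundant = [], []
    for i in range(len(contexts)):
        # Counterfactual: the same retrieval result with chunk i removed.
        ablated = [c for j, c in enumerate(contexts) if j != i]
        if judge(answer, ablated):
            redundant.append(i)  # answer survives without chunk i
        else:
            required.append(i)  # answer breaks without chunk i
    efficiency = len(required) / len(contexts) if contexts else 0.0
    return {"efficiency": efficiency, "required": required, "redundant": redundant}


def make_keyword_judge(claims: Sequence[str]) -> Judge:
    """Toy stand-in for an LLM judge: every claim must appear in some context."""
    def judge(answer: str, contexts: Sequence[str]) -> bool:
        text = " ".join(contexts).lower()
        return all(claim.lower() in text for claim in claims)
    return judge


contexts = [
    "Refunds are allowed within 30 days",
    "Customer must provide receipt",
    "Office hours are 9-5",
]
answer = "Refunds are allowed within 30 days with receipt"
# The claims the answer makes are hand-picked here for illustration.
judge = make_keyword_judge(["30 days", "receipt"])
result = leave_one_out(answer, contexts, judge)
print(f"Retrieval Efficiency: {result['efficiency']:.2f}")
print(f"Required Contexts: {result['required']}")
print(f"Redundant Contexts: {result['redundant']}")
```

On the quickstart example this marks chunks 0 and 1 as required and chunk 2 as redundant, giving 2/3 ≈ 0.67. One design caveat of leave-one-out: if two chunks carry the same fact, removing either one alone leaves the answer supported, so both get marked redundant.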
How EvalKit compares
| Tool | Focus | Missing insight |
|---|---|---|
| Ragas | retrieval + answer score | context necessity |
| DeepEval | LLM eval benchmarks | counterfactuals |
| EvalKit | context necessity (WHY) | n/a |
⚡ Quickstart
1. Install
```bash
pip install evalkit-rag
```
🦙 2. (Optional) Run Ollama
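```bash
# Start the local Ollama server (it runs in the foreground)
ollama serve
# In a second terminal, download the model used as the judge in step 3
ollama pull llama3
```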
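Before running the judge, you may want to confirm that the server is up and the model was pulled (from a terminal, `ollama list` shows the same information). A minimal standard-library sketch, assuming Ollama's default address `http://localhost:11434` and its `GET /api/tags` endpoint, which lists locally pulled models; `check_ollama` and `model_available` are hypothetical helpers, not part of EvalKit.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if you run the server elsewhere.
OLLAMA_URL = "http://localhost:11434"


def model_available(tags: dict, name: str) -> bool:
    """Return True if `name` is listed in a GET /api/tags response body."""
    # `ollama pull llama3` stores the model as "llama3:latest".
    wanted = name if ":" in name else f"{name}:latest"
    return any(m.get("name") == wanted for m in tags.get("models", []))


def check_ollama(name: str = "llama3", base_url: str = OLLAMA_URL, timeout: float = 5.0) -> str:
    """Hypothetical helper: report whether the server is up and `name` is pulled."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            tags = json.load(resp)
    except OSError as exc:  # URLError, refused connections and timeouts are all OSErrors
        return f"Ollama not reachable at {base_url}: {exc}"
    if model_available(tags, name):
        return f"Ollama is up and {name} is available"
    return f"Ollama is up, but {name} is missing; run `ollama pull {name}`"
```

Call `print(check_ollama())` once `ollama serve` is running. The network call is kept inside the function so that importing the snippet never blocks on a missing server.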
⚡ 3. Use EvalKit
```python
from evalkit import RetrievalEfficiency, LLMJudge

# Use the local Llama 3 model served by Ollama (step 2) as the judge
judge = LLMJudge(model="ollama/llama3")
metric = RetrievalEfficiency(judge)

query = "What is the refund policy?"
contexts = [
    "Refunds are allowed within 30 days",
    "Customer must provide receipt",
    "Office hours are 9-5",  # unrelated to the query
]
answer = "Refunds are allowed within 30 days with receipt"

# Measure which retrieved contexts the answer actually needed
result = metric.measure(query, contexts, answer)
print(result)
```
📊 Output
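```
Retrieval Efficiency: 0.67
Required Contexts: [0, 1]
Redundant Contexts: [2]
```
Chunks 0 and 1 (the 30-day window and the receipt requirement) are needed for the answer; chunk 2 (office hours) is not, so 2 of the 3 retrieved chunks were required (2/3 ≈ 0.67).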
🧠 What this tells you
- Which chunks were necessary
- Which chunks were not needed
- How efficient your retrieval system is
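A single query says little about the retrieval system as a whole. One way to get a system-level number (an assumption on our part, not an EvalKit feature) is to aggregate per-query counts over an evaluation set. The hypothetical `system_efficiency` helper below takes one `(num_required, num_retrieved)` pair per query, for example the length of the required list and the number of contexts you passed to `measure`; the counts in the usage line are made up for illustration.

```python
from typing import Sequence, Tuple


def system_efficiency(per_query: Sequence[Tuple[int, int]]) -> Tuple[float, float]:
    """Hypothetical helper: aggregate (num_required, num_retrieved) pairs.

    Returns (macro, micro): macro averages the per-query ratios,
    micro pools every chunk across queries before dividing.
    """
    scored = [(r, n) for r, n in per_query if n > 0]  # skip queries with no retrieval
    if not scored:
        return 0.0, 0.0
    macro = sum(r / n for r, n in scored) / len(scored)
    micro = sum(r for r, _ in scored) / sum(n for _, n in scored)
    return macro, micro


# Made-up counts for three queries.
macro, micro = system_efficiency([(2, 3), (1, 5), (4, 4)])
print(f"macro={macro:.2f} micro={micro:.2f}")  # macro=0.62 micro=0.58
```

Macro averaging weights every query equally, while micro averaging weights every chunk, so queries that retrieve many chunks dominate the micro score. Reporting both makes it easier to spot whether a few large retrievals are dragging efficiency down.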
Download files
File details
Details for the file evalkit_rag-0.1.1.tar.gz.
File metadata
- Download URL: evalkit_rag-0.1.1.tar.gz
- Upload date:
- Size: 4.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2e02572e5fc20fd5ce7b90ebad1f82812e632c3f109ffe701ae08ab7afde216a |
| MD5 | 2d6557320d7d530c403322ad05700dcf |
| BLAKE2b-256 | e0895d050c112a3a9e8a2f9aa34a5d72e5ae65e8388efb7d7e4e9f1543bdcfe0 |
File details
Details for the file evalkit_rag-0.1.1-py3-none-any.whl.
File metadata
- Download URL: evalkit_rag-0.1.1-py3-none-any.whl
- Upload date:
- Size: 6.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e351e5d3282f5e182c90dff94f228d762d695ab9dfbd0864adae1476bb7e4ce3 |
| MD5 | ee1d4ae709089797564da9ba73d93a5e |
| BLAKE2b-256 | d75d90fca71ae351dc31bbbd12813861b7bb466eb9822c4351dd6dccacef5fc1 |