Counterfactual evaluation toolkit for RAG systems

Project description

🧪 EvalKit

A simple evaluation tool to measure how much of your retrieved context is actually needed to answer a question.

🚀 Why EvalKit?

RAG systems often retrieve more context than the answer needs, and much of it is irrelevant.

EvalKit helps you answer:

Which retrieved chunks were actually useful for generating the answer?

📊 What it does

Given:

a user query
retrieved contexts
a final answer

EvalKit computes:

✔ Retrieval Efficiency

How much of the retrieved context was actually necessary.
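One reading consistent with the sample output further down (0.67 with two of three chunks marked required) is that the score is the share of retrieved chunks judged necessary. The sketch below assumes that definition; `retrieval_efficiency` is a hypothetical helper, not part of EvalKit's documented API.

```python
def retrieval_efficiency(required_indices, total_retrieved):
    """Share of retrieved chunks judged necessary (assumed definition)."""
    if total_retrieved <= 0:
        raise ValueError("total_retrieved must be positive")
    # De-duplicate so a chunk listed twice is only counted once.
    unique_required = set(required_indices)
    return len(unique_required) / total_retrieved


score = retrieval_efficiency([0, 1], 3)
print(f"Retrieval Efficiency: {score:.2f}")  # prints: Retrieval Efficiency: 0.67
```

Under this definition a score of 1.0 means every retrieved chunk was needed, and lower scores mean more wasted retrieval.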

✔ Required vs Redundant Contexts

Which chunks helped vs which were unnecessary.
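The title calls EvalKit a counterfactual toolkit, but this description does not spell out the procedure. A common way to split chunks into required and redundant is leave-one-out ablation: drop one chunk, check whether the answer is still supported, and mark the chunk required if it is not. The sketch below assumes that approach; `classify_contexts` and the toy `keyword_judge` are hypothetical stand-ins, not EvalKit functions, and a real setup would ask an LLM judge instead.

```python
def classify_contexts(contexts, answer, is_supported):
    """Leave-one-out ablation: a chunk is required if removing it breaks support."""
    required, redundant = [], []
    for i in range(len(contexts)):
        remaining = contexts[:i] + contexts[i + 1:]
        if is_supported(remaining, answer):
            redundant.append(i)  # answer survives without chunk i
        else:
            required.append(i)   # answer breaks without chunk i
    return required, redundant


def keyword_judge(contexts, answer, keywords=("30 days", "receipt")):
    """Toy judge (assumption): each keyword used in the answer must appear in some chunk."""
    text = " ".join(contexts).lower()
    return all(k in text for k in keywords if k in answer.lower())


contexts = [
    "Refunds are allowed within 30 days",
    "Customer must provide receipt",
    "Office hours are 9-5",
]
answer = "Refunds are allowed within 30 days with receipt"

required, redundant = classify_contexts(contexts, answer, keyword_judge)
print(required, redundant)  # prints: [0, 1] [2]
```

One known limit of leave-one-out: if two chunks carry the same fact, removing either one alone leaves the answer supported, so both are reported as redundant.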

⚡ Quickstart

1. Install

pip install evalkit-rag
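To confirm the install landed in the active environment, the standard library can report the installed version. A minimal sketch; `installed_version` is a hypothetical helper, and only the distribution name `evalkit-rag` comes from this page.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("evalkit-rag")
print(found if found else "evalkit-rag is not installed in this environment")
```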

How EvalKit compares

Tool	Focus	Missing insight
Ragas	retrieval + answer score	context necessity
DeepEval	LLM eval benchmarks	counterfactuals
EvalKit	context necessity (WHY)	—

🦙 2. (Optional) Run Ollama

ollama serve
ollama pull llama3
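Before pointing the judge at a local model, it can help to confirm the server is reachable and the model has been pulled. The sketch below assumes Ollama's default address `http://localhost:11434` and its `/api/tags` endpoint, which lists locally available models; `ollama_models` is a hypothetical helper, not part of EvalKit.

```python
import json
import urllib.request


def ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return names of locally available models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


models = ollama_models()
if models is None:
    print("Ollama is not reachable; start it with `ollama serve`.")
elif not any(name.startswith("llama3") for name in models):
    print("Server is up, but llama3 is missing; run `ollama pull llama3`.")
else:
    print("Ollama is ready.")
```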

⚡ 3. Use EvalKit

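from evalkit import RetrievalEfficiency, LLMJudge

# Judge backed by the llama3 model served locally by Ollama (step 2)
judge = LLMJudge(model="ollama/llama3")

# Metric that scores how much of the retrieved context was needed
metric = RetrievalEfficiency(judge)

query = "What is the refund policy?"

# Retrieved chunks; the output refers to them by 0-based position
contexts = [
    "Refunds are allowed within 30 days",
    "Customer must provide receipt",
    "Office hours are 9-5"
]

answer = "Refunds are allowed within 30 days with receipt"

result = metric.measure(query, contexts, answer)

print(result)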

📊 Output

Retrieval Efficiency: 0.67

Required Contexts: [0, 1]
Redundant Contexts: [2]

🧠 What this tells you

Which chunks were necessary
Which chunks were not needed
How efficient your retrieval system is
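Once the required indices are known, one practical use is to drop redundant chunks before generation. The sketch below works on plain Python lists because the structure of the `result` object is not documented here; `keep_required` is a hypothetical helper and the index list passed to it is illustrative.

```python
def keep_required(contexts, required_indices):
    """Return only the chunks whose positions are listed as required, in original order."""
    wanted = set(required_indices)
    return [chunk for i, chunk in enumerate(contexts) if i in wanted]


contexts = [
    "Refunds are allowed within 30 days",
    "Customer must provide receipt",
    "Office hours are 9-5",
]

# Illustrative indices: keep the refund window and the receipt requirement.
pruned = keep_required(contexts, [0, 1])
print(pruned)
```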

Project details

Release history

This version

0.1.1

Jun 17, 2026

0.1.0

Jun 16, 2026

Download files

Source Distribution

evalkit_rag-0.1.1.tar.gz (4.6 kB)

Uploaded Jun 17, 2026 Source

Built Distribution

evalkit_rag-0.1.1-py3-none-any.whl (6.3 kB)

Uploaded Jun 17, 2026 Python 3

File details

Details for the file evalkit_rag-0.1.1.tar.gz.

File metadata

Download URL: evalkit_rag-0.1.1.tar.gz
Upload date: Jun 17, 2026
Size: 4.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for evalkit_rag-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`2e02572e5fc20fd5ce7b90ebad1f82812e632c3f109ffe701ae08ab7afde216a`
MD5	`2d6557320d7d530c403322ad05700dcf`
BLAKE2b-256	`e0895d050c112a3a9e8a2f9aa34a5d72e5ae65e8388efb7d7e4e9f1543bdcfe0`
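The published digests can be used to check a downloaded file before installing it. A minimal sketch using Python's standard `hashlib`; the local file path is an assumption about where the archive was saved, and `sha256_of` is a hypothetical helper.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for evalkit_rag-0.1.1.tar.gz
EXPECTED_SHA256 = "2e02572e5fc20fd5ce7b90ebad1f82812e632c3f109ffe701ae08ab7afde216a"


def sha256_of(path, chunk_size=65536):
    """Stream the file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


archive = Path("evalkit_rag-0.1.1.tar.gz")  # assumed download location
if archive.exists():
    ok = sha256_of(archive) == EXPECTED_SHA256
    print("hash matches" if ok else "hash MISMATCH, do not install")
else:
    print(f"{archive} not found")
```

The same function works for the wheel by swapping in its file name and its SHA256 digest.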

File details

Details for the file evalkit_rag-0.1.1-py3-none-any.whl.

File metadata

Download URL: evalkit_rag-0.1.1-py3-none-any.whl
Upload date: Jun 17, 2026
Size: 6.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for evalkit_rag-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e351e5d3282f5e182c90dff94f228d762d695ab9dfbd0864adae1476bb7e4ce3`
MD5	`ee1d4ae709089797564da9ba73d93a5e`
BLAKE2b-256	`d75d90fca71ae351dc31bbbd12813861b7bb466eb9822c4351dd6dccacef5fc1`
