
Counterfactual evaluation toolkit for RAG systems


🧪 EvalKit

A simple evaluation tool to measure how much of your retrieved context is actually needed to answer a question.


🚀 Why EvalKit?

RAG systems often retrieve too much irrelevant context.

EvalKit helps you answer:

Which retrieved chunks were actually useful for generating the answer?


📊 What it does

Given:

  • a user query
  • retrieved contexts
  • a final answer

EvalKit computes:

✔ Retrieval Efficiency

The fraction of retrieved contexts that were actually needed to produce the answer.

✔ Required vs Redundant Contexts

Which chunks contributed to the answer and which were unnecessary.
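The README does not spell out how necessity is decided. Given the project's "counterfactual" framing, one plausible reading is a leave-one-out test: remove each context in turn and check whether the answer is still supported by what remains. The sketch below illustrates that idea in plain Python and is not EvalKit's implementation. `keyword_judge` is a hypothetical stand-in for the LLM judge (a naive substring check), the answer is hand-reduced to a list of facts, and all function names are illustrative.

```python
from typing import Callable, List, Sequence

# Hypothetical judge signature: do these contexts support all of the answer's facts?
Judge = Callable[[Sequence[str], Sequence[str]], bool]


def keyword_judge(contexts: Sequence[str], facts: Sequence[str]) -> bool:
    """Toy stand-in for an LLM judge: every fact must appear
    (case-insensitive substring) in at least one context."""
    lowered = [c.lower() for c in contexts]
    return all(any(f.lower() in c for c in lowered) for f in facts)


def leave_one_out(contexts: Sequence[str], facts: Sequence[str],
                  judge: Judge = keyword_judge) -> dict:
    """Mark context i as required if removing it alone leaves the answer unsupported."""
    required: List[int] = []
    redundant: List[int] = []
    for i in range(len(contexts)):
        # Counterfactual: the same retrieval result with context i removed.
        without_i = [c for j, c in enumerate(contexts) if j != i]
        if judge(without_i, facts):
            redundant.append(i)
        else:
            required.append(i)
    efficiency = len(required) / len(contexts) if contexts else 0.0
    return {"efficiency": efficiency, "required": required, "redundant": redundant}


contexts = [
    "Refunds are allowed within 30 days",
    "Customer must provide receipt",
    "Office hours are 9-5",
]
# Facts hand-extracted from the answer "Refunds are allowed within 30 days with receipt".
facts = ["30 days", "receipt"]

result = leave_one_out(contexts, facts)
print(f"Retrieval Efficiency: {result['efficiency']:.2f}")
print(f"Required Contexts: {result['required']}")
print(f"Redundant Contexts: {result['redundant']}")
```

One caveat of a pure leave-one-out design: if two chunks carry the same fact, removing either one alone leaves the answer supported, so both are marked redundant even though one of them is needed.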


⚡ Quickstart

1. Install

pip install evalkit-rag 

How EvalKit compares

Tool     | Focus                    | Missing insight
Ragas    | retrieval + answer score | context necessity
DeepEval | LLM eval benchmarks      | counterfactuals
EvalKit  | context necessity (WHY)  | -

🦙 2. (Optional) Run Ollama

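ollama serve          # start the local model server; leave it running
ollama pull llama3    # in a second terminal: download the model used below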
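Before running the example, it can help to confirm that the Ollama server is up and the model has been pulled. A minimal sketch using only the standard library, assuming Ollama's default address `http://localhost:11434` and its `GET /api/tags` endpoint, which lists locally available models. The helper `model_available` is hypothetical and not part of EvalKit.

```python
import json
import urllib.error
import urllib.request


def model_available(model: str = "llama3",
                    base_url: str = "http://localhost:11434",
                    timeout: float = 2.0) -> bool:
    """Return True if the Ollama server at base_url responds and lists `model`."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Server not running, unreachable, timed out, or returned non-JSON.
        return False
    if not isinstance(data, dict):
        return False
    # Pulled models are listed with a tag suffix, e.g. "llama3:latest".
    names = [m.get("name", "") for m in data.get("models", [])]
    return any(n == model or n.startswith(model + ":") for n in names)


if __name__ == "__main__":
    print("llama3 ready:", model_available("llama3"))
```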

⚡ 3. Use EvalKit

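from evalkit import RetrievalEfficiency, LLMJudge

# Use a local Llama 3 model served by Ollama as the judge
judge = LLMJudge(model="ollama/llama3")

metric = RetrievalEfficiency(judge)

query = "What is the refund policy?"

# Chunks returned by your retriever (referred to by index 0, 1, 2 in the output)
contexts = [
    "Refunds are allowed within 30 days",
    "Customer must provide receipt",
    "Office hours are 9-5"
]

# The answer your RAG pipeline generated from those chunks
answer = "Refunds are allowed within 30 days with receipt"

result = metric.measure(query, contexts, answer)

print(result)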

📊 Output

Retrieval Efficiency: 0.67
Required Contexts: [0, 1]
Redundant Contexts: [2]

🧠 What this tells you

  • Which chunks were necessary
  • Which chunks were not needed
  • How efficient your retrieval system is
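To judge a retrieval system rather than a single query, per-query results can be summarized across a test set. A minimal sketch, assuming each result has been reduced to a plain dict with `efficiency` and `redundant` fields mirroring the output shown above; that dict shape is an assumption, not EvalKit's documented return type. It reports the mean efficiency and how often each retrieval position was judged redundant, which hints at whether later chunks are mostly wasted. The `results` values are made up for illustration.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


def summarize(results: List[Dict]) -> Dict:
    """Aggregate per-query results into system-level numbers.

    Each result is assumed to look like {"efficiency": float, "redundant": [indices]}.
    """
    if not results:
        raise ValueError("need at least one result")
    # How often each retrieval position (0 = first chunk) was judged redundant.
    redundant_by_position = Counter(i for r in results for i in r["redundant"])
    return {
        "mean_efficiency": mean(r["efficiency"] for r in results),
        "redundant_by_position": dict(sorted(redundant_by_position.items())),
    }


# Made-up per-query results for illustration only.
results = [
    {"efficiency": 2 / 3, "redundant": [2]},
    {"efficiency": 1 / 3, "redundant": [1, 2]},
]
print(summarize(results))
```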



Download files


Source Distribution

evalkit_rag-0.1.1.tar.gz (4.6 kB)

Uploaded Source

Built Distribution


evalkit_rag-0.1.1-py3-none-any.whl (6.3 kB)

Uploaded Python 3

File details

Details for the file evalkit_rag-0.1.1.tar.gz.

File metadata

  • Download URL: evalkit_rag-0.1.1.tar.gz
  • Upload date:
  • Size: 4.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for evalkit_rag-0.1.1.tar.gz
Algorithm   | Hash digest
SHA256      | 2e02572e5fc20fd5ce7b90ebad1f82812e632c3f109ffe701ae08ab7afde216a
MD5         | 2d6557320d7d530c403322ad05700dcf
BLAKE2b-256 | e0895d050c112a3a9e8a2f9aa34a5d72e5ae65e8388efb7d7e4e9f1543bdcfe0


File details

Details for the file evalkit_rag-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: evalkit_rag-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 6.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for evalkit_rag-0.1.1-py3-none-any.whl
Algorithm   | Hash digest
SHA256      | e351e5d3282f5e182c90dff94f228d762d695ab9dfbd0864adae1476bb7e4ce3
MD5         | ee1d4ae709089797564da9ba73d93a5e
BLAKE2b-256 | d75d90fca71ae351dc31bbbd12813861b7bb466eb9822c4351dd6dccacef5fc1

