Lightweight evaluation framework for Retrieval Augmented Generation systems, focused on simplicity and long-term consistency.

RAG evaluation with fewer regrets.

ragret is a lightweight, stable evaluation framework for Retrieval-Augmented Generation (RAG) systems, designed with long-term consistency in mind and updated structurally only when necessary.
Its goal is simplicity: small, modular metrics that are easy to understand, extend, and integrate into existing pipelines. It was created out of frustration with other frameworks that change constantly, leaving code written for one version obsolete and hard to migrate to the next. With ragret, the focus is clear: simple, implement-as-you-go metrics that you can rely on without rewriting established code or digging through docs to figure out what changed overnight in your favorite framework.

Metrics

ragret provides evaluation metrics for assessing different aspects of RAG system performance.
It includes both LLM-based and non-LLM-based metrics, which are described in more detail in METRICS:

  • AnswerRelevancy
  • Faithfulness
  • ContextPrecision
  • ContextRecall
  • CosineSimilarity
  • F1Score
  • ProductRelevancy (custom metric for product recommendation systems)
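For readers unfamiliar with the two non-LLM metrics, the sketch below shows the textbook definitions of cosine similarity and token-level F1. It is a generic illustration only: the function names, the whitespace tokenization, and the lowercasing are assumptions, and ragret's own CosineSimilarity and F1Score classes may compute their scores differently.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall (whitespace tokens, lowercased)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Both functions return 0.0 for degenerate inputs (a zero vector, or no shared tokens) rather than raising, which keeps them safe to call inside a batch loop.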

Installation

Use pip to install the package:

pip install ragret

Or clone the repository locally:

git clone https://github.com/christopherkormpos/ragret.git
cd ragret

Supported providers

ragret currently supports two LLM providers for generation and embeddings: OpenAI and Ollama.
The default models for text generation are gpt-4.1-nano for OpenAI and gemma3:4b for Ollama.
For vector embeddings, the default models are text-embedding-3-small for OpenAI and nomic-embed-text for Ollama.
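If you prefer to pin the models explicitly instead of relying on the defaults, a small helper like the one below can build the keyword arguments once and reuse them for every metric. This is a sketch under assumptions: `metric_kwargs` and `DEFAULT_MODELS` are hypothetical names, not part of ragret; the keyword names (`provider`, `model`, `embedding_model`, `ollama_url`) are the optional parameters mentioned in the Usage section; and the Ollama URL is the address a local Ollama server listens on by default.

```python
# Defaults as listed above; the OpenAI embedding name is OpenAI's published identifier.
DEFAULT_MODELS = {
    "openai": {"model": "gpt-4.1-nano", "embedding_model": "text-embedding-3-small"},
    "ollama": {"model": "gemma3:4b", "embedding_model": "nomic-embed-text"},
}

# Assumption: a local Ollama server on its default port.
DEFAULT_OLLAMA_URL = "http://localhost:11434"


def metric_kwargs(provider, model=None, embedding_model=None, ollama_url=None):
    """Build keyword arguments for a metric class (hypothetical helper)."""
    provider = provider.lower()
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unsupported provider: {provider!r}")
    defaults = DEFAULT_MODELS[provider]
    kwargs = {
        "provider": provider,
        "model": model or defaults["model"],
        "embedding_model": embedding_model or defaults["embedding_model"],
    }
    if provider == "ollama":
        kwargs["ollama_url"] = ollama_url or DEFAULT_OLLAMA_URL
    return kwargs


# Possible usage, assuming the metric accepts all of these keyword arguments:
# faithfulness = Faithfulness(**metric_kwargs("ollama"))
```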

Basic Configuration

If you are using an external LLM provider, create a .env file and set the environment variable "API_KEY" to your provider's API key:

API_KEY=your-api-key-here

Or you can pass it directly during class initialization.

Faithfulness(provider="openai", api_key="your-api-key-here")
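The two options can be combined in your own code so that an explicit key wins and the environment is the fallback. The helper below is a minimal sketch, not ragret's internal lookup: `resolve_api_key` is a hypothetical name, and it assumes `API_KEY` has already been loaded into the process environment (for example, exported in the shell or read from .env by a dotenv loader).

```python
import os


def resolve_api_key(explicit_key=None, env_var="API_KEY"):
    """Return the explicit key if given, otherwise the value of the environment variable."""
    if explicit_key:
        return explicit_key
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No API key found: pass one explicitly or set {env_var} in the environment"
        )
    return key


# Possible usage with the constructor shown above:
# Faithfulness(provider="openai", api_key=resolve_api_key())
```

Failing early with a clear message is usually easier to debug than an authentication error raised in the middle of an evaluation run.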

Usage

All metrics are exposed at the top level of the package, so they can be imported directly:

from ragret import (
    AnswerRelevancy,
    Faithfulness,
    ContextPrecision,
    ContextRecall,
    CosineSimilarity,
    F1Score,
    ProductRelevancy,
)

Usage may vary depending on what you want to do. There are two ways to evaluate your dataset.

Use case 1 (Common - Simpler - Faster): Use the Evaluator class

# Import the metrics we want to use to evaluate our dataset, evaluator, and example dataset
from ragret import ContextRecall, ContextPrecision, AnswerRelevancy
from ragret.evaluators import Evaluator
from ragret.datasets import example_dataset
import pandas as pd

# Initialize metric classes with the desired provider.
# Other optional parameters:
# - api_key: provide your API key directly
# - ollama_url: for local LLM models
# - model: the LLM model name
# - embedding_model: the embedding model to use
context_recall = ContextRecall(provider="openai")
context_precision = ContextPrecision(provider="openai")
answer_relevancy = AnswerRelevancy(provider="openai")

# Create the evaluator with the dataset
# Use the calculate() method to evaluate the dataset with the selected metrics
results = Evaluator(example_dataset).calculate(
    context_recall,
    answer_relevancy,
    context_precision,
)

# Finally convert the results into a DataFrame and save the output in the current working directory.
df = pd.DataFrame(results)
df.to_csv("evaluation_results.csv", index=False)
print("Results saved to evaluation_results.csv")

Note: It’s important that the dataset is structured like the example below so that the Evaluator class can work correctly and produce results.

[
    {
        "user_query": "User Question 1",
        "retrieved_documents": ["Retrieved document text 1", 
                                "Retrieved document text 2", 
                                "Retrieved document text 3"],
        "llm_answer": "LLM Answer for Question 1"
    },
    {
        "user_query": "User Question 2",
        "retrieved_documents": ["Retrieved document text 1", 
                                "Retrieved document text 2", 
                                "Retrieved document text 3"],
        "llm_answer": "LLM Answer for Question 2"
    },...
]
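Because the Evaluator depends on this shape, it can help to check a dataset before starting a run that makes paid API calls. The function below is a sketch of such a check; `validate_dataset` is a hypothetical helper, not part of ragret, and it only verifies the three keys and value types shown in the example above.

```python
REQUIRED_FIELDS = {
    "user_query": str,
    "retrieved_documents": list,
    "llm_answer": str,
}


def validate_dataset(records):
    """Return a list of problems found; an empty list means the shape looks right."""
    problems = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {index}: expected a dict, got {type(record).__name__}")
            continue
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in record:
                problems.append(f"record {index}: missing '{field}'")
            elif not isinstance(record[field], expected_type):
                problems.append(f"record {index}: '{field}' should be {expected_type.__name__}")
        docs = record.get("retrieved_documents")
        if isinstance(docs, list) and not all(isinstance(doc, str) for doc in docs):
            problems.append(f"record {index}: 'retrieved_documents' must contain only strings")
    return problems
```

Calling `validate_dataset(my_dataset)` and printing any returned messages before constructing the Evaluator catches misspelled keys early.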

Use case 2: Use the metric classes directly

# Import the metrics we want to use to evaluate our dataset, and the example dataset
from ragret import ContextRecall, ContextPrecision, AnswerRelevancy
from ragret.datasets import example_dataset
import pandas as pd

# Initialize metric classes with the desired provider.
# Other optional parameters:
# - api_key: provide your API key directly
# - ollama_url: for local LLM models
# - model: the LLM model name
# - embedding_model: the embedding model to use
context_recall = ContextRecall(provider="openai")
context_precision = ContextPrecision(provider="openai")
answer_relevancy = AnswerRelevancy(provider="openai")

results = []
# Use a simple for loop to evaluate the dataset with the selected metrics
for i, record in enumerate(example_dataset):
    print(f"Evaluating record {i+1} of {len(example_dataset)}")

    recall_result = context_recall.score(
        retrieved_documents=record["retrieved_documents"],
        llm_answer=record["llm_answer"]
    )["score"]
    
    precision_result = context_precision.score(
        user_query=record["user_query"],
        retrieved_documents=record["retrieved_documents"]
    )["score"]
    
    answer_relevancy_result = answer_relevancy.score(
        user_query=record["user_query"],
        llm_answer=record["llm_answer"]
    )["score"]

    results.append({
        "context_precision": precision_result,
        "context_recall": recall_result,
        "answer_relevancy": answer_relevancy_result
    })

# Finally convert the results into a DataFrame and save the output in the current working directory.
df = pd.DataFrame(results)
df.to_csv("evaluation_results.csv", index=False)
print("Results saved to evaluation_results.csv")
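Since the loop above collects one dictionary of scores per record, a per-metric average is a natural summary of the run. The sketch below assumes every score is numeric and every row has the same keys; `summarize` is a hypothetical helper, not part of ragret.

```python
from statistics import mean


def summarize(results):
    """Average each metric across all records (assumes numeric scores, identical keys)."""
    if not results:
        return {}
    metric_names = list(results[0].keys())
    return {name: mean(row[name] for row in results) for name in metric_names}


# Hand-written illustrative rows, not real evaluation output
sample_results = [
    {"context_precision": 0.5, "context_recall": 1.0},
    {"context_precision": 1.0, "context_recall": 1.0},
]
averages = summarize(sample_results)
```

If the results are already in a DataFrame, `df.mean(numeric_only=True)` gives the same per-column averages.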

Contact

If you encounter any issues or bugs with the application, feel free to reach out to me:

License

This project is licensed under the MIT License - see the LICENSE file for details.
