
RAGElo: A Tool for Evaluating Retrieval-Augmented Generation Models


Elo-based RAG Agent evaluator

RAGElo[^1] is a streamlined toolkit for evaluating Retrieval Augmented Generation (RAG)-powered question answering agents built on Large Language Models (LLMs), using the Elo rating system.

While it has become easier to prototype and deploy generative LLMs in production, evaluation remains the most challenging part of the solution. Comparing the outputs of multiple prompt and pipeline variations against a "gold standard" is not easy. We can, however, ask a powerful LLM to judge between pairs of answers to a set of questions.

This led us to develop a simple tool for tournament-style Elo ranking of LLM outputs. By comparing answers from different RAG pipelines and prompts over multiple questions, RAGElo computes a ranking of the different settings, providing a good overview of what works (and what doesn't).
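To make the idea concrete, the core building block can be pictured as a single pairwise LLM-as-judge call. The sketch below is a minimal illustration, not RAGElo's actual prompt or internal API; the model name, prompt wording, and output convention are assumptions:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge_pair(question: str, answer_a: str, answer_b: str) -> str:
    """Ask an LLM which of two answers better addresses the question."""
    prompt = (
        f"Question: {question}\n\n"
        f"Answer A: {answer_a}\n\n"
        f"Answer B: {answer_b}\n\n"
        "Which answer is better? Reply with 'A', 'B', or 'TIE'."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model choice; RAGElo's default may differ
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

Repeating such judgments over many question/answer pairs yields the win/loss records that the Elo tournament described below is built on.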

The RAGElo tool finds its origin in a Generative AI project that Zeta Alpha developed with BASF.

⚙️ Installation

pip install ragelo

If you want to use RAGElo as a standalone CLI app, install it with the [cli] extra:

pip install ragelo[cli]

🚀 Quickstart

After installing RAGElo as a CLI app, you can run it with the following command:

ragelo run-all queries.csv documents.csv answers.csv

---------- Agent Scores by Elo ranking ----------
 agent1        : 1026.7
 agent2        : 973.3

An end-to-end evaluation needs three files: queries.csv, documents.csv, and answers.csv:

queries.csv:

query_id,query
0, "What is the capital of Brazil?"
1, "What is the capital of France?"

documents.csv:

query_id,doc_id,document_text
0,0, "Brasília is the capital of Brazil."
0,1, "Rio de Janeiro used to be the capital of Brazil."
1,2, "Paris is the capital of France."
1,3, "Lyon is the second largest city in France."

answers.csv:

query_id,agent,answer
0, agent1, "Brasília is the capital of Brazil, according to [0]."
0, agent2, "According to [1], Rio de Janeiro used to be the capital of Brazil until 1960."
1, agent1, "Paris is the capital of France, according to [2]."
1, agent2, "Lyon is the second largest city in France, according to [3]."

The OpenAI API key should be set as an environment variable (OPENAI_API_KEY). Alternatively, you can put the key in a credentials file and pass that file as an option to ragelo:

credentials.txt:

OPENAI_API_KEY=<your_key_here>

ragelo --credentials credentials.txt run-all queries.csv documents.csv answers.csv

🧩 Components

While RAGElo is meant to be used as an end-to-end tool, we can also invoke each of its components individually:

📜 documents-annotator

The documents-annotator tool annotates documents based on their relevance to the user query. This is done regardless of the answers provided by the Agents. By default, it uses the reasoner annotator, which only outputs the reasoning for the relevance judgment:

ragelo documents-annotator queries.csv documents.csv reasonings.csv

The reasonings.csv output file is a CSV file with query_id, document_id, and answer columns. An example is provided at tests/data/reasonings.csv.

💬 answers-annotator

The answers-annotator tool annotates the answers generated by the Agents, taking into account the quality of the documents retrieved by the retrieval pipeline. By default, it uses the PairwiseWithReasoning annotator, which generates k random pairs of answers for each query and chooses the best answer based on the relevant documents cited in the answer. It relies on the reasonings.csv file generated by the documents-annotator:

ragelo answers-annotator queries.csv answers.csv reasonings.csv answers_eval.jsonl

The answers_eval.jsonl output file is a JSONL file in which each line contains the prompt used to evaluate a pair of answers, the output of the annotator, and the best answer. An output file example is provided at tests/data/answers_eval.jsonl.
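The pairing step can be pictured as drawing up to k random agent pairs per query and letting the judge pick a winner for each pair. The sketch below is illustrative only, under the assumption that pairs are sampled uniformly without replacement; it is not RAGElo's actual sampling logic:

import itertools
import random

def sample_answer_pairs(agents: list[str], k: int) -> list[tuple[str, str]]:
    """Draw up to k random agent pairs for one query.
    Illustrative sketch only -- not RAGElo's actual sampling logic."""
    pairs = list(itertools.combinations(agents, 2))
    random.shuffle(pairs)
    return pairs[:k]

# With two agents there is only one possible pair per query.
print(sample_answer_pairs(["agent1", "agent2"], k=3))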

🏆 agents-ranker

Finally, the agents-ranker tool ranks the agents by simulating an Elo tournament, where the outcome of each game is given by the pairwise judgments produced by the answers-annotator:

ragelo agents-ranker answers_eval.jsonl agents_ranking.csv

The output of this step is written to agents_ranking.csv with agent and score columns. An example is provided at tests/data/agents_ranking.csv.
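For reference, the standard Elo update that such a tournament is built on looks like the sketch below. The K-factor, initial ratings, tie handling, and game outcomes here are assumptions for illustration; RAGElo's exact settings may differ:

def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """One Elo game: score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    Standard Elo update; the K-factor and starting ratings are assumptions here."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Illustrative replay: start both agents at 1000 and apply each judged game.
a, b = 1000.0, 1000.0
for winner in ["agent1", "agent1", "agent2"]:  # made-up game outcomes
    a, b = elo_update(a, b, 1.0 if winner == "agent1" else 0.0)
print(round(a, 1), round(b, 1))

Replaying all judged games this way produces the kind of ranking shown in the quickstart output above.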

🙋 Contributing

To install the development dependencies, clone the repository and run the following:

git clone https://github.com/zeta-alpha/ragelo && cd ragelo
pip install -e '.[dev]'

This installs the package and its dependencies in editable mode (i.e., changes to the code take effect without reinstalling). To build a new version, use the build command:

python -m build

✅ TODO

  • Add option for few-shot examples
  • Publish on PyPi
  • Add custom types
  • Testing!
  • Add CI/CD for publishing
  • Add more document evaluators (Microsoft)
  • Split Elo evaluator
  • Install as standalone CLI

[^1]: The RAGElo logo was created using Dall-E 3 and GPT-4 with the following prompt: "Vector logo design for a toolkit named 'RAGElo'. The logo should have bold, modern typography with emphasis on 'RAG' in a contrasting color. Include a minimalist icon symbolizing retrieval or ranking."
