RAGE - Retrieval Augmented Generation Evaluation
TL;DR
RAGE is a tool for evaluating how well Large Language Models (LLMs) cite relevant sources in Retrieval Augmented Generation (RAG) tasks.
What am I looking at?
RAGE is a framework for evaluating the suitability of Large Language Models (LLMs) for Retrieval Augmented Generation (RAG) applications. In RAG settings, an LLM is augmented with documents relevant to a given search query. The key capability evaluated is the LLM's ability to cite the sources it used to generate its answer.
The main idea is to present the LLM with a query together with relevant, irrelevant, and seemingly relevant documents. Seemingly relevant documents come from the same topic area as the relevant ones but do not contain the actual answer. RAGE then measures how reliably the LLM recognizes and cites the relevant documents.
Figure 1: RAGE Evaluation Process. Examples are extracted from the Natural Questions Dataset.
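To make this concrete, the sketch below shows what a single evaluation instance conceptually contains. The field names and labels are purely illustrative, not the toolkit's actual schema; see our preprocessed datasets and the paper for the real format.

```python
# Purely illustrative: what a single RAGE-style evaluation instance
# conceptually contains. Field names are invented; see the preprocessed
# datasets for the actual schema.
example = {
    "query": "when was the first transcontinental railroad completed",
    "documents": [
        {"id": "doc-1", "text": "...", "label": "relevant"},            # contains the answer
        {"id": "doc-2", "text": "...", "label": "seemingly_relevant"},  # same topic, no answer
        {"id": "doc-3", "text": "...", "label": "irrelevant"},          # unrelated content
    ],
}
# RAGE prompts the LLM with the query and all documents, then checks
# which document ids the model cites in its generated answer.
```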
For a more detailed description of the inner workings, dataset creation, and metrics, please refer to our paper:
→ Evaluating and Fine-Tuning Retrieval-Augmented Language Models to Generate Text With Accurate Citations
Installation
Via pip:
$ pip install rage-toolkit
From source:
$ git clone https://github.com/othr-nlp/rage_toolkit.git
$ cd rage_toolkit
$ pip install -e .
Get Started
We recommend starting with the rage_getting_started.ipynb Jupyter Notebook.
It gives you a quick introduction to setting up and running an evaluation with a custom LLM.
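As a rough, hypothetical illustration of what such an integration can look like (the actual interface expected by RAGE is shown in the notebook; the names below are invented):

```python
# Hypothetical sketch only -- the interface RAGE expects is defined in
# rage_getting_started.ipynb; the class and method names here are invented.
class MyCustomLLM:
    """Minimal wrapper that exposes a model behind a single generate() call."""

    def generate(self, prompt: str) -> str:
        # Call your model of choice here (an API client, a local
        # Hugging Face pipeline, etc.) and return the generated answer,
        # including whatever citation markers your prompt asks for.
        raise NotImplementedError
```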
Datasets
Note that RAGE works with any dataset that complies with our format. Feel free to create your own datasets to suit your needs.
For guidance on creating one, take a look at our preprocessed examples or refer to our paper.
Our datasets are built on top of datasets from the BEIR Benchmark.
Our preprocessed datasets can be found here:
| Original Dataset | Website | RAGE version on Hugging Face |
|---|---|---|
| Natural Questions (NQ) | https://ai.google.com/research/NaturalQuestions | RAGE - NQ |
| HotpotQA | https://hotpotqa.github.io/ | RAGE - HotpotQA |
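Since the preprocessed datasets are hosted on the Hugging Face Hub, they can presumably be loaded with the standard `datasets` library. A minimal sketch, assuming a hypothetical repository id (substitute the id linked in the table above):

```python
# A minimal sketch using the Hugging Face `datasets` library.
# The repository id below is hypothetical -- substitute the id from
# the table above.
from datasets import load_dataset

dataset = load_dataset("othr-nlp/rage-nq")  # hypothetical repo id
print(dataset)             # available splits and features
print(dataset["test"][0])  # split names depend on the dataset
```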
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Feel free to open issues or submit pull requests for improvements.