
A framework for retrieval augmented generation evaluation (RAGE)

Project description

RAGE - Retrieval Augmented Generation Evaluation

TL;DR

RAGE is a tool for evaluating how well Large Language Models (LLMs) cite relevant sources in Retrieval Augmented Generation (RAG) tasks.

What am I looking at?

RAGE is a framework designed to evaluate Large Language Models (LLMs) for their suitability in Retrieval Augmented Generation (RAG) applications. In RAG settings, an LLM is augmented with documents that are relevant to a given search query. The key capability evaluated is an LLM's ability to cite the sources it used to generate its answer.

The main idea is to present the LLM with a query along with relevant, irrelevant, and seemingly relevant documents. Seemingly relevant documents come from the same area as the relevant ones but do not contain the actual answer. RAGE then measures how well the LLM recognizes the relevant documents.

Figure 1: RAGE Evaluation Process. Examples are extracted from the Natural Questions dataset.
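
To make the process concrete, here is a minimal sketch of how a single example could be scored. The field names, label values, and the precision/recall scoring below are our own illustrative assumptions, not the actual RAGE data model or metrics (those are described in the paper linked below).

```python
# Illustrative sketch of a RAGE-style example and citation scoring.
# Field names, labels, and the scoring are assumptions for illustration only.

example = {
    "query": "who sings 'Does He Love You' with Reba?",
    "documents": [
        {"id": "d1", "label": "relevant"},            # contains the answer
        {"id": "d2", "label": "seemingly_relevant"},  # same topic, no answer
        {"id": "d3", "label": "irrelevant"},          # unrelated
    ],
}

# Suppose the LLM answered the query and cited these document ids:
cited_ids = {"d1", "d3"}

relevant_ids = {d["id"] for d in example["documents"] if d["label"] == "relevant"}
correct = relevant_ids & cited_ids  # relevant documents the model actually cited

precision = len(correct) / len(cited_ids) if cited_ids else 0.0
recall = len(correct) / len(relevant_ids) if relevant_ids else 0.0
print(f"citation precision={precision:.2f}, citation recall={recall:.2f}")
```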

For a more detailed description of the inner workings, dataset creation, and metrics, we refer to our paper:
Evaluating and Fine-Tuning Retrieval-Augmented Language Models to Generate Text With Accurate Citations

Installation

Pip:

pip install rage-toolkit

Build from source:

$ git clone https://github.com/othr-nlp/rage_toolkit.git
$ cd rage_toolkit
$ pip install -e .

Get Started

We recommend starting with the rage_getting_started.ipynb Jupyter Notebook. It gives you a quick introduction to how to set up and run an evaluation with a custom LLM.
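
For orientation before opening the notebook, plugging a custom LLM into an evaluation might look roughly like the sketch below. The adapter interface and the `Evaluator` names are hypothetical placeholders; follow the notebook for the real API.

```python
# Hypothetical sketch of wiring a custom LLM into a RAGE evaluation.
# The interface shown here is an assumption; see rage_getting_started.ipynb
# for the actual API.

class MyLLM:
    """Adapter around your model: given a query and candidate documents,
    return an answer string that cites documents, e.g. with [1], [2] markers."""

    def generate(self, query: str, documents: list[str]) -> str:
        # Call your model of choice here (hosted API or local model).
        raise NotImplementedError

# Pseudocode for the evaluation loop (names are placeholders):
# evaluator = Evaluator(dataset="rage-nq", model=MyLLM())
# report = evaluator.run()
# print(report)
```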

Datasets

Note that RAGE works with any dataset that complies with our format. Feel free to create your own datasets to suit your needs.

For guidance on creating one, take a look at our preprocessed examples or refer to our paper.
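
As a rough sketch of what a format check could look like, the snippet below validates JSONL entries against the document labels described above. The keys and label values are assumptions based on the evaluation idea, so treat the preprocessed examples as the authoritative schema.

```python
# Hypothetical sanity check for a custom RAGE-style dataset in JSONL form.
# Keys and label values are assumptions, not the official schema.

import json

LABELS = {"relevant", "seemingly_relevant", "irrelevant"}

def check_entry(entry: dict) -> None:
    missing = {"query", "documents"} - entry.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    for doc in entry["documents"]:
        if doc.get("label") not in LABELS:
            raise ValueError(f"unexpected label in document {doc.get('id')!r}")

with open("my_dataset.jsonl") as fh:
    for line in fh:
        check_entry(json.loads(line))
```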

Our datasets are built on top of those from the BEIR Benchmark.

Our preprocessed datasets can be found here:

| Original Dataset | Website | RAGE version on Hugging Face |
|---|---|---|
| Natural Questions (NQ) | https://ai.google.com/research/NaturalQuestions | RAGE - NQ |
| HotpotQA | https://hotpotqa.github.io/ | RAGE - HotpotQA |
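
Since the RAGE versions are hosted on the Hugging Face Hub, they can presumably be pulled with the `datasets` library as sketched below. The repository id is a guess for illustration; copy the exact id from the linked dataset page.

```python
# Load a RAGE dataset from the Hugging Face Hub. The repository id below
# is a placeholder guess; use the id shown on the linked dataset page.

from datasets import load_dataset

ds = load_dataset("othr-nlp/rage-nq")
print(ds)
```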

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests for improvements.

