
<h1 align="center">
<img style="vertical-align:middle" height="200"
src="./docs/assets/logo.png">
</h1>
<p align="center">
<i>SOTA metrics for evaluating Retrieval Augmented Generation (RAG)</i>
</p>

<p align="center">
<a href="https://github.com/explodinggradients/ragas/releases">
<img alt="GitHub release" src="https://img.shields.io/github/release/explodinggradients/ragas.svg">
</a>
<a href="https://www.python.org/">
<img alt="Build" src="https://img.shields.io/badge/Made%20with-Python-1f425f.svg?color=purple">
</a>
<a href="https://github.com/explodinggradients/ragas/blob/master/LICENSE">
<img alt="License" src="https://img.shields.io/github/license/explodinggradients/ragas.svg?color=green">
</a>
<a href="https://colab.research.google.com/drive/1HfutiEhHMJLXiWGT8pcipxT5L2TpYEdt?usp=sharing">
<img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg">
</a>
<a href="https://github.com/explodinggradients/ragas/">
<img alt="Downloads" src="https://badges.frapsoft.com/os/v1/open-source.svg?v=103">
</a>
</p>

<h4 align="center">
<p>
<a href="#shield-installation">Installation</a> |
<a href="#fire-quickstart">Quickstart</a> |
<a href="#luggage-metrics">Metrics</a> |
<a href="#raising_hand_man-faq">FAQ</a> |
<a href="https://huggingface.co/explodinggradients">Hugging Face</a>
</p>
</h4>

ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM's context. Existing tools and frameworks help you build these pipelines, but evaluating them and quantifying their performance can be hard. This is where ragas (RAG Assessment) comes in.

ragas provides you with tools based on the latest research for evaluating LLM-generated text, giving you insights into your RAG pipeline. ragas can be integrated into your CI/CD to provide continuous checks on performance.

## :shield: Installation

```bash
pip install ragas
```
If you want to install from source:
```bash
git clone https://github.com/explodinggradients/ragas && cd ragas
pip install -e .
```

## :fire: Quickstart

This is a small example program you can run to see ragas in action!
```python
import os

from datasets import Dataset

from ragas import evaluate
from ragas.metrics import answer_relevancy, context_relevancy, factuality

os.environ["OPENAI_API_KEY"] = "your-openai-key"

# Your evaluation data: one row per question, with the retrieved
# context and the generated answer.
ds = Dataset.from_dict({
    "question": ["When was the first Super Bowl?"],
    "context": ["The first Super Bowl was held on January 15, 1967."],
    "answer": ["The first Super Bowl was held on January 15, 1967."],
})

results = evaluate(ds, metrics=[factuality, answer_relevancy, context_relevancy])
```
If you want a more in-depth explanation of the core components, check out our quickstart notebook.

## :luggage: Metrics

Ragas measures your pipeline's performance along two dimensions:
1. **Factuality**: measures the factual consistency of the generated answer against the given context.
2. **Relevancy**: measures how relevant retrieved contexts and the generated answer are to the question.

Through repeated experiments, we have found that the quality of a RAG pipeline is highly dependent on these two dimensions. The final `ragas_score` is the harmonic mean of these two factors.

To read more about our metrics, check out the [docs](/docs/metrics.md).

## :question: How to use Ragas to improve your pipeline?
*"Measurement is the first step that leads to control and eventually to improvement" - James Harrington*

Here we assume that you already have your RAG pipeline ready. A RAG pipeline has two main parts: the retriever and the generator. A change to either of these will affect your pipeline's quality.

1. First, decide on one parameter you're interested in adjusting, for example the number of retrieved documents, K.
2. Collect a set of sample prompts (at least 20) to form your test set.
3. Run your pipeline with the test set before and after the change, each time recording the prompts together with the retrieved context and the generated output.
4. Run a ragas evaluation on each run to generate evaluation scores.
5. Compare the scores to see how much the change affected your pipeline's performance.
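Step 5 can be as simple as diffing the score dictionaries from the two evaluation runs. The sketch below assumes you already have two sets of scores; the numbers here are made-up illustrative values, not real benchmark results:

```python
# Hypothetical scores from two evaluate() runs, before and after changing K.
scores_before = {"factuality": 0.82, "answer_relevancy": 0.74, "ragas_score": 0.78}
scores_after = {"factuality": 0.88, "answer_relevancy": 0.79, "ragas_score": 0.83}

def compare(before, after):
    """Per-metric change between two evaluation runs (positive = improvement)."""
    return {metric: round(after[metric] - before[metric], 3) for metric in before}

print(compare(scores_before, scores_after))
```

A positive delta on `ragas_score` tells you the change helped overall; the per-metric deltas tell you whether it came from better retrieval or better generation.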


## :raising_hand_man: FAQ
1. **Why harmonic mean?**
   The harmonic mean penalizes extreme values. For example, if your generated answer is fully factually consistent with the context (factuality = 1) but not relevant to the question (relevancy = 0), a simple average would give a score of 0.5, while the harmonic mean gives 0.0.
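This difference is easy to check numerically. The sketch below is plain Python (not part of the ragas API) and compares the two means for the extreme case described above:

```python
def harmonic_mean(scores):
    """Harmonic mean of a list of scores; 0.0 if any score is 0."""
    if any(s == 0 for s in scores):
        return 0.0
    return len(scores) / sum(1 / s for s in scores)

factuality, relevancy = 1.0, 0.0
simple_avg = (factuality + relevancy) / 2          # 0.5
harmonic = harmonic_mean([factuality, relevancy])  # 0.0
print(simple_avg, harmonic)
```

Because a single zero drives the harmonic mean to zero, an answer has to score reasonably on *both* dimensions to get a good `ragas_score`.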






