RAGulate

A tool for evaluating RAG pipelines

The Metrics

RAGulate currently reports four relevancy metrics: Answer Correctness, Answer Relevance, Context Relevance, and Groundedness.

Answer Correctness:
- How well does the generated answer match the ground-truth answer?
- This confirms how well the full system performed.
Answer Relevance:
- Is the generated answer relevant to the query?
- This shows whether the LLM is responding in a way that helps answer the query.
Context Relevance:
- Does the retrieved context contain the information needed to answer the query?
- This shows how well the retrieval part of the process is performing.
Groundedness:
- Is the generated response supported by the context?
- Low scores here suggest that the LLM is hallucinating.
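
The metric descriptions above can be read as a diagnostic table: each metric points at a different stage of the pipeline. The following is a minimal sketch of that reading, not part of RAGulate itself. The `diagnose` helper, the snake_case metric keys, the 0 to 1 score range, and the 0.5 threshold are all assumptions made for illustration.

```python
# Hypothetical helper (not part of RAGulate): maps low metric scores to the
# pipeline stage each metric describes. The 0-1 score range and the 0.5
# threshold are assumptions chosen for illustration only.
DIAGNOSES = {
    "answer_correctness": "the full system (answer does not match the ground truth)",
    "answer_relevance": "the LLM response (answer does not address the query)",
    "context_relevance": "the retrieval step (context lacks the needed information)",
    "groundedness": "the generation step (answer is not supported by the context)",
}


def diagnose(scores: dict, threshold: float = 0.5) -> list:
    """Return one hint per known metric whose score falls below the threshold."""
    return [
        f"{name}: check {DIAGNOSES[name]}"
        for name, value in scores.items()
        if name in DIAGNOSES and value < threshold
    ]


# Made-up scores, used only to show the call.
for hint in diagnose({"context_relevance": 0.2, "groundedness": 0.9}):
    print(hint)
```

With these made-up scores, only Context Relevance falls below the threshold, so the hint points at retrieval rather than at the LLM.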

Example Output

The tool outputs its results as images. These images show distribution box plots of the metrics for the different test runs.
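
To make the box plots easier to read, here is a minimal sketch of the numbers a box plot typically summarizes: the minimum, the quartiles, and the maximum of a set of scores. It does not reproduce how RAGulate draws its plots. The `five_number_summary` helper, the choice of the "inclusive" quantile method, and the sample scores are assumptions for illustration.

```python
from statistics import quantiles


def five_number_summary(scores):
    """Return the min, quartiles, and max that a box plot typically displays."""
    # n=4 yields the three quartile cut points; "inclusive" treats the data
    # as the whole population rather than a sample (an assumption here).
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(scores),
    }


# Made-up scores for one metric in one test run.
print(five_number_summary([0.0, 0.25, 0.5, 0.75, 1.0]))
```

Comparing these summaries across test runs is what the side-by-side boxes in the output image let you do visually.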

Installation

pip install ragstack-ai-ragulate

Initial Setup

Set your environment variables or create a .env file. You will need to set OPENAI_API_KEY and any other environment variables required by your ingest and query pipelines.

Wrap your ingest pipeline in a single Python method. The method should take a file_path parameter and any other variables that you will pass during your experimentation. The method should ingest the given file into your vector store.

See the ingest() method in open_ai_chunk_size_and_k.py as an example. This method configures an ingest pipeline using the chunk_size parameter and ingests the file passed to it.

Wrap your query pipeline in a single Python method that returns the pipeline. The method should have parameters for any variables that you will pass during your experimentation. Currently only LangChain LCEL query pipelines are supported.

See the query_pipeline() method in open_ai_chunk_size_and_k.py as an example. This method returns a LangChain LCEL pipeline configured by the chunk_size and k parameters.

Note: It is helpful to include a **kwargs parameter in your pipeline method definitions, so that any extra parameters passed to them are safely ignored.
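
As a rough illustration of the ingest method shape described above, the sketch below takes file_path and chunk_size and accepts **kwargs so that a variable intended for the query pipeline (here k) is ignored. It is not the code from open_ai_chunk_size_and_k.py: the in-memory `VECTOR_STORE` list and the character-based `split_text` helper are stand-ins, assumed here so that the example runs without a real vector store or any API key.

```python
import tempfile
from pathlib import Path

# Stand-in for a real vector store (an assumption so the sketch runs offline).
VECTOR_STORE = []


def split_text(text: str, chunk_size: int) -> list:
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def ingest(file_path: str, chunk_size: int, **kwargs) -> None:
    """Read one file, chunk it, and add the chunks to the store.

    **kwargs absorbs variables meant for other pipelines (for example k).
    """
    text = Path(file_path).read_text(encoding="utf-8")
    VECTOR_STORE.extend(split_text(text, chunk_size))


# Usage: ingest a 450-character temporary file with chunk_size=200.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("x" * 450, encoding="utf-8")
    ingest(file_path=str(sample), chunk_size=200, k=2)  # k is silently ignored

print([len(chunk) for chunk in VECTOR_STORE])  # prints [200, 200, 50]
```

Without **kwargs in the signature, the call above would raise a TypeError because of the unexpected k argument.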

Usage

Summary

usage: ragulate [-h] {download,ingest,query,compare} ...

RAGu-late CLI tool.

options:
  -h, --help            show this help message and exit

commands:
    download            Download a dataset
    ingest              Run an ingest pipeline
    query               Run an query pipeline
    compare             Compare results from 2 (or more) recipes
    run                 Run an experiment from a config file

Example

For the examples below, we will use the example experiment open_ai_chunk_size_and_k.py to see how the RAG metrics change as chunk_size and k (the number of documents retrieved) vary.

There are two ways to run an experiment with RAGulate: define the experiment in a config file, or execute it manually step by step.

Via Config File

Note: Running via config file is a new feature and it is not as stable as running manually.

Create a YAML config file in a format similar to the example config, example_config.yaml. The example defines the same test as the manual steps shown below.
Execute it with a single command:
```
ragulate run example_config.yaml
```
This will:
- Download the test datasets
- Run the ingest pipelines
- Run the query pipelines
- Output an analysis of the results.

Manually

Download a dataset. See available datasets here: https://llamahub.ai/?tab=llama_datasets

If you are unsure where to start, recommended datasets are:
- BraintrustCodaHelpDesk
- BlockchainSolana
Examples:
- ragulate download -k llama BraintrustCodaHelpDesk
- ragulate download -k llama BlockchainSolana

Ingest the datasets using different methods:

Examples:

Ingest with chunk_size=200:

ragulate ingest -n chunk_size_200 -s open_ai_chunk_size_and_k.py -m ingest \
--var-name chunk_size --var-value 200 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana

Ingest with chunk_size=100:

ragulate ingest -n chunk_size_100 -s open_ai_chunk_size_and_k.py -m ingest \
--var-name chunk_size --var-value 100 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana

Run queries and evaluations on the datasets using different methods:

Examples:

Query with chunk_size=200 and k=2:

ragulate query -n chunk_size_200_k_2 -s open_ai_chunk_size_and_k.py -m query_pipeline \
--var-name chunk_size --var-value 200  --var-name k --var-value 2 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana

Query with chunk_size=100 and k=2:

ragulate query -n chunk_size_100_k_2 -s open_ai_chunk_size_and_k.py -m query_pipeline \
--var-name chunk_size --var-value 100  --var-name k --var-value 2 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana

Query with chunk_size=200 and k=5:

ragulate query -n chunk_size_200_k_5 -s open_ai_chunk_size_and_k.py -m query_pipeline \
--var-name chunk_size --var-value 200  --var-name k --var-value 5 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana

Query with chunk_size=100 and k=5:

ragulate query -n chunk_size_100_k_5 -s open_ai_chunk_size_and_k.py -m query_pipeline \
--var-name chunk_size --var-value 100  --var-name k --var-value 5 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana
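
The four query commands above differ only in chunk_size and k, so they can be generated rather than typed by hand. The sketch below builds the same argument lists using only the flags shown in the commands above. The `query_command` helper is hypothetical, and the script only prints the commands; it does not run them.

```python
import shlex
from itertools import product

SCRIPT = "open_ai_chunk_size_and_k.py"
DATASETS = ["BraintrustCodaHelpDesk", "BlockchainSolana"]


def query_command(chunk_size: int, k: int) -> list:
    """Build the argument list for one `ragulate query` run."""
    args = [
        "ragulate", "query",
        "-n", f"chunk_size_{chunk_size}_k_{k}",
        "-s", SCRIPT,
        "-m", "query_pipeline",
        "--var-name", "chunk_size", "--var-value", str(chunk_size),
        "--var-name", "k", "--var-value", str(k),
    ]
    for dataset in DATASETS:
        args += ["--dataset", dataset]
    return args


# Iterate k in the outer loop so the order matches the examples above.
commands = [query_command(chunk_size, k) for k, chunk_size in product([2, 5], [200, 100])]

for command in commands:
    print(shlex.join(command))
```

Each list could be passed to `subprocess.run(command, check=True)` to execute it, assuming ragulate is installed and the matching ingest runs have already completed.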

Run a compare to get the results:

Example:

ragulate compare -r chunk_size_100_k_2 -r chunk_size_200_k_2 -r chunk_size_100_k_5 -r chunk_size_200_k_5

This will output two PNG files, one for each dataset.

Current Limitations

- Only LangChain query pipelines are supported.
- There is no way to specify which metrics to evaluate.

Release history

- 0.0.14rc3 (pre-release, this version): Aug 12, 2024
- 0.0.14rc2 (pre-release): Aug 1, 2024
- 0.0.14rc1 (pre-release): Jul 9, 2024
- 0.0.13: Jun 28, 2024
