A library for generating datasets and evaluating them on RAG-based solutions
RAGFORMance is a library for generating benchmarks for Retrieval-Augmented Generation (RAG) systems.
RAGFORMance wraps multiple question/answer dataset generators, such as RAGAS, DeepEval, or YourBench, and also provides generator types relevant for testing industrial use cases. Some generators use LLMs, while others rely on custom logic.
RAGFORMance also wraps connectors to well-known RAG systems under test, such as OpenWebUI, Haystack, RAGFlow, or custom developments built on LangChain or LlamaIndex.
Finally, RAGFORMance offers different metrics by wrapping state-of-the-art libraries such as trec_eval and the LLM-based metrics from RAGAS or DeepEval, and provides custom metrics and visualizations relevant for different types of RAG systems.
Installation
Install the library using pip: pip install ragformance, or pip install ragformance[all] to install all the generators, RAG wrappers, and metrics (including the RAGAS and DeepEval wrappers).
Usage
Usage as a library
The library contains 4 types of functions:
- Generators that take documents as input and produce different types of evaluation datasets
- Data loaders that convert well-known dataset formats to and from the RAGFORMance format
- RAG wrappers that automatically run the evaluations on a given RAG chain
- Metrics that evaluate both the retrieval capabilities and the final answer
Complete examples can be found in the documentation; here is a code snippet that should run after installation.
from ragformance.dataloaders import load_beir_dataset
from ragformance.rag.naive_rag import NaiveRag
from ragformance.rag.config import NaiveRagConfig

# Load the BEIR corpus and queries
corpus, queries = load_beir_dataset(filter_corpus=True)

# Configure and run a simple RAG pipeline
config = NaiveRagConfig(EMBEDDING_MODEL="all-MiniLM-L6-v2")
naive_rag = NaiveRag(config)
doc_uploaded_num = naive_rag.upload_corpus(corpus)
answers = naive_rag.ask_queries(queries)

from ragformance.eval import trec_eval_metrics
from ragformance.eval import visualize_semantic_F1, display_semantic_quadrants

# Compute retrieval metrics and visualize the semantic F1 quadrants
metrics = trec_eval_metrics(answers)
quadrants = visualize_semantic_F1(corpus, answers, embedding_config={"model": "all-MiniLM-L6-v2"})
display_semantic_quadrants(quadrants)
Usage as a CLI or python pipeline
The second way to use RAGFORMance is as a standalone program, through the command-line interface (CLI) with a configuration file. After installation with pip, or using the pre-compiled binaries available on GitHub, you can run the following command:
ragformance --config your_config.json
This corresponds to the following python code :
from ragformance.cli.run import run_pipeline
corpus, queries, answers, metrics_data, display_widget = run_pipeline("config.json")
Configuring the pipeline
Data generation in particular is controlled via the generation section of your_config.json. Here is an example that reproduces the same execution as above: loading the BEIR dataset, testing the naive RAG, and generating metrics and visualizations.
Example config.json snippet for data generation:
{
"generation": {
"type": "alpha",
"source": {
"path": "path/to/your/input_data"
},
"output": {
"path": "path/to/your/output_folder"
},
"params": {}
},
"dataset": {
"source_type": "beir",
"path": "scifact",
"filter_corpus": true
},
"data_path": "data",
"rag": {
"rag_type": "naive",
"params": {
"EMBEDDING_MODEL": "all-MiniLM-L6-v2"
}
},
"steps": {
"generation": false,
"upload_hf": false,
"load_dataset": true,
"evaluation": true,
"metrics": true,
"visualization": true
}
}
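As a minimal sketch (standard library only, not part of the ragformance API), the steps section of such a config can be inspected before a run to see which pipeline stages will execute:

```python
import json

# Example config mirroring the snippet above (truncated to the relevant keys).
config_text = """
{
  "dataset": {"source_type": "beir", "path": "scifact", "filter_corpus": true},
  "rag": {"rag_type": "naive", "params": {"EMBEDDING_MODEL": "all-MiniLM-L6-v2"}},
  "steps": {"generation": false, "load_dataset": true, "evaluation": true,
            "metrics": true, "visualization": true}
}
"""

config = json.loads(config_text)
# Collect the pipeline steps that are switched on.
enabled_steps = [name for name, on in config["steps"].items() if on]
```

With this config, generation is skipped and the four remaining stages run.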
Generate Data
Each generator has specific configuration parameters. Generators usually take a folder as input and produce an output folder containing the jsonl dataset. Here is an example with a generator that does not require an LLM backend:
from forcolate import convert_URLS_to_markdown
query = "Download and convert https://fr.wikipedia.org/wiki/Grand_mod%C3%A8le_de_langage and https://fr.wikipedia.org/wiki/Ascenseur_spatial"
convert_URLS_to_markdown(query, "", "data/wikipedia")
from ragformance.generators.structural_generator import StructuralGenerator, StructuralGeneratorConfig
config = StructuralGeneratorConfig(
    data_path="data/wikipedia",
    output_path="data/wikipedia_questions")
corpus, queries = StructuralGenerator().run(config)
This can also be run as a full pipeline (CLI or library) with the following config file and python commands:
{
"generation": {
"type": "structural_generator",
"source": {
"path": "data/wikipedia"
},
"output": {
"path": "data/wikipedia_questions"
},
"params": {}
},
"data_path": "data",
"steps": {
"generation": true
}
}
from forcolate import convert_URLS_to_markdown
query = "Download and convert https://fr.wikipedia.org/wiki/Grand_mod%C3%A8le_de_langage and https://fr.wikipedia.org/wiki/Ascenseur_spatial"
convert_URLS_to_markdown(query, "", "data/wikipedia")
from ragformance.cli.run import run_pipeline
corpus, queries, answers, metrics_data, display_widget = run_pipeline("config.json")
For detailed information on available generators, their specific parameters, and advanced configuration, please refer to the Generators Documentation.
Dataset Structure
The dataset consists of two files:
corpus.jsonl: A jsonl file containing the corpus of documents. Each document is represented as a json object with the following fields:
- _id: The id of the document.
- title: The title of the document.
- text: The text of the document.
queries.jsonl: A jsonl file containing the queries. Each query is represented as a json object with the following fields:
- _id: The id of the query.
- query_text: The text of the query.
- relevant_document_ids: A list of references to documents in the corpus. Each reference is a json object with the following fields:
  - corpus_id: The id of the document.
  - score: The score of the reference.
- ref_answer: The reference answer for the query.
- metadata: A dictionary containing the metadata for the query.
This structure is inspired by the popular BEIR format, with the inclusion of the qrels file inside the queries: BEIR is optimized for Information Retrieval tasks, whereas this library also aims to evaluate other tasks (such as end-to-end generation).
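To make the schema concrete, here is a hypothetical pair of entries (ids, titles, and texts are invented for illustration) and how each would be serialized as one line of its jsonl file:

```python
import json

# A made-up corpus document following the corpus.jsonl schema.
corpus_entry = {
    "_id": "doc1",
    "title": "Space elevator",
    "text": "A space elevator is a proposed planet-to-space transportation system.",
}

# A made-up query following the queries.jsonl schema, pointing at doc1.
query_entry = {
    "_id": "q1",
    "query_text": "What is a space elevator?",
    "relevant_document_ids": [{"corpus_id": "doc1", "score": 1}],
    "ref_answer": "A proposed planet-to-space transportation system.",
    "metadata": {},
}

# Each jsonl file stores one such object per line.
corpus_line = json.dumps(corpus_entry)
query_line = json.dumps(query_entry)
```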
Answer Output Format
The answers generated by the system are structured as a json lines, with each line corresponding to a processed question. Each entry contains:
query: A dictionary describing the original question, with:_id: Unique identifier for the question.query_text: The question text.relevant_document_ids: A list of corpus documents considered as references for this question, each reference containing:corpus_id: The document identifier.score: The importance or relevance score.
ref_answer: The reference (gold standard) answer for the question.
model_answer: The generated answerrelevant_documents_ids: A list of corpus document IDs used as context for generating the answer.retrieved_documents_distances: A list of relevancy scores for the retrieved documents.
It is based on the following pydantic model:
from typing import Dict, List, Optional
from pydantic import BaseModel, Field
class RelevantDocumentModel(BaseModel):
corpus_id: str
score: int
class AnnotatedQueryModel(BaseModel):
id: str = Field(alias="_id")
query_text: str
relevant_document_ids: List[RelevantDocumentModel]
ref_answer: str
metadata: Optional[Dict] = None
class AnswerModel(BaseModel):
id: str = Field(alias="_id")
query: AnnotatedQueryModel
# model output
model_answer: str
retrieved_documents_ids: List[str]
retrieved_documents_distances: Optional[List[float]] = None
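As an illustration of this format, a single answers.jsonl line can be decoded with the standard library alone (all ids, texts, and distances below are invented):

```python
import json

# One hypothetical line of an answers.jsonl file, matching the model above.
answer_line = json.dumps({
    "_id": "a1",
    "query": {
        "_id": "q1",
        "query_text": "What is a space elevator?",
        "relevant_document_ids": [{"corpus_id": "doc1", "score": 1}],
        "ref_answer": "A proposed planet-to-space transportation system.",
    },
    "model_answer": "It is a proposed structure reaching from the ground into space.",
    "retrieved_documents_ids": ["doc1", "doc7"],
    "retrieved_documents_distances": [0.12, 0.57],
})

answer = json.loads(answer_line)
# Retrieved ids and distances are aligned pairwise.
pairs = list(zip(answer["retrieved_documents_ids"],
                 answer["retrieved_documents_distances"]))
```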
Loading a dataset from jsonl
import json
from typing import List
from pydantic import TypeAdapter
from ragformance.models.corpus import DocModel
from ragformance.rag.naive_rag import NaiveRag

ta = TypeAdapter(List[DocModel])

# load the corpus from the jsonl file
with open("output/corpus.jsonl", "r") as f:
    corpus = ta.validate_python([json.loads(line) for line in f])

naive_rag = NaiveRag()
naive_rag.upload_corpus(corpus=corpus)
Additional features
To keep the core library lightweight while still allowing multiple integrations, all the features below are packaged as optional extras. You can install each one with its specific command, or install everything with:
pip install ragformance[all]
Loading a dataset from Hugging face
You can directly use datasets hosted on Hugging Face that follow the correct format. First install the optional dependencies with:
pip install ragformance[huggingface]
from typing import List
from ragformance.models.corpus import DocModel
from ragformance.models.answer import AnnotatedQueryModel
from ragformance.rag.naive_rag import NaiveRag
from pydantic import TypeAdapter
from datasets import load_dataset
ta = TypeAdapter(List[DocModel])
taq = TypeAdapter(List[AnnotatedQueryModel])
corpus = ta.validate_python(load_dataset("FOR-sight-ai/ragformance_toloxa", "corpus", split="train"))
queries = taq.validate_python(load_dataset("FOR-sight-ai/ragformance_toloxa", "queries", split="train"))
naive_rag = NaiveRag()
doc_uploaded_num = naive_rag.upload_corpus(corpus=corpus)
answers = naive_rag.ask_queries(queries)
Pushing dataset to Hugging Face Hub
This function pushes the two jsonl files to a Hugging Face Hub dataset repository; you must set the HF_TOKEN environment variable, either in the system environment or in config.json.
from ragformance.dataloaders import push_to_hub
HFpath = "YOUR_NAME/YOUR_PATH"
push_to_hub(HFpath, "output")
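One way to provide the token, sketched here with a placeholder value (not a real token), is to set it in the process environment before calling push_to_hub:

```python
import os

# Set HF_TOKEN only if it is not already defined in the environment.
os.environ.setdefault("HF_TOKEN", "hf_placeholder_token")
token_is_set = "HF_TOKEN" in os.environ
```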
Trec-Eval Metrics and visualization
This library wraps the trec_eval tools for Information Retrieval metrics. Make sure to install the optional dependency with:
pip install ragformance[trec-eval]
It also provides a set of metric visualizations to help assess whether the test dataset is well balanced and whether a solution under test has the expected performance.
from ragformance.eval import trec_eval_metrics
from ragformance.eval import visualize_semantic_F1, display_semantic_quadrants
metrics = trec_eval_metrics(answers)
quadrants = visualize_semantic_F1(corpus, answers)
display_semantic_quadrants(quadrants)
Using DeepEval metrics
You can also use DeepEval metrics for LLM-based evaluation. Make sure to install the optional dependency with:
pip install ragformance[deepeval]
Example usage:
from ragformance.eval import compute_deepeval_metrics
additional_metrics = {
"FaithfulnessMetric": True
}
metric = compute_deepeval_metrics(
corpus,
answers,
llm_api_key="your API key",
additional_metrics=additional_metrics
)
print(metric)
Tracing
To enable tracing with Arize Phoenix, you need to install the optional Phoenix dependencies:
pip install ragformance[phoenix]
Then add the following section to your config.json file (all parameters are optional except enable; defaults are shown):
{
"phoenix": {
"enable": true,
"endpoint": "http://localhost:6006", // Phoenix server address (optional)
"project_name": "ragformance" // Project name for tracing (optional)
},
...
}
When enabled, RAGformance will automatically instrument the generation pipelines and send traces to the Phoenix server specified in endpoint (default: http://localhost:6006). You can start the Phoenix UI with:
phoenix serve
Then open http://localhost:6006/ (or your custom endpoint) in your browser to view traces and metrics.
For more details, see the Phoenix documentation.
Acknowledgement
This project received funding from the French "IA Cluster" program within the Artificial and Natural Intelligence Toulouse Institute (ANITI) and from the "France 2030" program within IRT Saint Exupery. The authors gratefully acknowledge the support of the FOR projects.
License
This project is licensed under the MIT License. See the LICENSE file for details.