Ragtime 🎹 is an LLMOps framework to automatically evaluate Retrieval Augmented Generation (RAG) systems and compare different RAGs / LLMs
Project description
Presentation
Ragtime 🎹 is an LLMOps framework which allows you to automatically:
- evaluate a Retrieval Augmented Generation (RAG) system
- compare different RAGs / LLMs
- generate Facts to allow automatic evaluation
Ragtime 🎹 lets you evaluate long answers, not only multiple-choice questions, and goes beyond counting the words an answer shares with a baseline. This kind of evaluation is required for systems such as summarizers.
In Ragtime 🎹, a RAG is made of an optional Retriever and one or several Large Language Models (LLMs).
- A Retriever takes a question as input and returns one or several chunks or paragraphs retrieved from a document knowledge base
- An LLM is a text-to-text generator that takes a prompt as input, made of a question and optional chunks, and returns an LLMAnswer
You can specify how prompts are generated and how the LLMAnswer has to be post-processed to return an answer.
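The two responsibilities mentioned above, building a prompt and post-processing the LLM output, can be sketched as follows. This is a minimal illustration, not Ragtime's own code: `SimplePrompter`, its prompt layout and its whitespace-stripping rule are assumptions; only the method names `get_prompt` and `post_process` appear elsewhere in this document.

```python
from typing import Optional


class SimplePrompter:
    """Hypothetical prompter: builds a prompt, then cleans up the raw LLM text."""

    def get_prompt(self, question: str, chunks: Optional[list[str]] = None) -> str:
        # Chunks are optional: a RAG without a Retriever sends the question alone
        context = "\n".join(f"- {chunk}" for chunk in (chunks or []))
        if context:
            return f"Context:\n{context}\n\nQuestion: {question}"
        return f"Question: {question}"

    def post_process(self, raw_text: str) -> str:
        # Assumed post-processing rule: only strip surrounding whitespace
        return raw_text.strip()


prompter = SimplePrompter()
prompt = prompter.get_prompt("What is RAG?", ["RAG combines retrieval and generation."])
answer = prompter.post_process("  Retrieval Augmented Generation.\n")
print(prompt)
print(answer)
```

A real prompter would typically also parse structured output (JSON, lists of facts, scores) in `post_process`.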
Contributing
Glad you wish to contribute! More details here.
How does it work?
The main idea in Ragtime 🎹 is to evaluate answers returned by a RAG based on Facts that you define. Evaluating RAGs and/or LLMs is very difficult because you cannot define a single "good" answer. An LLM can return many equivalent answers expressed in different ways, so a simple string comparison cannot determine whether an answer is right or wrong. Many proxies have been created, but counting the number of common words, as ROUGE does for instance, is not very precise (see HuggingFace's lighteval).
In Ragtime 🎹, answers returned by a RAG or a LLM are evaluated against a set of facts. If the answer validates all the facts, then the answer is deemed correct. Conversely, if some facts are not validated, the answer is considered wrong. The number of validated facts compared to the total number of facts to validate defines a score.
You can either define facts manually or have an LLM define them for you. The evaluation of facts against answers is done automatically with another LLM.
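The scoring rule described above can be sketched in a few lines. The function names and the list-of-booleans input are assumptions made for illustration; in Ragtime the per-fact verdicts come from an LLM, which is not modelled here.

```python
def fact_score(validated: list[bool]) -> float:
    """Return the share of facts validated by an answer, between 0.0 and 1.0."""
    if not validated:
        raise ValueError("at least one fact is needed to compute a score")
    return sum(validated) / len(validated)


def is_correct(validated: list[bool]) -> bool:
    """An answer is deemed correct only when every fact is validated."""
    return len(validated) > 0 and all(validated)


# Hypothetical verdicts for one answer checked against four facts
verdicts = [True, True, False, True]
print(fact_score(verdicts))  # 0.75
print(is_correct(verdicts))  # False
```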
Main objects
The main objects used in Ragtime 🎹 are:
- `AnswerGenerator`: generates `Answer`s with 1 or several `LLM`s. Each `LLM` uses a `Prompter` to get the prompt it is fed with and to post-process the `LLMAnswer` returned by the `LLM`
- `FactGenerator`: generates `Facts` from the answers with human validation equal to 1. `FactGenerator` also uses an `LLM` to generate the facts
- `EvalGenerator`: generates `Eval`s based on `Answer`s and `Facts`. Also uses an `LLM` to perform the evaluations
- `LLM`: generates text and returns `LLMAnswer` objects
- `LLMAnswer`: answer returned by an LLM. Contains a `text` field, returned by the LLM, plus a `cost`, a `duration`, a `timestamp` and a `prompt` field, being the prompt used to generate the answer
- `Prompter`: a prompter is used to generate a prompt for an LLM and to post-process the text returned by the LLM
- `Expe`: an experiment object, containing a list of `QA` objects
- `QA`: an element of an `Expe`. Contains a `Question` and, optionally, `Facts`, `Chunks` and `Answers`
- `Question`: contains a `text` field for the question's text. Can also contain a `meta` dictionary
- `Facts`: a list of `Fact`, with a `text` field being the fact in itself and an `LLMAnswer` object if the fact has been generated by an LLM
- `Chunks`: a list of `Chunk` containing the `text` of the chunk and optionally a `meta` dictionary with extra data associated with the retriever
- `Answers`: the answer to the question is in the `text` field plus an `LLMAnswer` containing all the data related to the answer generation, plus an `Eval` object related to the evaluation of the answer
- `Eval`: contains a `human` field to store the human evaluation of the answer as well as an `auto` field when the evaluation is done automatically. In this case, it also contains an `LLMAnswer` object related to the automatic evaluation
Almost every object in Ragtime 🎹 has a `meta` field, which is a dictionary where you can store all the extra data you need for your specific use case.
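To make the relationships between these objects easier to picture, here is a rough sketch using plain dataclasses. It is not Ragtime's actual class hierarchy: the field types, the default values, and the attribute names `llm_answer`, `evaluation` and `qas` are assumptions; only the object names and the `text`, `cost`, `duration`, `timestamp`, `prompt`, `human`, `auto` and `meta` fields come from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class LLMAnswer:
    text: str                        # text returned by the LLM
    prompt: str                      # prompt used to generate the answer
    cost: float = 0.0                # type and unit assumed
    duration: float = 0.0            # type and unit assumed
    timestamp: Optional[str] = None  # type assumed


@dataclass
class Eval:
    human: Optional[float] = None           # human evaluation, if any
    auto: Optional[float] = None            # automatic evaluation, if any
    llm_answer: Optional[LLMAnswer] = None  # set when the evaluation is automatic
    meta: dict[str, Any] = field(default_factory=dict)


@dataclass
class Fact:
    text: str
    llm_answer: Optional[LLMAnswer] = None  # set when an LLM generated the fact


@dataclass
class Chunk:
    text: str
    meta: dict[str, Any] = field(default_factory=dict)


@dataclass
class Answer:
    text: str
    llm_answer: Optional[LLMAnswer] = None
    evaluation: Optional[Eval] = None


@dataclass
class Question:
    text: str
    meta: dict[str, Any] = field(default_factory=dict)


@dataclass
class QA:
    question: Question
    facts: list[Fact] = field(default_factory=list)
    chunks: list[Chunk] = field(default_factory=list)
    answers: list[Answer] = field(default_factory=list)


@dataclass
class Expe:
    qas: list[QA] = field(default_factory=list)


qa = QA(question=Question(text="What is Ragtime?"))
qa.facts.append(Fact(text="Ragtime is an LLMOps framework"))
qa.question.meta["source"] = "README"  # free-form extra data
expe = Expe(qas=[qa])
print(len(expe.qas), expe.qas[0].facts[0].text)
```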
Basic sequence
When calling a generator, the following sequence unfolds (below is an example with an AnsGenerator, an AnsPrompterBase, a MyRetriever and 2 LLMs instantiated as LiteLLMs from their names, but it would work similarly with any other TextGenerator, Prompter and LLM child):
main.py: ans_gen = AnsGenerator(prompter=AnsPrompterBase(), retriever=MyRetriever(), llms=["gpt4", "mistral-large"])
main.py: AnsGenerator.generate(expe)
-> TextGenerator.generate: async call _generate_for_qa(qa) for each qa in expe
--> TextGenerator._generate_for_qa: AnsGenerator.gen_for_qa(qa)
---> AnsGenerator.gen_for_qa: llm.generate for each llm in AnsGenerator
----> llm.generate: prompter.get_prompt
----> llm.generate: llm.complete
----> llm.generate: prompter.post_process
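The call sequence above can be mimicked with a small, self-contained toy. Everything in it is a stand-in: `EchoLLM`, `StripPrompter` and `ToyAnsGenerator` are hypothetical classes that only mirror the method names of the trace, no real model is called, the retriever step is left out, and running the questions concurrently with `asyncio.gather` is an assumption about what "async call" means here.

```python
import asyncio


class StripPrompter:
    """Hypothetical prompter used by every LLM in this toy."""

    def get_prompt(self, question: str) -> str:
        return f"Question: {question}"

    def post_process(self, raw: str) -> str:
        return raw.strip()


class EchoLLM:
    """Hypothetical LLM stand-in: no API call, it just echoes the prompt."""

    def __init__(self, name: str):
        self.name = name

    async def complete(self, prompt: str) -> str:
        await asyncio.sleep(0)  # placeholder for the network round trip
        return f"  [{self.name}] {prompt}  "

    async def generate(self, question: str, prompter: StripPrompter) -> str:
        prompt = prompter.get_prompt(question)  # 1. build the prompt
        raw = await self.complete(prompt)       # 2. call the model
        return prompter.post_process(raw)       # 3. clean up the output


class ToyAnsGenerator:
    def __init__(self, prompter: StripPrompter, llms: list[EchoLLM]):
        self.prompter = prompter
        self.llms = llms

    async def gen_for_qa(self, question: str) -> list[str]:
        # One answer per LLM for a single question
        return [await llm.generate(question, self.prompter) for llm in self.llms]

    async def generate(self, questions: list[str]) -> list[list[str]]:
        # Questions are handled concurrently; gather keeps the input order
        return await asyncio.gather(*(self.gen_for_qa(q) for q in questions))


gen = ToyAnsGenerator(StripPrompter(), [EchoLLM("llm-a"), EchoLLM("llm-b")])
results = asyncio.run(gen.generate(["What is RAG?", "What is a Fact?"]))
print(results[0])  # ['[llm-a] Question: What is RAG?', '[llm-b] Question: What is RAG?']
```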
Examples
You can now go to ragtime-projects to see examples of Ragtime 🎹 in action!
Troubleshooting
Setting the API keys on Windows
API keys are stored in environment variables locally on your computer. If you are using Windows, first set the API key values in the shell:
setx OPENAI_API_KEY sk-....
The list of environment variable names to set, depending on the APIs you need to access, is given in the LiteLLM documentation.
Once the keys are set, just call `ragtime.config.init_win_env` with the list of environment variables to make accessible to Python, for instance `init_API_keys(['OPENAI_API_KEY'])`.
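Note that `setx` stores the variable for future shells, not for the one it was run in, so a key may look unset until a new terminal is opened. A quick way to check what the current Python process actually sees is sketched below; `missing_env_vars` is a hypothetical helper written for this example and is not part of Ragtime.

```python
import os


def missing_env_vars(names: list[str]) -> list[str]:
    """Return the names that are unset or empty in the current process."""
    return [name for name in names if not os.environ.get(name)]


missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Not visible to Python:", ", ".join(missing))
else:
    print("All requested keys are visible to Python")
```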
Using Google LLMs
Follow the steps given in the LiteLLM documentation.
Also make sure your project has the Vertex AI API enabled.