Ragtime 🎹 is an LLMOps framework to automatically evaluate Retrieval Augmented Generation (RAG) systems and compare different RAGs / LLMs

Presentation

Ragtime 🎹 is an LLMOps framework which allows you to automatically:

evaluate a Retrieval Augmented Generation (RAG) system
compare different RAGs / LLMs
generate Facts to allow automatic evaluation

Ragtime 🎹 lets you evaluate long, free-form answers, not only multiple-choice questions, and it goes beyond counting the words an answer shares with a baseline. This kind of evaluation is required for systems such as summarizers.

In Ragtime 🎹, a RAG is made of an optional Retriever and one or more Large Language Models (LLMs).

A Retriever takes a question as input and returns one or more chunks or paragraphs retrieved from a document knowledge base
An LLM is a text-to-text generator that takes a prompt as input, made of a question and optional chunks, and returns an LLMAnswer

You can specify how prompts are generated and how the LLMAnswer has to be post-processed to return an answer.
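
The flow described above can be sketched in plain Python. This is only an illustration of the data flow, not Ragtime's actual API: `toy_retriever`, `build_prompt`, `toy_llm`, `post_process` and `ToyLLMAnswer` are hypothetical stand-ins, and the LLM is replaced by a function returning a canned string.

```python
from dataclasses import dataclass

# Every name below is a hypothetical stand-in, not a Ragtime class or function.


@dataclass
class ToyLLMAnswer:
    text: str    # raw text returned by the (fake) LLM
    prompt: str  # prompt that produced it


KNOWLEDGE_BASE = [
    "Ragtime evaluates answers against facts.",
    "A retriever returns chunks from a knowledge base.",
    "Pianos have 88 keys.",
]


def toy_retriever(question: str, top_k: int = 2) -> list[str]:
    """Return the top_k chunks sharing the most words with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda chunk: len(q_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt from the question and the optional chunks."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def toy_llm(prompt: str) -> ToyLLMAnswer:
    """Fake text-to-text generator: always returns the same raw text."""
    return ToyLLMAnswer(text="  ANSWER: it returns chunks.  ", prompt=prompt)


def post_process(llm_answer: ToyLLMAnswer) -> str:
    """Turn the raw LLM text into the final answer."""
    return llm_answer.text.strip().removeprefix("ANSWER:").strip()


question = "What does a retriever return?"
chunks = toy_retriever(question)
llm_answer = toy_llm(build_prompt(question, chunks))
answer = post_process(llm_answer)
print(answer)
```

Keeping prompt construction and post-processing as separate functions mirrors the idea that both can be customized independently of the LLM being called.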

Contributing

Glad you wish to contribute! More details here.

How does it work?

The main idea in Ragtime 🎹 is to evaluate answers returned by a RAG based on Facts that you define. RAGs and LLMs are hard to evaluate because there is no single "good" answer: an LLM can return many equivalent answers expressed in different ways, so a simple string comparison cannot determine whether an answer is right or wrong. Many proxies have been created, but counting the number of common words, as ROUGE does for instance, is not very precise (see HuggingFace's lighteval).
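
To see why word overlap is an imprecise proxy, consider the toy measure below. It is a simplified unigram recall in the spirit of ROUGE-1, not the ROUGE implementation itself, and the three sentences are made up for illustration.

```python
def unigram_recall(candidate: str, reference: str) -> float:
    """Fraction of the reference words that also appear in the candidate."""
    candidate_words = set(candidate.lower().split())
    reference_words = reference.lower().split()
    return sum(word in candidate_words for word in reference_words) / len(reference_words)


reference = "the meeting was moved to friday"
equivalent = "they rescheduled it for the end of the week"  # same meaning, other words
wrong = "the meeting was moved to monday"                   # wrong day, same words

print(unigram_recall(equivalent, reference))
print(unigram_recall(wrong, reference))
```

The factually wrong sentence shares five of the six reference words, while the equivalent paraphrase shares only one, so the overlap measure ranks them in the wrong order.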

In Ragtime 🎹, answers returned by a RAG or a LLM are evaluated against a set of facts. If the answer validates all the facts, then the answer is deemed correct. Conversely, if some facts are not validated, the answer is considered wrong. The number of validated facts compared to the total number of facts to validate defines a score.

You can either define facts manually or have an LLM define them for you. The evaluation of facts against answers is done automatically with another LLM.
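
The scoring rule can be written down in a few lines. This is a minimal sketch under stated assumptions: `fact_score` and `is_correct` are hypothetical helpers, and the per-fact verdicts are hard-coded here, whereas in Ragtime they would come from the evaluator LLM.

```python
def fact_score(validated: list[bool]) -> float:
    """Share of facts validated by the answer (validated facts / total facts)."""
    if not validated:
        raise ValueError("at least one fact is required")
    return sum(validated) / len(validated)


def is_correct(validated: list[bool]) -> bool:
    """An answer is deemed correct only if it validates every fact."""
    return bool(validated) and all(validated)


# Hard-coded verdicts for illustration: fact text -> validated by the answer?
verdicts = {
    "The meeting was moved": True,
    "The new day is Friday": True,
    "The room is unchanged": False,
}

score = fact_score(list(verdicts.values()))
print(f"score={score:.2f}, correct={is_correct(list(verdicts.values()))}")
```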

Main objects

The main objects used in Ragtime 🎹 are:

AnswerGenerator: generates Answers with one or several LLMs. Each LLM uses a Prompter to build the prompt it is fed with and to post-process the LLMAnswer it returns
FactGenerator: generates Facts from the Answers whose human validation equals 1. FactGenerator also uses an LLM to generate the facts
EvalGenerator: generates Evals based on Answers and Facts. It also uses an LLM to perform the evaluations
LLM: generates text and returns LLMAnswer objects
LLMAnswer: the answer returned by an LLM. Contains a text field, returned by the LLM, plus a cost, a duration, a timestamp and a prompt field holding the prompt used to generate the answer
Prompter: generates a prompt for an LLM and post-processes the text returned by the LLM
Expe: an experiment object, containing a list of QA objects
QA: an element of an Expe. Contains a Question and, optionally, Facts, Chunks and Answers
Question: contains a text field for the question's text. Can also contain a meta dictionary
Facts: a list of Fact, each with a text field holding the fact itself and an LLMAnswer object if the fact has been generated by an LLM
Chunks: a list of Chunk, each containing the text of the chunk and optionally a meta dictionary with extra data associated with the retriever
Answers: a list of Answer, each with the answer to the question in its text field, an LLMAnswer containing all the data related to the answer generation, and an Eval object related to the evaluation of the answer
Eval: contains a human field to store the human evaluation of the answer and an auto field used when the evaluation is done automatically. In the latter case, it also contains an LLMAnswer object related to the automatic evaluation

Almost every object in Ragtime 🎹 has a meta field, which is a dictionary where you can store all the extra data you need for your specific use case.
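
The relationships between the data objects listed above can be pictured with plain dataclasses. This is a hypothetical mirror of the description, not Ragtime's real class definitions: the field names follow the text, while the field types, defaults and the `qas` attribute name are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical mirror of the objects described above; types and defaults are assumed.


@dataclass
class LLMAnswer:
    text: str
    prompt: str
    cost: float = 0.0
    duration: float = 0.0
    timestamp: Optional[str] = None


@dataclass
class Eval:
    human: Optional[float] = None            # human evaluation of the answer
    auto: Optional[float] = None             # automatic evaluation
    llm_answer: Optional[LLMAnswer] = None   # set when the evaluation is automatic
    meta: dict[str, Any] = field(default_factory=dict)


@dataclass
class Question:
    text: str
    meta: dict[str, Any] = field(default_factory=dict)


@dataclass
class Fact:
    text: str
    llm_answer: Optional[LLMAnswer] = None   # set when an LLM generated the fact


@dataclass
class Chunk:
    text: str
    meta: dict[str, Any] = field(default_factory=dict)


@dataclass
class Answer:
    text: str
    llm_answer: Optional[LLMAnswer] = None
    eval: Optional[Eval] = None


@dataclass
class QA:
    question: Question
    facts: list[Fact] = field(default_factory=list)
    chunks: list[Chunk] = field(default_factory=list)
    answers: list[Answer] = field(default_factory=list)


@dataclass
class Expe:
    qas: list[QA] = field(default_factory=list)


qa = QA(
    question=Question("Which day is the meeting?", meta={"source": "demo"}),
    facts=[Fact("The meeting is on Friday")],
    answers=[Answer("It was moved to Friday", eval=Eval(human=1.0))],
)
expe = Expe(qas=[qa])
print(expe.qas[0].answers[0].eval.human)
```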

Basic sequence

When a generator is called, the following sequence unfolds. The example below uses an AnsGenerator, an AnsPrompterBase, a MyRetriever and 2 LLMs instantiated as LiteLLMs from their names, but it would work similarly with any other TextGenerator, Prompter and LLM child:

main.py: ans_gen = AnsGenerator(prompter=AnsPrompterBase(), retriever=MyRetriever(), llms=["gpt4", "mistral-large"])
main.py: AnsGenerator.generate(expe)
-> TextGenerator.generate: async call _generate_for_qa(qa) for each qa in expe
--> TextGenerator._generate_for_qa: AnsGenerator.gen_for_qa(qa)
---> AnsGenerator.gen_for_qa: llm.generate for each llm in AnsGenerator
----> llm.generate: prompter.get_prompt
----> llm.generate: llm.complete
----> llm.generate: prompter.post_process
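
The call order in the trace above can be reproduced with a small asyncio sketch. All classes here (`ToyPrompter`, `ToyLLM`, `ToyAnsGenerator`) are hypothetical stand-ins that mirror only the order of calls. Whether Ragtime calls the LLMs for one QA sequentially or concurrently is not stated above; the sketch assumes sequential calls within a QA and concurrent processing across QAs.

```python
import asyncio

# Hypothetical stand-ins mirroring the call order only; not Ragtime classes.


class ToyPrompter:
    def get_prompt(self, question: str, chunks: list[str]) -> str:
        return f"{question} | {len(chunks)} chunk(s)"

    def post_process(self, raw_text: str) -> str:
        return raw_text.strip()


class ToyLLM:
    def __init__(self, name: str, prompter: ToyPrompter):
        self.name = name
        self.prompter = prompter

    async def complete(self, prompt: str) -> str:
        await asyncio.sleep(0)  # stands in for a network call to the model
        return f"  [{self.name}] echo: {prompt}  "

    async def generate(self, question: str, chunks: list[str]) -> str:
        prompt = self.prompter.get_prompt(question, chunks)  # 1. build the prompt
        raw_text = await self.complete(prompt)               # 2. call the model
        return self.prompter.post_process(raw_text)          # 3. post-process


class ToyAnsGenerator:
    def __init__(self, prompter, retriever, llm_names):
        self.retriever = retriever
        self.llms = [ToyLLM(name, prompter) for name in llm_names]

    async def gen_for_qa(self, qa: dict) -> None:
        chunks = self.retriever(qa["question"])
        # Assumption: LLMs are called one after another for a given QA.
        qa["answers"] = [await llm.generate(qa["question"], chunks) for llm in self.llms]

    async def _generate(self, expe: list[dict]) -> None:
        # One task per QA, run concurrently.
        await asyncio.gather(*(self.gen_for_qa(qa) for qa in expe))

    def generate(self, expe: list[dict]) -> None:
        asyncio.run(self._generate(expe))


def toy_retriever(question: str) -> list[str]:
    return [f"chunk about {question}"]


expe = [{"question": "Q1"}, {"question": "Q2"}]
ans_gen = ToyAnsGenerator(ToyPrompter(), toy_retriever, ["gpt4", "mistral-large"])
ans_gen.generate(expe)
print(expe[0]["answers"])
```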

Examples

You can now go to ragtime-projects to see examples of Ragtime 🎹 in action!

Troubleshooting

Setting the API keys on Windows

API keys are stored in environment variables locally on your computer. If you are using Windows, first set the API key values in the shell:

setx OPENAI_API_KEY sk-....

The list of environment variable names to set, depending on the APIs you need to access, is given in the LiteLLM documentation.

Once the keys are set, call ragtime.config.init_win_env with the list of environment variables to make accessible to Python, for instance init_win_env(['OPENAI_API_KEY']).
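
Before running an experiment, it can help to confirm that the keys are actually visible to the Python process. The sketch below uses only the standard library; `missing_env_vars` is a hypothetical helper, not part of Ragtime or LiteLLM. Note that a variable set with setx only becomes visible in shells opened afterwards, so a missing key may simply mean the terminal needs to be restarted.

```python
import os
from typing import Mapping, Optional


def missing_env_vars(names: list[str], environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Not visible to Python:", ", ".join(missing))
else:
    print("All requested keys are set.")
```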

Using Google LLMs

Follow the steps given in the LiteLLM documentation. Also make sure your project has the Vertex AI API enabled.

Release history

0.0.43: Jun 10, 2024 (this version)
0.0.42: Jun 7, 2024
0.0.40: Jun 1, 2024
0.0.39: May 30, 2024
0.0.37: May 23, 2024 (yanked, reason: buggy)
0.0.36: May 23, 2024 (yanked, reason: buggy)
0.0.35: May 23, 2024 (yanked, reason: buggy)
0.0.34: May 16, 2024
0.0.33: May 15, 2024
0.0.32: May 6, 2024
0.0.31: May 3, 2024
0.0.30: Apr 21, 2024
0.0.29: Apr 20, 2024
0.0.28: Apr 15, 2024
0.0.27: Apr 14, 2024
0.0.26: Apr 8, 2024
0.0.25: Apr 1, 2024
0.0.24: Mar 31, 2024
0.0.23: Mar 31, 2024
0.0.22: Mar 28, 2024
0.0.21: Mar 27, 2024
0.0.20: Mar 25, 2024
0.0.19: Mar 16, 2024
0.0.18: Mar 16, 2024
0.0.17: Mar 15, 2024
0.0.2: Mar 6, 2024

