Ragtime 🎹 is an LLMOps framework to automatically evaluate Retrieval Augmented Generation (RAG) systems and compare different RAGs / LLMs

Project description

Ragtime 🎹 LLM Ops for all

Presentation

Ragtime 🎹 is an LLMOps framework which allows you to automatically:

  1. evaluate a Retrieval Augmented Generation (RAG) system
  2. compare different RAGs / LLMs
  3. generate Facts to allow automatic evaluation

In Ragtime 🎹, a RAG consists of an optional Retriever and one or more Large Language Models (LLMs).

  • A Retriever takes a question as input and returns one or more chunks or paragraphs retrieved from a document knowledge base
  • An LLM is a text-to-text generator that takes a prompt as input, made of a question and optional chunks, and returns an LLMAnswer

You can specify how prompts are generated and how the LLMAnswer has to be post-processed to return an answer.
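
The flow described above (retrieve chunks, build a prompt, call the LLM, post-process the result) can be sketched in plain Python. This is only an illustration and not Ragtime's actual API: `retrieve`, `build_prompt`, `post_process`, `fake_llm`, `answer` and this simplified `LLMAnswer` dataclass are hypothetical names introduced for the example, and the retriever and LLM are toy stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LLMAnswer:
    """Hypothetical, simplified stand-in for the raw output of an LLM call."""
    text: str
    prompt: str


def retrieve(question: str, knowledge_base: List[str]) -> List[str]:
    """Toy retriever: keep the chunks sharing at least one word with the question."""
    words = set(question.lower().split())
    return [chunk for chunk in knowledge_base if words & set(chunk.lower().split())]


def build_prompt(question: str, chunks: List[str]) -> str:
    """Build the prompt from the question and the optional chunks."""
    if not chunks:
        return f"Question: {question}"
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


def post_process(llm_answer: LLMAnswer) -> str:
    """Turn the raw LLMAnswer into the final answer (here: strip whitespace)."""
    return llm_answer.text.strip()


def fake_llm(prompt: str) -> LLMAnswer:
    """Placeholder for a real LLM call; always returns the same canned text."""
    return LLMAnswer(text="  Paris is the capital of France.\n", prompt=prompt)


def answer(question: str, knowledge_base: List[str],
           llm: Callable[[str], LLMAnswer] = fake_llm) -> str:
    """Run the whole chain: retrieve, build the prompt, call the LLM, post-process."""
    chunks = retrieve(question, knowledge_base)
    prompt = build_prompt(question, chunks)
    return post_process(llm(prompt))
```

Passing the LLM as a callable keeps the sketch independent of any provider: a real client could be swapped in for `fake_llm` as long as it returns an object with a `text` field.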

How does it work?

The main idea in Ragtime 🎹 is to evaluate the answers returned by a RAG against Facts that you define. Evaluating RAGs and LLMs is difficult because there is no single "good" answer: an LLM can return many equivalent answers expressed in different ways, so a simple string comparison cannot determine whether an answer is right or wrong. Proxy metrics have been created, but counting the number of common words, as ROUGE does for instance, is not very precise.

In Ragtime 🎹, answers returned by a RAG or a LLM are evaluated against a set of facts. If the answer validates all the facts, then the answer is deemed correct. Conversely, if some facts are not validated, the answer is considered wrong. The number of validated facts compared to the total number of facts to validate defines a score.
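
The scoring rule above can be sketched as follows. This is a minimal illustration, not Ragtime's implementation: it assumes the score is the fraction of validated facts, and `score_answer`, `is_correct` and `keyword_judge` are hypothetical names. The `keyword_judge` function is a deliberately naive substring check standing in for the LLM that would normally decide whether an answer validates a fact.

```python
from typing import Callable, List


def score_answer(answer: str, facts: List[str],
                 validates: Callable[[str, str], bool]) -> float:
    """Return the fraction of facts validated by the answer (assumed score definition)."""
    if not facts:
        raise ValueError("at least one fact is required")
    validated = sum(1 for fact in facts if validates(answer, fact))
    return validated / len(facts)


def is_correct(answer: str, facts: List[str],
               validates: Callable[[str, str], bool]) -> bool:
    """An answer is deemed correct only if it validates all the facts."""
    return score_answer(answer, facts, validates) == 1.0


def keyword_judge(answer: str, fact: str) -> bool:
    """Toy judge standing in for the evaluating LLM: case-insensitive substring check."""
    return fact.lower() in answer.lower()


text = "Paris is the capital of France."
print(score_answer(text, ["paris", "capital of france"], keyword_judge))  # 1.0
print(score_answer(text, ["paris", "seine"], keyword_judge))              # 0.5
```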

You can either define facts manually or have an LLM define them for you. The evaluation of answers against facts is done automatically by another LLM.

Main objects

The main objects used in Ragtime 🎹 are:

  • AnswerGenerator: generates Answers with one or more LLMs. Each LLM uses a Prompter to build the prompt it is fed with and to post-process the LLMAnswer it returns
  • FactGenerator: generates Facts from the answers whose human validation equals 1. FactGenerator also uses an LLM to generate the facts
  • EvalGenerator: generates Evals based on Answers and Facts. It also uses an LLM to perform the evaluations
  • LLM: generates text and returns LLMAnswer objects
  • LLMAnswer: the answer returned by an LLM. Contains a text field, returned by the LLM, plus a cost, a duration, a timestamp and a prompt field holding the prompt used to generate the answer
  • Prompter: generates a prompt for an LLM and post-processes the text returned by the LLM
  • Expe: an experiment object, containing a list of QA objects
  • QA: an element of an Expe. Contains a Question and, optionally, Facts, Chunks and Answers
  • Question: contains a text field for the question's text. Can also contain a meta dictionary
  • Facts: a list of Fact objects, each with a text field holding the fact itself and an LLMAnswer object if the fact has been generated by an LLM
  • Chunks: a list of Chunk objects, each containing the text of the chunk and optionally a meta dictionary with extra data associated with the retriever
  • Answers: each answer holds the answer to the question in its text field, an LLMAnswer containing all the data related to the answer generation, and an Eval object related to the evaluation of the answer
  • Eval: contains a human field to store the human evaluation of the answer and an auto field for the evaluation done automatically. In the latter case, it also contains an LLMAnswer object related to the automatic evaluation
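
To visualise how these objects nest, here is a rough sketch using plain dataclasses. It only mirrors the fields named in the list above; the class definitions, field types, defaults and the attribute names `llm_answer` and `qas` are assumptions made for illustration, and the real Ragtime classes may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class LLMAnswer:
    text: str                        # text returned by the LLM
    prompt: str = ""                 # prompt used to generate the answer
    cost: float = 0.0
    duration: float = 0.0
    timestamp: Optional[str] = None


@dataclass
class Question:
    text: str
    meta: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Fact:
    text: str
    llm_answer: Optional[LLMAnswer] = None  # set when the fact was generated by an LLM


@dataclass
class Chunk:
    text: str
    meta: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Eval:
    human: Optional[float] = None           # human evaluation of the answer
    auto: Optional[float] = None            # automatic evaluation
    llm_answer: Optional[LLMAnswer] = None  # set when the evaluation was automatic


@dataclass
class Answer:
    text: str
    llm_answer: Optional[LLMAnswer] = None
    eval: Optional[Eval] = None


@dataclass
class QA:
    question: Question
    facts: List[Fact] = field(default_factory=list)
    chunks: List[Chunk] = field(default_factory=list)
    answers: List[Answer] = field(default_factory=list)


@dataclass
class Expe:
    qas: List[QA] = field(default_factory=list)


# Build a one-question experiment by hand.
qa = QA(question=Question("What is the capital of France?"))
qa.facts.append(Fact("Paris is the capital of France"))
qa.answers.append(Answer("Paris.", eval=Eval(human=1.0)))
expe = Expe(qas=[qa])
```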

Almost every object in Ragtime 🎹 has a meta field, which is a dictionary where you can store any extra data you need for your specific use case.

Examples

You can now go to ragtime-projects to see examples of Ragtime 🎹 in action!

Download files

Source Distribution

ragtime-0.0.22.tar.gz (173.3 kB)

Uploaded Source

Built Distribution

ragtime-0.0.22-py3-none-any.whl (157.5 kB)

Uploaded Python 3
