Library for evaluating RAG using Nuclia's models

nuclia-eval: Evaluate your RAG with nuclia's models

nuclia, the all-in-one RAG as a service platform.

nuclia-eval's evaluation follows the RAG triad as proposed by TruLens: answer relevance, context relevance, and groundedness.

[Figure: RAG triad]

In summary, the metrics nuclia-eval provides for a RAG Experience involving a question, an answer and N pieces of context are:

- Answer Relevance: How directly and appropriately the response addresses the specific question asked, providing accurate, complete, and contextually suitable information.
  - score: A number between 0 and 5 rating the relevance of the answer to the question.
  - reason: A string explaining the reason for the score.
- For each of the N pieces of context:
  - Context Relevance Score: The relevance of the context to the question, on a scale of 0 to 5.
  - Groundedness Score: The degree to which the answer contains information that is substantially similar or identical to that in the context piece, on a scale of 0 to 5.
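The shape of these metrics can be sketched with plain dataclasses. This is only an illustration: the class names `AnswerRelevance` and `ContextScores` and the helper `validate_scores` are hypothetical stand-ins, not nuclia-eval's own types. Only the `score` and `reason` fields and the 0 to 5 range come from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnswerRelevance:  # hypothetical stand-in, not the library's class
    score: int   # 0 to 5
    reason: str  # explanation for the score


@dataclass
class ContextScores:  # hypothetical stand-in: one instance per context piece
    context_relevance: int  # 0 to 5
    groundedness: int       # 0 to 5


def validate_scores(answer: AnswerRelevance, contexts: List[ContextScores]) -> None:
    """Raise ValueError if any score falls outside the documented 0 to 5 range."""
    values = [answer.score]
    for piece in contexts:
        values += [piece.context_relevance, piece.groundedness]
    for value in values:
        if not 0 <= value <= 5:
            raise ValueError(f"score {value} is outside the 0 to 5 range")
```

For a RAG experience with N context pieces, this amounts to one answer relevance score (with its reason) plus 2N per-context scores.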

Installation

pip install nuclia-eval
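
To confirm the install from Python, one option is to look up the distribution's version with the standard library. This is a minimal sketch; the helper name `installed_version` is made up for illustration and is not part of nuclia-eval.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "nuclia-eval") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string if nuclia-eval is installed, otherwise None.
    print(installed_version())
```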

Available Models

REMi-v0

REMi-v0 (RAG Evaluation MetrIcs) is a LoRA adapter for the Mistral-7B-Instruct-v0.3 model.

It has been fine-tuned by the team at nuclia to evaluate the quality of all parts of the RAG experience.

Usage

from nuclia_eval import REMi

evaluator = REMi()

query = "By how many Octaves can I shift my OXYGEN PRO 49 keyboard?"

context1 = """\
* Oxygen Pro 49's keyboard can be shifted 3 octaves down or 4 octaves up.
* Oxygen Pro 61's keyboard can be shifted 3 octaves down or 3 octaves up.

To change the transposition of the keyboard, press and hold Shift, and then use the Key Octave –/+ buttons to lower or raise the keybed by one one, respectively.
The display will temporarily show TRANS and the current transposition (-12 to 12)."""
context2 = """\
To change the octave of the keyboard, use the Key Octave –/+ buttons to lower or raise the octave, respectively
The display will temporarily show OCT and the current octave shift.\n\nOxygen Pro 25's keyboard can be shifted 4 octaves down or 5 octaves up"""
context3 = """\
If your DAW does not automatically configure your Oxygen Pro series keyboard, please follow the setup steps listed in the Oxygen Pro DAW Setup Guides.
To set the keyboard to operate in Preset Mode, press the DAW/Preset Button (on the Oxygen Pro 25) or Preset Button (on the Oxygen Pro 49 and 61).
On the Oxygen Pro 25 the DAW/Preset button LED will be off to show that Preset Mode is selected.
On the Oxygen Pro 49 and 61 the Preset button LED will be lit to show that Preset Mode is selected."""

answer = "Based on the context provided, The Oxygen Pro 49's keyboard can be shifted 3 octaves down or 4 octaves up."

result = evaluator.evaluate_rag(query=query, answer=answer, contexts=[context1, context2, context3])
answer_relevance, context_relevances, groundednesses = result

print(f"{answer_relevance.score}, {answer_relevance.reason}")
# 5, The response directly answers the query by specifying the range of octave shifts for the Oxygen Pro 49 keyboard.
print([cr.score for cr in context_relevances]) # [5, 1, 0]
print([g.score for g in groundednesses]) # [2, 0, 0]
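
The per-context scores are easier to read side by side. The sketch below pairs them up and orders contexts by relevance; `pair_context_scores` is a hypothetical helper, and it works on plain integers (here copied from the example output above) rather than on the library's result objects.

```python
from typing import List, Tuple


def pair_context_scores(
    relevance: List[int], groundedness: List[int]
) -> List[Tuple[int, int, int]]:
    """Return (context_index, relevance, groundedness) rows, most relevant first."""
    if len(relevance) != len(groundedness):
        raise ValueError("expected one score of each kind per context")
    rows = [(i, r, g) for i, (r, g) in enumerate(zip(relevance, groundedness))]
    # sorted() is stable, so contexts with equal relevance keep their input order.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Scores copied from the example output above.
rows = pair_context_scores([5, 1, 0], [2, 0, 0])
for index, rel, gnd in rows:
    print(f"context{index + 1}: relevance={rel}, groundedness={gnd}")
```

With the evaluator's results, the two input lists would be `[cr.score for cr in context_relevances]` and `[g.score for g in groundednesses]`.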

Granularity

The REMi evaluator provides a fine-grained and strict evaluation of the RAG triad. For instance, if we slightly modify the answer to the query:

- answer = "Based on the context provided, The Oxygen Pro 49's keyboard can be shifted 3 octaves down or 4 octaves up."
+ answer = "Based on the context provided, the Oxygen Pro 49's keyboard can be shifted 4 octaves down or 4 octaves up."

...

print([g.score for g in groundednesses]) # [0, 0, 0]

As the information provided in the answer is not present in any of the contexts, the groundedness score is 0 for all contexts.

What if the information in the answer does not answer the question?

- answer = "Based on the context provided, The Oxygen Pro 49's keyboard can be shifted 3 octaves down or 4 octaves up."
+ answer = "Based on the context provided, the Oxygen Pro 61's keyboard can be shifted 3 octaves down or 4 octaves up."

...

print(f"{answer_relevance.score}, {answer_relevance.reason}")
# 1, The response is relevant to the entire query but incorrectly mentions the Oxygen Pro 61 instead of the Oxygen Pro 49
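
Because each part of the triad fails independently, the scores can be used to tell the failure modes apart. The following is a rough sketch, not part of nuclia-eval: the `diagnose` helper, its messages, and the `low` threshold of 1 are all assumptions chosen for illustration.

```python
from typing import List


def diagnose(
    answer_relevance: int,
    context_relevances: List[int],
    groundednesses: List[int],
    low: int = 1,  # assumed cut-off: scores at or below this count as failing
) -> List[str]:
    """List the parts of the RAG triad whose best score is at or below `low`."""
    problems = []
    if answer_relevance <= low:
        problems.append("answer does not address the question")
    if max(context_relevances, default=0) <= low:
        problems.append("no retrieved context is relevant to the question")
    if max(groundednesses, default=0) <= low:
        problems.append("answer is not grounded in any context")
    return problems


# Scores from the first usage example: nothing is flagged.
print(diagnose(5, [5, 1, 0], [2, 0, 0]))

# Groundedness [0, 0, 0] as in the modified answer above; the other two
# inputs are assumed unchanged purely for illustration.
print(diagnose(5, [5, 1, 0], [0, 0, 0]))
```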

Individual Metrics

We can also compute each metric separately:

...

answer_relevance = evaluator.answer_relevance(query=query, answer=answer)
context_relevances = evaluator.context_relevance(query=query, contexts=[context1, context2, context3])
groundednesses = evaluator.groundedness(answer=answer, contexts=[context1, context2, context3])
...
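
When only some metrics are needed, a small wrapper can dispatch to the individual methods. This is a sketch under assumptions: `evaluate_selected` is a hypothetical helper, and it relies only on the three method names and keyword arguments shown above.

```python
from typing import Any, Dict, Iterable, List


def evaluate_selected(
    evaluator: Any,
    query: str,
    answer: str,
    contexts: List[str],
    metrics: Iterable[str],
) -> Dict[str, Any]:
    """Compute only the requested metrics and return them keyed by name."""
    # Each entry defers its call, so unrequested metrics are never computed.
    calls = {
        "answer_relevance": lambda: evaluator.answer_relevance(query=query, answer=answer),
        "context_relevance": lambda: evaluator.context_relevance(query=query, contexts=contexts),
        "groundedness": lambda: evaluator.groundedness(answer=answer, contexts=contexts),
    }
    results: Dict[str, Any] = {}
    for name in metrics:
        if name not in calls:
            raise ValueError(f"unknown metric: {name}")
        results[name] = calls[name]()
    return results
```

With the `evaluator` from the usage section, `evaluate_selected(evaluator, query, answer, [context1, context2, context3], metrics=["groundedness"])` would call only `evaluator.groundedness`.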

Feedback and Community

For feedback, questions, or to get in touch with the nuclia team, we are available on our community Slack channel.

Release history

- 1.0.3: Jul 31, 2024 (this version)
- 1.0.2: Jul 23, 2024
- 1.0.1: Jul 23, 2024
- 1.0.0: Jul 23, 2024
Download files

Source Distribution

nuclia_eval-1.0.2.tar.gz (14.3 kB)

Uploaded Jul 23, 2024 Source

File details

Details for the file nuclia_eval-1.0.2.tar.gz.

File metadata

Download URL: nuclia_eval-1.0.2.tar.gz
Upload date: Jul 23, 2024
Size: 14.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.10.13

File hashes

Hashes for nuclia_eval-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`d29c74a10c346e494370c42ddcd48ca8a8d05ee73fbb667447e0e4db2a75248f`
MD5	`fabfcc663d241ee6d2824a61ab8dbc9e`
BLAKE2b-256	`e40aceef5d73be1a1b876ad1acf2f8a4b744c9dbb51702935c83c4a54f107bfe`