
Overview

Ariadne AI is an open-source library for evaluating text summarization and retrieval-augmented generation (RAG) chatbots without the need for human-annotated reference summaries. Each evaluator is paired with an explanation that helps developers evaluate their LLMs and pinpoint the reasons behind failure cases. Our approach leverages the LLM's reasoning to provide explanations for the failures.

Installation

pip install ariadne-ai

Text Summarization Usage

Here's a simple usage example that loads a JSON file for text summarization and runs the hallucination evaluator.

# Import TextSummarizationLoader and SummarizationHallucinationEvaluator
# from the ariadne_ai package (exact module path may vary).
loader = TextSummarizationLoader(format='json')
loader.load("path_to_your_file.json")
text_summarization_evaluator = SummarizationHallucinationEvaluator(loader)
text_summarization_evaluator.run()

or run

 poetry run python example.py

RAG Usage

loader = RagLoader(
    col_question='question',
    col_context='context',
    col_answer='answer',
    col_label='label')

# Faithfulness
loader.load(INPUT_FILEPATH)
evaluator = FaithfulnessEvaluator(loader)
evaluator.run()

or

 poetry run python run_experiment_rag.py

Text Summarization

Text Summarization QAG Approach

For text summarization, a question-answer generation (QAG) framework has been developed, which allows us to pinpoint failure cases in production without human-annotated reference summaries. Here is a breakdown of our approach (a minimal sketch follows the list):

  1. Question Generation: The LLM formulates closed-ended (Yes/No) questions drawing from both the summary and the main document.
  2. Summary-based Answers: An LLM answer generator responds to these questions using only the summary as a reference. The possible responses are "Yes," "No," and "Unknown."
  3. Document-based Answers: Similarly, the LLM answer generator answers the same set of questions, but this time it references the primary document. The possible responses remain "Yes," "No," and "Unknown."
  4. Evaluation Metrics: Evaluation metrics assessing the consistency between the summary-based and document-based answers are computed to draw conclusions.
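
As a minimal sketch of steps 1-3, assuming a generic ask_llm(prompt) helper that is not part of the library's API:

def generate_questions(document, summary, ask_llm):
    # Step 1: ask the LLM for closed-ended (Yes/No) questions drawn from the summary and document.
    prompt = (
        "Write closed-ended (Yes/No) questions covering the key claims below, one per line.\n"
        f"Document:\n{document}\n\nSummary:\n{summary}"
    )
    return [q.strip() for q in ask_llm(prompt).splitlines() if q.strip()]

def answer_from(source_text, question, ask_llm):
    # Steps 2 and 3: answer a question using only source_text; reply 'Yes', 'No', or 'Unknown'.
    prompt = (
        "Using only the text below, answer 'Yes', 'No', or 'Unknown'.\n"
        f"Text:\n{source_text}\n\nQuestion: {question}"
    )
    return ask_llm(prompt).strip()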

The following failures are detected based on the above approach (a sketch of the classification follows the list):

  • Hallucination Failure: A hallucination failure occurs when a question gets a 'Yes/No' answer based on the summary but receives an 'Unknown' answer based on the original document.

  • Contradiction Failure: A contradiction failure is detected when at least one question is answered 'Yes' based on the summary, but 'No' when based on the full document, or vice-versa.

  • Non-informativeness Failure: A non-informativeness failure occurs when at least one question is answered 'Unknown' based on the summary but receives a definitive 'Yes/No' answer based on the original document.
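
A minimal sketch of how such answer pairs map to failure types (illustrative only, not the library's internal code):

def classify_failure(summary_answer, document_answer):
    # Map a (summary-based, document-based) answer pair to a failure type, or None if consistent.
    definitive = {"Yes", "No"}
    if summary_answer in definitive and document_answer == "Unknown":
        return "hallucination"        # claim supported by the summary but not by the document
    if summary_answer in definitive and document_answer in definitive and summary_answer != document_answer:
        return "contradiction"        # summary and document give opposite answers
    if summary_answer == "Unknown" and document_answer in definitive:
        return "non-informativeness"  # document answers what the summary cannot
    return None                       # answers agree; no failure

# Example (summary-based answer, document-based answer) pairs:
print(classify_failure("Yes", "Unknown"))  # hallucination
print(classify_failure("Yes", "No"))       # contradiction
print(classify_failure("Unknown", "No"))   # non-informativeness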

Retrieval-Augmented Generation (RAG)

Here is a breakdown of our approach:

  1. Evaluation: The LLM determines whether a criterion is met, leveraging its reasoning.
  2. Explanation: The LLM explains the reasoning behind its decision, providing clarity regarding the failure cases.

The following failure cases are detected (a sketch follows the list):

  1. Faithfulness Failure: A faithfulness failure occurs if the response cannot be inferred purely from the provided context.
  2. Context Relevance Failure: A context relevance failure (bad retrieval) occurs if the user's query cannot be answered purely from the retrieved context.
  3. Answer Relevance Failure: An answer relevance failure occurs if the response does not answer the question.
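
As an illustration of the evaluation-plus-explanation pattern, here is a sketch assuming a generic ask_llm(prompt) helper; it is not the library's actual prompt or API:

def check_faithfulness(question, context, answer, ask_llm):
    # Ask the LLM whether the answer can be inferred purely from the context, and why.
    prompt = (
        "Can the answer be inferred purely from the context? Reply 'Pass' or 'Fail' "
        "on the first line, followed by a one-sentence explanation.\n"
        f"Question: {question}\nContext: {context}\nAnswer: {answer}"
    )
    verdict, _, explanation = ask_llm(prompt).partition("\n")
    return {"failure": verdict.strip() == "Fail", "explanation": explanation.strip()}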

Contribution

Please feel free to reach out to christos@athina.ai or shiv@athina.ai if you would like to contribute. You can find more on how to integrate the evaluations into your product here: https://docs.athina.ai.

License

Apache License 2.0
