LangRAGEval is a library for evaluating Retrieval-Augmented Generation (RAG) responses on faithfulness, context recall, answer relevancy, and context relevancy.

Project description

LangGPTEval

Python 3.11+ · PyPI version · License: MIT

LangGPTEval is an evaluation library designed for Retrieval-Augmented Generation (RAG) responses. It evaluates the faithfulness, context recall, answer relevancy, and context relevancy of responses generated by various models, including OpenAI, Azure, and custom models. With a modular architecture and Pydantic-based input validation, LangGPTEval aims to deliver reliable and accurate evaluation metrics.

📜 Table of Contents

  1. Introduction
  2. Features
  3. Installation
  4. Quick Start
  5. Usage
  6. Examples
  7. Contributing
  8. License

🌟 Introduction

LangGPTEval is designed to evaluate the quality of responses generated by RAG models. It supports multiple metrics for evaluation:

  • Faithfulness: How factually consistent the response is with the given context.
  • Context Recall: How well the response recalls the given context.
  • Answer Relevancy: How relevant the response is to the question.
  • Context Relevancy: How relevant the response is to the context.

LangGPTEval is highly customizable, allowing users to plug in their models and tailor the evaluation to their specific needs.
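
Judging from the examples later in this page, the only contract a pluggable model must satisfy is a single invoke method that takes a prompt and returns a score string. A minimal sketch of that interface (the Protocol name here is illustrative, not part of the library):

from typing import Protocol

class EvaluationModel(Protocol):
    def invoke(self, prompt: str) -> str:
        """Return the model's score for the given evaluation prompt, e.g. "0.9"."""
        ...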

🚀 Features

  • Modular Design: Easily integrate different models.
  • Pydantic Validation: Ensures robust input validation.
  • Flexible Evaluation: Evaluate multiple metrics with customizable prompts.
  • Exception Handling: Graceful error handling to manage evaluation failures.

🛠️ Installation

You can install LangGPTEval using pip:

pip install LangGPTEval

⚡ Quick Start

Here’s a quick-start guide to get you up and running with LangGPTEval; a condensed sketch follows the steps.

  1. Install the library.
  2. Prepare your data.
  3. Evaluate your model.
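
Put together, a minimal run looks roughly like this (the stub model and example texts are illustrative; the full walkthrough is in the Usage section below):

from LangGPTEval.models import EvaluationInput, ContextData
from LangGPTEval.evaluation import evaluate_faithfulness

class StubModel:
    # Always reports a perfect score; swap in a real model for actual use.
    def invoke(self, prompt):
        return "1.0"

input_data = EvaluationInput(
    context=[ContextData(page_content="Paris is the capital of France.")],
    response="The capital of France is Paris.",
)
print(evaluate_faithfulness(input_data, StubModel()).score)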

📚 Usage

Importing the Library

First, import the necessary components from the LangGPTEval library.

from typing import Any

from LangGPTEval.models import EvaluationInput, ContextData
from LangGPTEval.evaluation import (
    evaluate_faithfulness,
    evaluate_context_recall,
    evaluate_answer_relevancy,
    evaluate_context_relevancy,
)
from langchain.llms import OpenAI

Setting Up Your Model

Create an instance of your model. Here, we demonstrate using LangChain’s OpenAI model.

class LangChainOpenAIModel:
    """Adapter exposing LangChain's OpenAI LLM through the invoke() interface."""

    def __init__(self, api_key: str):
        self.llm = OpenAI(api_key=api_key)

    def invoke(self, prompt: Any) -> str:
        # The evaluation functions expect invoke() to return a score string.
        response = self.llm(prompt)
        return response.strip()

Example Data

Prepare the input data for evaluation.

context = [ContextData(page_content="Test context")]
response = "Test response"
input_data = EvaluationInput(context=context, response=response)
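
Because context is a list, a multi-chunk retrieval result can be passed in directly (the chunk texts below are placeholders):

context = [
    ContextData(page_content="First retrieved chunk."),
    ContextData(page_content="Second retrieved chunk."),
]
input_data = EvaluationInput(context=context, response="Test response")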

Evaluating the Model

Use the evaluation functions to score the model’s responses against the context.

# Replace 'your-openai-api-key' with your actual OpenAI API key
api_key = 'your-openai-api-key'
openai_model = LangChainOpenAIModel(api_key)

try:
    # Evaluate with the LangChain OpenAI model
    faithfulness_result = evaluate_faithfulness(input_data, openai_model)
    context_recall_result = evaluate_context_recall(input_data, openai_model)
    answer_relevancy_result = evaluate_answer_relevancy(input_data, openai_model)
    context_relevancy_result = evaluate_context_relevancy(input_data, openai_model)

    print(faithfulness_result.score)
    print(context_recall_result.score)
    print(answer_relevancy_result.score)
    print(context_relevancy_result.score)
except ValueError as e:
    print(f"An error occurred during evaluation: {str(e)}")

🔍 Examples

Example with Custom Model

class CustomModel:
    def invoke(self, prompt):
        # Custom model implementation
        return "0.9"  # Example score

# Create a custom model instance
custom_model = CustomModel()

try:
    # Evaluate with the custom model
    faithfulness_result = evaluate_faithfulness(input_data, custom_model)
    context_recall_result = evaluate_context_recall(input_data, custom_model)
    answer_relevancy_result = evaluate_answer_relevancy(input_data, custom_model)
    context_relevancy_result = evaluate_context_relevancy(input_data, custom_model)

    print(faithfulness_result.score)
    print(context_recall_result.score)
    print(answer_relevancy_result.score)
    print(context_relevancy_result.score)
except ValueError as e:
    print(f"An error occurred during evaluation: {str(e)}")

Example with Azure Model

class AzureModel:
    def invoke(self, prompt):
        # Azure model implementation
        return "0.8"  # Example score

# Create an Azure model instance
azure_model = AzureModel()

try:
    # Evaluate with the Azure model
    faithfulness_result = evaluate_faithfulness(input_data, azure_model)
    context_recall_result = evaluate_context_recall(input_data, azure_model)
    answer_relevancy_result = evaluate_answer_relevancy(input_data, azure_model)
    context_relevancy_result = evaluate_context_relevancy(input_data, azure_model)

    print(faithfulness_result.score)
    print(context_recall_result.score)
    print(answer_relevancy_result.score)
    print(context_relevancy_result.score)
except ValueError as e:
    print(f"An error occurred during evaluation: {str(e)}")

🤝 Contributing

Contributions are welcome! Please read the contributing guidelines before making a pull request.

Steps to Contribute

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-branch).
  3. Make your changes.
  4. Commit your changes (git commit -m 'Add new feature').
  5. Push to the branch (git push origin feature-branch).
  6. Open a pull request.

📜 License

LangGPTEval is licensed under the MIT License. See the LICENSE file for more details.

Happy Evaluating! 🎉

LangGPTEval is here to make your RAG model evaluations precise and easy. If you have any questions or need further assistance, feel free to reach out to our support team.


