

AGIFlow Eval

Overview

agiflow_eval is a customizable evaluation library for measuring the quality of language model outputs. It provides metrics for answer relevancy, hallucination, bias, faithfulness, contextual relevancy, and toxicity. The library is a port of the excellent DeepEval, adapted to support custom evaluation templates and LLM models.

Installation

Install the package and its dependencies with pip:

pip install agiflow-eval

Usage

To use the metrics, first initialize the model and aggregator:

from agiflow_eval import (
  EvalLiteLLM,
  MetadataAggregator,
)

metadata = MetadataAggregator()
model = EvalLiteLLM()

Then create a test case and measure the desired metric, as shown in the sections below:

Answer Relevancy Metric

Evaluates the relevancy of an answer given a specific input.

from agiflow_eval import AnswerRelevancyMetric, LLMTestCase

metric = AnswerRelevancyMetric(metadata=metadata, model=model)
test_case = LLMTestCase(input="input text", actual_output="actual output text")
score = await metric.a_measure(test_case)
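Note that a_measure is a coroutine, so the await calls in the snippets throughout this section must run inside an event loop. A minimal, self-contained sketch of driving a metric from a plain script with asyncio is shown below; the evaluate_relevancy wrapper is just an illustrative name, not part of the library:

import asyncio

from agiflow_eval import (
  AnswerRelevancyMetric,
  EvalLiteLLM,
  LLMTestCase,
  MetadataAggregator,
)

async def evaluate_relevancy():
  metadata = MetadataAggregator()
  model = EvalLiteLLM()
  metric = AnswerRelevancyMetric(metadata=metadata, model=model)
  test_case = LLMTestCase(input="input text", actual_output="actual output text")
  # a_measure is a coroutine that returns the computed score
  return await metric.a_measure(test_case)

score = asyncio.run(evaluate_relevancy())
print(score)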

Bias Metric

Measures the presence of bias in the model's output.

from agiflow_eval import BiasMetric, LLMTestCase

metric = BiasMetric(metadata=metadata, model=model)
test_case = LLMTestCase(input="input text", actual_output="actual output text")
score = await metric.a_measure(test_case)

Contextual Relevancy Metric

Assesses the relevancy of the output in a given context.

from agiflow_eval import ContextualRelevancyMetric, LLMTestCase

metric = ContextualRelevancyMetric(metadata=metadata, model=model)
test_case = LLMTestCase(
  input="input text", 
  actual_output="actual output text",
  retrieval_context="retrieval context text"
)
score = await metric.a_measure(test_case)

Faithfulness Metric

Determines the faithfulness of the model's output to the given context or input.

from agiflow_eval import FaithfulnessMetric, LLMTestCase

metric = FaithfulnessMetric(metadata=metadata, model=model)
test_case = LLMTestCase(
  input="input text", 
  actual_output="actual output text",
  retrieval_context="retrieval context text"
)
score = await metric.a_measure(test_case)

Hallucination Metric

Measures the degree of hallucination in the model's output.

from agiflow_eval import HallucinationMetric, LLMTestCase

metric = HallucinationMetric(metadata=metadata, model=model)
test_case = LLMTestCase(
  input="input text", 
  actual_output="actual output text",
  context="context text"
)
score = await metric.a_measure(test_case)

Toxicity Metric

Evaluates the toxicity level of the model's output.

from agiflow_eval import ToxicityMetric, LLMTestCase

metric = ToxicityMetric(metadata=metadata, model=model)
test_case = LLMTestCase(
  input="input text", 
  actual_output="actual output text"
)
score = await metric.a_measure(test_case)
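Since every metric exposes the same asynchronous a_measure interface, multiple metrics can be scored against one test case concurrently. A minimal sketch using asyncio.gather, reusing only the classes shown above:

import asyncio

from agiflow_eval import (
  AnswerRelevancyMetric,
  BiasMetric,
  EvalLiteLLM,
  LLMTestCase,
  MetadataAggregator,
  ToxicityMetric,
)

async def evaluate_all():
  metadata = MetadataAggregator()
  model = EvalLiteLLM()
  metrics = {
    "answer_relevancy": AnswerRelevancyMetric(metadata=metadata, model=model),
    "bias": BiasMetric(metadata=metadata, model=model),
    "toxicity": ToxicityMetric(metadata=metadata, model=model),
  }
  test_case = LLMTestCase(input="input text", actual_output="actual output text")
  # Score the same test case with every metric concurrently
  scores = await asyncio.gather(*(m.a_measure(test_case) for m in metrics.values()))
  return dict(zip(metrics, scores))

print(asyncio.run(evaluate_all()))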

Custom Template

You can extend the default template class for a metric and pass an instance of your subclass to the metric's constructor, as follows:

from agiflow_eval import ToxicityMetric, ToxicityTemplate, LLMTestCase

class YourTemplate(ToxicityTemplate):
  ...

metric = ToxicityMetric(metadata=metadata, model=model, template=YourTemplate())
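As a sketch of what such a subclass might look like, the example below overrides a prompt-building method to tighten the judging instructions. The method name generate_reason is hypothetical; inspect ToxicityTemplate for the methods it actually defines before overriding:

from agiflow_eval import ToxicityMetric, ToxicityTemplate

class StricterToxicityTemplate(ToxicityTemplate):
  # Hypothetical override for illustration; check ToxicityTemplate for the
  # prompt-building methods it actually exposes.
  def generate_reason(self, *args, **kwargs):
    base_prompt = super().generate_reason(*args, **kwargs)
    # Append an extra instruction for the judge model
    return base_prompt + "\nTreat sarcasm and passive-aggressive remarks as toxic."

metric = ToxicityMetric(metadata=metadata, model=model, template=StricterToxicityTemplate())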

Contributing

We welcome contributions to agiflow_eval. Please see our CONTRIBUTING.md for guidelines on how to get involved.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

Special thanks to the DeepEval project for providing the foundation upon which this library is built.

Contact

For any questions or feedback, please open an issue or reach out via the project's contact information.

