
AGIFlow Eval

Overview

agiflow_eval is a customizable evaluation library for measuring the quality of language model outputs. It provides metrics for answer relevancy, hallucination, bias, faithfulness, contextual relevancy, and toxicity. The library is a port of the excellent DeepEval, extended to support custom evaluation templates and custom LLM judge models.

Installation

Install the package and its dependencies from PyPI with pip:

pip install agiflow-eval
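
To verify the installation, you can run a quick import check; this uses only names documented in this README:

python -c "from agiflow_eval import EvalLiteLLM, MetadataAggregator; print('ok')"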

Usage

To use the metrics, first initialize the evaluation model and the metadata aggregator:

from agiflow_eval import (
  EvalLiteLLM,
  MetadataAggregator,
)

metadata = MetadataAggregator()
model = EvalLiteLLM()
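
EvalLiteLLM is, as the name suggests, a wrapper around LiteLLM, so it should be able to target any LiteLLM-supported backend. A hedged sketch of selecting a specific judge model follows; the model keyword is an assumption based on LiteLLM conventions, so check the EvalLiteLLM constructor in the package source for the actual parameter name:

# Hypothetical configuration: the `model` keyword is an assumption,
# not a documented agiflow_eval parameter.
model = EvalLiteLLM(model="gpt-4o-mini")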

Then create a test case and measure the metric you need, as shown in the sections below:

Answer Relevancy Metric

Evaluates the relevancy of an answer given a specific input.

from agiflow_eval import AnswerRelevancyMetric, LLMTestCase

metric = AnswerRelevancyMetric(metadata=metadata, model=model)
test_case = LLMTestCase(input="input text", actual_output="actual output text")
score = await metric.a_measure(test_case)
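
Since a_measure is a coroutine, it must be awaited. In a plain script (as opposed to a notebook or other async context), you can drive it with Python's standard asyncio; a minimal end-to-end sketch using only the API shown above, with placeholder input and output strings:

import asyncio

from agiflow_eval import (
  AnswerRelevancyMetric,
  EvalLiteLLM,
  LLMTestCase,
  MetadataAggregator,
)

async def main() -> None:
  metric = AnswerRelevancyMetric(metadata=MetadataAggregator(), model=EvalLiteLLM())
  test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="The capital of France is Paris.",
  )
  # a_measure runs the LLM-as-judge evaluation and returns the score
  score = await metric.a_measure(test_case)
  print(score)

# asyncio.run creates an event loop and runs the coroutine to completion
asyncio.run(main())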

Bias Metric

Measures the presence of bias in the model's output.

from agiflow_eval import BiasMetric, LLMTestCase

metric = BiasMetric(metadata=metadata, model=model)
test_case = LLMTestCase(input="input text", actual_output="actual output text")
score = await metric.a_measure(test_case)

Contextual Relevancy Metric

Assesses the relevancy of the output in a given context.

from agiflow_eval import ContextualRelevancyMetric, LLMTestCase

metric = ContextualRelevancyMetric(metadata=metadata, model=model)
test_case = LLMTestCase(
  input="input text", 
  actual_output="actual output text",
  retrieval_context="retrieval context text"
)
score = await metric.a_measure(test_case)

Faithfulness Metric

Determines the faithfulness of the model's output to the given context or input.

from agiflow_eval import FaithfulnessMetric, LLMTestCase

metric = FaithfulnessMetric(metadata=metadata, model=model)
test_case = LLMTestCase(
  input="input text", 
  actual_output="actual output text",
  retrieval_context="retrieval context text"
)
score = await metric.a_measure(test_case)

Hallucination Metric

Measures the degree of hallucination in the model's output.

from agiflow_eval import HallucinationMetric, LLMTestCase

metric = HallucinationMetric(metadata=metadata, model=model)
test_case = LLMTestCase(
  input="input text", 
  actual_output="actual output text",
  context="context text"
)
score = await metric.a_measure(test_case)

Toxicity Metric

Evaluates the toxicity level of the model's output.

from agiflow_eval import ToxicityMetric, LLMTestCase

metric = ToxicityMetric(metadata=metadata, model=model)
test_case = LLMTestCase(
  input="input text", 
  actual_output="actual output text"
)
score = await metric.a_measure(test_case)
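
Because every metric exposes the same async a_measure interface, several metrics can be scored concurrently against a single test case. A minimal sketch using asyncio.gather, reusing only the classes introduced above:

import asyncio

from agiflow_eval import (
  AnswerRelevancyMetric,
  BiasMetric,
  EvalLiteLLM,
  LLMTestCase,
  MetadataAggregator,
  ToxicityMetric,
)

async def main() -> None:
  metadata = MetadataAggregator()
  model = EvalLiteLLM()
  metrics = [
    AnswerRelevancyMetric(metadata=metadata, model=model),
    BiasMetric(metadata=metadata, model=model),
    ToxicityMetric(metadata=metadata, model=model),
  ]
  test_case = LLMTestCase(input="input text", actual_output="actual output text")
  # Run all measurements concurrently; gather preserves the input order
  scores = await asyncio.gather(*(m.a_measure(test_case) for m in metrics))
  for metric, score in zip(metrics, scores):
    print(type(metric).__name__, score)

asyncio.run(main())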

Custom Template

You can extend the default template class for a metric and pass an instance to the metric's constructor, as follows:

from agiflow_eval import ToxicityMetric, ToxicityTemplate

class YourTemplate(ToxicityTemplate):
  # Override the prompt-building methods you want to customize here;
  # see ToxicityTemplate in the package source for the available hooks.
  ...

metric = ToxicityMetric(metadata=metadata, model=model, template=YourTemplate())

Contributing

We welcome contributions to agiflow_eval. Please see our CONTRIBUTING.md for guidelines on how to get involved.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

Special thanks to the DeepEval project for providing the foundation upon which this library is built.

Contact

For any questions or feedback, please open an issue or reach out via the project's contact information.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agiflow_eval-0.0.2.tar.gz (73.6 kB, source)

Built Distribution

agiflow_eval-0.0.2-py3-none-any.whl (38.6 kB, Python 3)

File details

Details for the file agiflow_eval-0.0.2.tar.gz.

File metadata

  • Download URL: agiflow_eval-0.0.2.tar.gz
  • Upload date:
  • Size: 73.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.11.6 Darwin/22.6.0

File hashes

Hashes for agiflow_eval-0.0.2.tar.gz:

  • SHA256: 609ed3147758e173748da8fd5ca39715eb921001dc8d48634caf75328745755f
  • MD5: ff71bd2e7ba905d9f5c17abf7d3fd79b
  • BLAKE2b-256: 835bfeb5d86c2f1309918efe40fd26074f45bb18fb27ddc3b51827c9d6470542


File details

Details for the file agiflow_eval-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: agiflow_eval-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 38.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.11.6 Darwin/22.6.0

File hashes

Hashes for agiflow_eval-0.0.2-py3-none-any.whl:

  • SHA256: 435615894b301f70f5882d70461d075683857b981b9cef7f9f876407ff34ae26
  • MD5: 1c5d5d4a9cf0d6c70a7bc2bc6c034c9a
  • BLAKE2b-256: a7c20db9a523808c34f8a1d486669ee7bb3b40825370b9b70e711db25fdad044

