AGIFlow Eval
Overview
agiflow_eval is a customizable evaluation library for measuring the quality of language model outputs. It provides metrics for answer relevancy, hallucination, bias, faithfulness, contextual relevancy, and toxicity. The library is ported from the awesome DeepEval to support custom evaluation templates and custom LLM models.
Installation
Install the library and its dependencies with pip:
pip install agiflow-eval
Usage
To use the metrics, first initialize the model and the metadata aggregator:
from agiflow_eval import (
EvalLiteLLM,
MetadataAggregator,
)
metadata = MetadataAggregator()
model = EvalLiteLLM()
Then create a test case and measure the metric, as shown for each metric below. Note that a_measure is a coroutine, so it must be awaited inside an async function.
Answer Relevancy Metric
Evaluates the relevancy of an answer given a specific input.
from agiflow_eval import AnswerRelevancyMetric, LLMTestCase
metric = AnswerRelevancyMetric(metadata=metadata, model=model)
test_case = LLMTestCase(input="input text", actual_output="actual output text")
score = await metric.a_measure(test_case)
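Because a_measure is awaited, it must run inside an event loop. A minimal driver sketch is shown below; StubMetric is a hypothetical stand-in used so the snippet is self-contained, and in real use you would pass the AnswerRelevancyMetric constructed as above.

```python
import asyncio

# Hypothetical stand-in for an agiflow_eval metric; a real metric is created
# via AnswerRelevancyMetric(metadata=metadata, model=model) as shown above.
class StubMetric:
    async def a_measure(self, test_case: dict) -> float:
        # A real metric would call the configured LLM here.
        return 0.87

async def evaluate() -> float:
    metric = StubMetric()
    test_case = {"input": "input text", "actual_output": "actual output text"}
    return await metric.a_measure(test_case)

score = asyncio.run(evaluate())
print(score)
```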
Bias Metric
Measures the presence of bias in the model's output.
from agiflow_eval import BiasMetric, LLMTestCase
metric = BiasMetric(metadata=metadata, model=model)
test_case = LLMTestCase(input="input text", actual_output="actual output text")
score = await metric.a_measure(test_case)
Contextual Relevancy Metric
Assesses the relevancy of the output in a given context.
from agiflow_eval import ContextualRelevancyMetric, LLMTestCase
metric = ContextualRelevancyMetric(metadata=metadata, model=model)
test_case = LLMTestCase(
input="input text",
actual_output="actual output text",
retrieval_context="retrieval context text"
)
score = await metric.a_measure(test_case)
Faithfulness Metric
Determines the faithfulness of the model's output to the given context or input.
from agiflow_eval import FaithfulnessMetric, LLMTestCase
metric = FaithfulnessMetric(metadata=metadata, model=model)
test_case = LLMTestCase(
input="input text",
actual_output="actual output text",
retrieval_context="retrieval context text"
)
score = await metric.a_measure(test_case)
Hallucination Metric
Measures the degree of hallucination in the model's output against the supplied context. Note that this metric takes a context field rather than the retrieval_context field used by the contextual relevancy and faithfulness metrics.
from agiflow_eval import HallucinationMetric, LLMTestCase
metric = HallucinationMetric(metadata=metadata, model=model)
test_case = LLMTestCase(
input="input text",
actual_output="actual output text",
context="context text"
)
score = await metric.a_measure(test_case)
Toxicity Metric
Evaluates the toxicity level of the model's output.
from agiflow_eval import ToxicityMetric, LLMTestCase
metric = ToxicityMetric(metadata=metadata, model=model)
test_case = LLMTestCase(
input="input text",
actual_output="actual output text"
)
score = await metric.a_measure(test_case)
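Since every metric exposes the same async a_measure interface, several metrics can be scored concurrently for one test case with asyncio.gather. The sketch below uses hypothetical stub metrics so it runs standalone; in practice you would substitute the BiasMetric, ToxicityMetric, etc. constructed as shown above.

```python
import asyncio

# Hypothetical stand-ins for agiflow_eval metrics; real ones are built with
# metadata=metadata, model=model as in the sections above.
class StubBias:
    async def a_measure(self, test_case: dict) -> float:
        return 0.1

class StubToxicity:
    async def a_measure(self, test_case: dict) -> float:
        return 0.05

async def evaluate_all(test_case: dict) -> dict:
    metrics = {"bias": StubBias(), "toxicity": StubToxicity()}
    # Run all metric coroutines concurrently and pair scores with their names.
    scores = await asyncio.gather(*(m.a_measure(test_case) for m in metrics.values()))
    return dict(zip(metrics.keys(), scores))

results = asyncio.run(evaluate_all({"input": "input text", "actual_output": "actual output text"}))
print(results)
```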
Custom Template
You can extend the default template class for a metric and pass an instance of your subclass to the metric's constructor, as follows:
from agiflow_eval import ToxicityMetric, ToxicityTemplate, LLMTestCase
class YourTemplate(ToxicityTemplate):
...
metric = ToxicityMetric(metadata=metadata, model=model, template=YourTemplate())
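The README does not document which methods ToxicityTemplate exposes, so the sketch below uses a hypothetical evaluation_prompt method on a stand-in base class purely to illustrate the override pattern; check the template class's source for the actual method names to override.

```python
# Hypothetical stand-in for agiflow_eval's ToxicityTemplate; the method name
# `evaluation_prompt` is illustrative only.
class ToxicityTemplateStandIn:
    def evaluation_prompt(self, output: str) -> str:
        return f"Rate the toxicity of the following text: {output}"

class StrictToxicityTemplate(ToxicityTemplateStandIn):
    def evaluation_prompt(self, output: str) -> str:
        # Prepend stricter instructions, then defer to the base prompt.
        return "Apply a strict standard. " + super().evaluation_prompt(output)

prompt = StrictToxicityTemplate().evaluation_prompt("some text")
print(prompt)
```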
Contributing
We welcome contributions to agiflow_eval. Please see our CONTRIBUTING.md for guidelines on how to get involved.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgements
Special thanks to the DeepEval project for providing the foundation upon which this library is built.
Contact
For any questions or feedback, please open an issue or reach out via the project's contact information.