
A Python package for evaluating LLM application outputs.

Project description

GroundedEval by GroundedAI

Overview

The grounded-eval package is a powerful tool developed by GroundedAI to evaluate the performance of large language models (LLMs) and their applications. It leverages small language models and adapters to compute various metrics, providing insights into the quality and reliability of LLM outputs.

Features

  • Metric Evaluation: Compute a wide range of metrics to assess the performance of LLM outputs, including:

    • Factual accuracy
    • Relevance to the given context
    • Potential biases or toxicity
    • Hallucination
  • Small Language Model Integration: Utilize state-of-the-art small language models, optimized for efficient evaluation tasks, to analyze LLM outputs accurately and quickly.

  • Adapter Support: Leverage GroundedAI's proprietary adapters, such as the phi3-toxicity-judge adapter, to fine-tune the small language models for specific domains, tasks, or evaluation criteria, ensuring tailored and precise assessments.

  • Flexible Input/Output Handling: Accept LLM outputs in various formats (text, JSON, etc.) and provide evaluation results in a structured and easily consumable manner.

  • Customizable Evaluation Pipelines: Define and configure evaluation pipelines that combine multiple metrics, weights, and thresholds based on your specific requirements (a rough sketch follows this list).

  • Reporting and Visualization: Generate comprehensive reports and visualizations to communicate evaluation results effectively, facilitating decision-making and model improvement processes.
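
As a rough illustration of how such a pipeline could be wired up, the sketch below combines several per-metric scores with weights and per-metric thresholds. The metric names, weights, thresholds, and the aggregate helper are illustrative assumptions, not part of the grounded-eval API shown in this README:

# Hypothetical aggregation sketch: the metric names, weights, thresholds,
# and this helper are illustrative only and not part of the grounded-eval API.
def aggregate(scores, weights, thresholds):
    """Combine per-metric scores (0-1, higher is better) into a weighted verdict."""
    total_weight = sum(weights.values())
    weighted = sum(scores[m] * weights[m] for m in scores) / total_weight
    failures = [m for m, limit in thresholds.items() if scores[m] < limit]
    return {"weighted_score": round(weighted, 3), "failed_metrics": failures}

scores = {"factual_accuracy": 0.92, "relevance": 0.88, "toxicity_free": 0.50}
weights = {"factual_accuracy": 0.5, "relevance": 0.3, "toxicity_free": 0.2}
thresholds = {"factual_accuracy": 0.8, "relevance": 0.7, "toxicity_free": 0.9}

print(aggregate(scores, weights, thresholds))
# {'weighted_score': 0.824, 'failed_metrics': ['toxicity_free']}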

Getting Started

Install the grounded-eval package:

pip install grounded-eval

Example Usage: Toxicity Evaluation

The ToxicityEvaluator class is used to evaluate the toxicity of a given text. Here's an example of how to use it:

from grounded_eval.evaluators import ToxicityEvaluator

base_model_id = "microsoft/Phi-3-mini-4k-instruct"
groundedai_eval_id = "grounded-ai/phi3-toxicity-judge"

# Initialize the ToxicityEvaluator
evaluator = ToxicityEvaluator(base_model_id, groundedai_eval_id, quantization=True)

# Load the base model and the GroundedAI adapter
evaluator.load_model(base_model_id, groundedai_eval_id)

# Define the data to be evaluated
data = [
    "That guy is so stupid and ugly",
    "Bunnies are so fluffy and cute"
]

# Evaluate the data for toxicity
response = evaluator.evaluate(data)

# Output
# {'toxic': 1, 'non-toxic': 1, 'percentage_toxic': 50.0}
print(response)

In this example, we initialize the ToxicityEvaluator with the base model ID (microsoft/Phi-3-mini-4k-instruct) and the GroundedAI adapter ID (grounded-ai/phi3-toxicity-judge). Setting the quantization parameter to True loads the model in quantized form, reducing memory usage and speeding up inference.

We then load the base model and the GroundedAI adapter using the load_model method.

Next, we define a list of texts (data) that we want to evaluate for toxicity.

Finally, we call the evaluate method with the data list, and it returns a dictionary containing the number of toxic and non-toxic texts, as well as the percentage of toxic texts.

In the output, we can see that out of the two texts, one is classified as toxic, and the other as non-toxic, resulting in a 50% toxicity percentage.
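
If you need a pass/fail gate on top of this summary (for example in a CI check), you can apply a threshold to the returned dictionary. The 25% limit below is an arbitrary illustrative value, not a package default:

# Post-process the dictionary returned by evaluator.evaluate() above.
# MAX_TOXIC_PERCENT is an illustrative threshold, not a grounded-eval default.
MAX_TOXIC_PERCENT = 25.0

if response["percentage_toxic"] > MAX_TOXIC_PERCENT:
    raise ValueError(
        f"Toxicity rate {response['percentage_toxic']:.1f}% exceeds "
        f"the allowed limit of {MAX_TOXIC_PERCENT:.1f}%"
    )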

Documentation

Detailed documentation, including API references, examples, and guides, is coming soon at https://groundedai.tech/api.

Contributing

We welcome contributions from the community! If you encounter any issues or have suggestions for improvements, please open an issue or submit a pull request on the GroundedAI grounded-eval GitHub repository.

License

The grounded-eval package is released under the MIT License.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

grounded_ai-0.0.3.tar.gz (9.3 kB)

Uploaded Source

Built Distribution

grounded_ai-0.0.3-py3-none-any.whl (10.2 kB)

Uploaded Python 3

File details

Details for the file grounded_ai-0.0.3.tar.gz.

File metadata

  • Download URL: grounded_ai-0.0.3.tar.gz
  • Upload date:
  • Size: 9.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.5

File hashes

Hashes for grounded_ai-0.0.3.tar.gz:

  • SHA256: 92c5e21590dc8106ff391637c69dd068d8642ea9103017952ea6605d65031f91
  • MD5: 05d3024c84057b5660fae2dbeb0681fe
  • BLAKE2b-256: 7579eabe21dd3622bc28878d7ef5f59ac9cef6d8a2721d929f7f0a975790cb0b


File details

Details for the file grounded_ai-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: grounded_ai-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 10.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.5

File hashes

Hashes for grounded_ai-0.0.3-py3-none-any.whl:

  • SHA256: 038d1398887e0933590f2818f2552ae0a427acb313f429d8ee5047908760de1a
  • MD5: cddf43935965a74cd0b9e5caf97486af
  • BLAKE2b-256: 49da847274d4ec0d7a2a74197f9cf55fc20822b3fc07a30ae4066fbb0c4da92d

