Skip to main content

LARES: vaLidation, evAluation and REliability Solutions

Project description

LARES (vaLidation, evAluation and REliability Solutions) is a Python package designed to assist with the evaluation and validation models in various tasks such as translation, summarization, and rephrasing.

This package leverages a suite of existing tools and resources to provide the best form of evaluation and validation for the prompted task. Natural Language Toolkit (NLTK), BERT, and ROUGE are employed for evaluations, while Microsoft's Fairlearn, Facebook's BART, and roBERTa are used to assess and address the toxicity and fairness of a given model.

In addition, LARES uses datasets from HuggingFace, where the choice of datasets was informed by benchmark setters such as the General Language Understanding Evaluation (GLUE) benchmark.

Features

  • Quantitative and Qualitative Evaluation: Provides both qualitative and quantitative approaches to evaluating models. Quantitative metrics include METEOR scores for translations, normalized ROUGE scores for summarizations, and BERT scores for rephrasing tasks. Qualitative metrics are computed both from binary user judgements as well as sentiment analysis done on user feedback.

  • Fairness and Toxicity Validation: Provides a quantitative measure of the toxicity and fairness of a given model for specific tasks by leveraging Fairlearn and roBERTa.

  • Iterative Reconstruction: Iteratively rephrases model responses until below a specified toxicity and above a specified quality threshold using BART

Workflow

Installation

Requires Python 3.6 or later. You can install using pip via:

pip install lares

Usage

Here is a basic usage example for translation task:

import openai
from datasets import load_dataset
from lares import *

openai.api_key = '' # replace with your OpenAI API key
dataset = load_dataset("opus100", "en-fr")

for data in dataset["validation"]['translation'][100:110]:
    prompt = data["en"]
    reference = data["fr"]

    input_prompt = "Translate the following to french: "+prompt
    print(input_prompt)
    result = generate(input_prompt, reference, task_type='Translation')

    print(f"Prompt: {prompt}")
    print(f"Reference: {reference}")
    print(f"Generated Response: {result}\n")

Dependencies

  • openai==0.27.8
  • nltk==3.7
  • torch==2.0.1
  • transformers==4.31.0
  • rouge==1.0.1
  • bert_score==0.3.12
  • datasets==1.11.0

To be explicit, you can install via:

pip install openai==0.27.8 nltk==3.7 torch==2.0.1 transformers==4.31.0 rouge==1.0.1 bert_score==0.3.12 datasets==1.11.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lares-0.0.20.tar.gz (6.0 kB view hashes)

Uploaded Source

Built Distribution

lares-0.0.20-py3-none-any.whl (6.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page