GAICo: GenAI Results Comparator

Repository: github.com/ai4society/GenAIResultsComparator

Documentation: ai4society.github.io/projects/GenAIResultsComparator

Overview

GenAI Results Comparator, GAICo, is a Python library to help compare, analyze and visualize outputs from Large Language Models (LLMs), often against a reference text. In doing so, one can use a range of extensible metrics from the literature.

At its core, the library provides a set of metrics that take text strings as input and produce scores on a normalized scale from 0 to 1, where 1 indicates a perfect match between the texts. These scores are used both to analyze and to visualize LLM outputs.
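For example, the Jaccard similarity of two texts is the size of the intersection of their token sets divided by the size of their union, which naturally falls between 0 and 1. The sketch below illustrates the idea in plain Python; it is not GAICo's implementation, and GAICo's Jaccard metric may tokenize and normalize differently.

def jaccard_similarity(text_a: str, text_b: str) -> float:
    # Illustrative only: whitespace tokenization, lowercased
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # treat two empty texts as a perfect match
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# Prints a score between 0 (no shared tokens) and 1 (identical token sets)
print(jaccard_similarity(
    "Sorry, I am designed not to answer such a question.",
    "Sorry, I am unable to answer such a question as it is not appropriate.",
))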

Quickstart

GAICo's Experiment class offers a streamlined workflow for comparing multiple model outputs, applying thresholds, generating plots, and creating CSV reports.

Here's a quick example:

from gaico import Experiment

# Sample data from https://arxiv.org/abs/2504.07995
llm_responses = {
    "Google": "Title: Jimmy Kimmel Reacts to Donald Trump Winning the Presidential ... Snippet: Nov 6, 2024 ...",
    "Mixtral 8x7b": "I'm an Al and I don't have the ability to predict the outcome of elections.",
    "SafeChat": "Sorry, I am designed not to answer such a question.",
}
reference_answer = "Sorry, I am unable to answer such a question as it is not appropriate."

# 1. Initialize Experiment
exp = Experiment(
    llm_responses=llm_responses,
    reference_answer=reference_answer
)

# 2. Compare models using specific metrics
#   This will calculate scores for 'Jaccard' and 'ROUGE',
#   generate a plot (e.g., radar plot for multiple metrics/models),
#   and save a CSV report.
results_df = exp.compare(
    metrics=['Jaccard', 'ROUGE'],  # Specify metrics, or None for all defaults
    plot=True,
    output_csv_path="experiment_report.csv",
    custom_thresholds={"Jaccard": 0.6, "ROUGE_rouge1": 0.35} # Optional: override default thresholds
)

# The returned DataFrame contains the calculated scores
print("Scores DataFrame from compare():")
print(results_df)
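
Because compare() also wrote a CSV report (via output_csv_path above), you can reload it later with standard pandas. The exact column layout depends on the GAICo version, so inspect the columns rather than assuming their names:

import pandas as pd

# Reload the report written by exp.compare() above
report = pd.read_csv("experiment_report.csv")
print(report.columns.tolist())  # see which columns GAICo actually wrote
print(report.head())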

For more detailed examples, please refer to our Jupyter Notebooks in the examples/ folder in the repository.

Features

  • Implements various metrics for text comparison:
    • N-gram-based metrics (BLEU, ROUGE, JS divergence)
    • Text similarity metrics (Jaccard, Cosine, Levenshtein, Sequence Matcher)
    • Semantic similarity metrics (BERTScore)
  • Provides visualization capabilities using matplotlib and seaborn for plots like bar charts and radar plots.
  • Allows exporting results to CSV files, including scores and threshold pass/fail status.
  • Provides a streamlined Experiment class for easy comparison of multiple models, applying thresholds, plotting, and reporting.
  • Supports batch processing for efficient computation.
  • Optimized for different input types (lists, numpy arrays, pandas Series).
  • Has an extensible architecture for easy addition of new metrics.
  • Supports automated testing of metrics using Pytest.

Installation

GAICo can be installed using pip. We strongly recommend using a Python virtual environment to manage dependencies and avoid conflicts with other packages.

  • Create and activate a virtual environment (e.g., named gaico-env):

      # For Python 3.10+
      python3 -m venv gaico-env
      source gaico-env/bin/activate  # On macOS/Linux
      # gaico-env\Scripts\activate   # On Windows
    
  • Install GAICo: Once your virtual environment is active, install GAICo using pip:

      pip install gaico
    

This installs the core GAICo library.
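
A quick import check from the activated environment confirms the installation (a generic sanity check, not a GAICo-specific command):

# (Ensure your 'gaico-env' is active)
python -c "import gaico; print('GAICo imported successfully')"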

Using GAICo with Jupyter Notebooks/Lab

If you plan to use GAICo within Jupyter Notebooks or JupyterLab (recommended for exploring examples and interactive analysis), install them into the same activated virtual environment:

# (Ensure your 'gaico-env' is active)
pip install notebook  # For Jupyter Notebook
# OR
# pip install jupyterlab # For JupyterLab

Then, launch Jupyter from the same terminal where your virtual environment is active:

# (Ensure your 'gaico-env' is active)
jupyter notebook
# OR
# jupyter lab

New notebooks created in this session should automatically use the gaico-env Python environment. For troubleshooting kernel issues, please see our FAQ document.
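
If the notebook does not pick up the gaico-env interpreter, one common general-purpose fix (not specific to GAICo) is to register the environment as a named Jupyter kernel:

# (Ensure your 'gaico-env' is active)
pip install ipykernel
python -m ipykernel install --user --name gaico-env --display-name "Python (gaico-env)"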

Optional Installations for GAICo

The default installation includes core metrics and is lightweight. For optional features and metrics that have larger dependencies:

  • To include the BERTScore metric (which has larger dependencies like PyTorch):
    pip install 'gaico[bertscore]'
    
  • To include the CosineSimilarity metric (requires scikit-learn):
    pip install 'gaico[cosine]'
    
  • To include the JSDivergence metric (requires SciPy and NLTK):
    pip install 'gaico[jsd]'
    
  • To install with all optional features:
    pip install 'gaico[bertscore,cosine,jsd]'
    
    (Note: All optional features are also installed if you use the dev extra for development installs.)
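
After installing an extra, you can confirm that its dependencies resolved by importing them directly. The module names below are the standard import names of the listed packages, shown for the jsd extra as an example:

# Example check after `pip install 'gaico[jsd]'` (adjust the imports for the extra you chose)
python -c "import scipy, nltk; print('jsd dependencies available')"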

Installation Size Comparison

The following table provides an estimated overview of the disk space required by different installation options. Actual sizes vary depending on your operating system, Python version, and existing packages; the figures are meant to illustrate the relative impact of the optional dependencies.

Note: Core dependencies include: levenshtein, matplotlib, numpy, pandas, rouge-score, and seaborn.

Installation Command                      | Dependencies                                            | Estimated Total Size Impact
pip install gaico                         | Core                                                    | 210 MB
pip install 'gaico[jsd]'                  | Core + scipy, nltk                                      | 310 MB
pip install 'gaico[cosine]'               | Core + scikit-learn                                     | 360 MB
pip install 'gaico[bertscore]'            | Core + bert-score (includes torch, transformers, etc.) | 800 MB
pip install 'gaico[bertscore,cosine,jsd]' | Core + all dependencies from above                      | 950 MB
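
To measure the footprint on your own machine, you can check the size of the virtual environment directory itself (assuming the gaico-env name and a macOS/Linux shell from the steps above):

du -sh gaico-env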

For Developers (Installing from source)

If you want to contribute to GAICo or install it from source for development:

  1. Clone the repository:

    git clone https://github.com/ai4society/GenAIResultsComparator.git
    cd GenAIResultsComparator
    
  2. Set up a virtual environment and install dependencies: We recommend using UV for managing environments and dependencies.

    # Create a virtual environment (e.g., Python 3.10-3.12 recommended)
    uv venv
    # Activate the environment
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    # Install the package in editable mode with all development dependencies
    # (includes all optional features like bertscore, cosine, jsd)
    uv pip install -e ".[dev]"
    

    If you prefer not to use uv, you can use pip:

    # Create a virtual environment (e.g., Python 3.10-3.12 recommended)
    python3 -m venv .venv
    # Activate the environment
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    # Install the package in editable mode with development extras
    pip install -e ".[dev]"
    

    (The dev extra installs GAICo with all its optional features, plus dependencies for testing, linting, building, and documentation.)

  3. Set up pre-commit hooks (optional but recommended for contributors):

    pre-commit install
    
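
    With the dev extras installed, the Pytest-based test suite mentioned under Features can typically be run from the repository root (assuming tests are discovered with pytest's defaults):

    pytest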

Citation

If you find GAICo useful in your research or work, please consider citing it:

@software{AI4Society_GAICo_GenAI_Results,
  author = {Nitin Gupta and Pallav Koppisetti and Biplav Srivastava},
  license = {MIT},
  title = {{GAICo: GenAI Results Comparator}},
  year = {2025},
  url = {https://github.com/ai4society/GenAIResultsComparator}
}

