GAICo: GenAI Results Comparator
Repository: github.com/ai4society/GenAIResultsComparator
Documentation: ai4society.github.io/projects/GenAIResultsComparator
Overview
GenAI Results Comparator, GAICo, is a Python library to help compare, analyze and visualize outputs from Large Language Models (LLMs), often against a reference text. In doing so, one can use a range of extensible metrics from the literature.
At its core, the library provides a set of metrics that take text strings as input and produce normalized scores on a scale of 0 to 1, where 1 indicates a perfect match between the texts. These scores are used to analyze and visualize LLM outputs.
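To make the 0-to-1 scale concrete, here is a minimal standard-library sketch of a word-level Jaccard score. It is an illustration only, not GAICo's implementation: the function name `jaccard_similarity` and the tokenization rule (lowercase, split on non-word characters) are assumptions made for this example.

```python
import re


def jaccard_similarity(generated: str, reference: str) -> float:
    """Word-level Jaccard similarity: |A & B| / |A | B|, always in [0, 1]."""
    # Assumption for this sketch: lowercase and keep runs of word characters.
    gen_tokens = set(re.findall(r"\w+", generated.lower()))
    ref_tokens = set(re.findall(r"\w+", reference.lower()))
    if not gen_tokens and not ref_tokens:
        return 1.0  # two empty texts are treated as identical
    return len(gen_tokens & ref_tokens) / len(gen_tokens | ref_tokens)


reference = "Sorry, I am unable to answer such a question as it is not appropriate."
response = "Sorry, I am designed not to answer such a question."

print(jaccard_similarity(reference, reference))  # 1.0 (identical texts)
print(round(jaccard_similarity(response, reference), 3))  # 0.6 (9 shared words, 15 distinct)
```

A score of 1.0 means the two word sets are identical, and 0.0 means they share no words at all.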
Quickstart
GAICo's Experiment class offers a streamlined workflow for comparing multiple model outputs, applying thresholds, generating plots, and creating CSV reports.
Here's a quick example:

    from gaico import Experiment

    # Sample data from https://arxiv.org/abs/2504.07995
    llm_responses = {
        "Google": "Title: Jimmy Kimmel Reacts to Donald Trump Winning the Presidential ... Snippet: Nov 6, 2024 ...",
        "Mixtral 8x7b": "I'm an AI and I don't have the ability to predict the outcome of elections.",
        "SafeChat": "Sorry, I am designed not to answer such a question.",
    }
    reference_answer = "Sorry, I am unable to answer such a question as it is not appropriate."

    # 1. Initialize Experiment
    exp = Experiment(
        llm_responses=llm_responses,
        reference_answer=reference_answer
    )

    # 2. Compare models using specific metrics
    #    This will calculate scores for 'Jaccard' and 'ROUGE',
    #    generate a plot (e.g., radar plot for multiple metrics/models),
    #    and save a CSV report.
    results_df = exp.compare(
        metrics=['Jaccard', 'ROUGE'],  # Specify metrics, or None for all defaults
        plot=True,
        output_csv_path="experiment_report.csv",
        custom_thresholds={"Jaccard": 0.6, "ROUGE_rouge1": 0.35}  # Optional: override default thresholds
    )

    # The returned DataFrame contains the calculated scores
    print("Scores DataFrame from compare():")
    print(results_df)

For more detailed examples, please refer to our Jupyter Notebooks in the examples/ folder in the repository.
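The idea of applying thresholds to scores can be pictured with plain Python. This is a sketch under stated assumptions, not how `Experiment.compare()` works internally: the helper `apply_thresholds`, the "score >= threshold" pass rule, the fallback threshold of 0.5, and the sample scores are all hypothetical.

```python
def apply_thresholds(scores, thresholds, default=0.5):
    """Return {model: {metric: (score, passed)}}.

    Assumption for this sketch: a score passes when it is greater than or
    equal to its threshold, and metrics without a threshold use `default`.
    """
    report = {}
    for model, metric_scores in scores.items():
        report[model] = {
            metric: (score, score >= thresholds.get(metric, default))
            for metric, score in metric_scores.items()
        }
    return report


# Made-up scores for illustration only; these are not real GAICo output.
example_scores = {
    "model_a": {"Jaccard": 0.70, "ROUGE_rouge1": 0.30},
    "model_b": {"Jaccard": 0.20, "ROUGE_rouge1": 0.40},
}
example_thresholds = {"Jaccard": 0.6, "ROUGE_rouge1": 0.35}

for model, results in apply_thresholds(example_scores, example_thresholds).items():
    for metric, (score, passed) in results.items():
        print(f"{model:8s} {metric:13s} {score:.2f} {'pass' if passed else 'fail'}")
```

Keeping the score next to its pass/fail flag mirrors the kind of information a per-model report needs in each row.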
Features
- Implements various metrics for text comparison:
  - N-gram-based metrics (BLEU, ROUGE, JS divergence)
  - Text similarity metrics (Jaccard, Cosine, Levenshtein, Sequence Matcher)
  - Semantic similarity metrics (BERTScore)
- Provides visualization capabilities using matplotlib and seaborn for plots like bar charts and radar plots.
- Allows exportation of results to CSV files, including scores and threshold pass/fail status.
- Provides a streamlined Experiment class for easy comparison of multiple models, applying thresholds, plotting, and reporting.
- Supports batch processing for efficient computation.
- Optimized for different input types (lists, numpy arrays, pandas Series).
- Has extendable architecture for easy addition of new metrics.
- Supports automated testing of metrics using Pytest.
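As a rough picture of what a sequence-matching metric and batch scoring involve, the sketch below uses Python's standard `difflib.SequenceMatcher`. It does not show GAICo's own classes; the function names `sequence_similarity` and `batch_similarity` are hypothetical.

```python
from difflib import SequenceMatcher


def sequence_similarity(generated: str, reference: str) -> float:
    """Similarity ratio in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, generated, reference).ratio()


def batch_similarity(generated_texts, reference):
    """Score every generated text against a single shared reference."""
    return [sequence_similarity(text, reference) for text in generated_texts]


scores = batch_similarity(["abcd", "abxd", ""], "abcd")
print(scores)  # [1.0, 0.75, 0.0]
```

The ratio is twice the number of matched characters divided by the combined length of both strings, so "abxd" against "abcd" gives 2 * 3 / 8 = 0.75.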
Installation
GAICo can be installed using pip. We strongly recommend using a Python virtual environment to manage dependencies and avoid conflicts with other packages.
- Create and activate a virtual environment (e.g., named gaico-env):

      # For Python 3.10+
      python3 -m venv gaico-env
      source gaico-env/bin/activate   # On macOS/Linux
      # gaico-env\Scripts\activate    # On Windows

- Install GAICo: Once your virtual environment is active, install GAICo using pip:

      pip install gaico

  This installs the core GAICo library.
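One way to confirm that the install landed in the active environment is to query the package metadata with the standard library. This is a small sketch; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


gaico_version = installed_version("gaico")
if gaico_version is None:
    print("gaico is not installed in this environment")
else:
    print(f"gaico {gaico_version} is installed")
```

If this reports that the package is missing right after a successful `pip install`, the most likely cause is that pip and the Python interpreter belong to different environments.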
Using GAICo with Jupyter Notebooks/Lab
If you plan to use GAICo within Jupyter Notebooks or JupyterLab (recommended for exploring examples and interactive analysis), install them into the same activated virtual environment:
# (Ensure your 'gaico-env' is active)
pip install notebook # For Jupyter Notebook
# OR
# pip install jupyterlab # For JupyterLab
Then, launch Jupyter from the same terminal where your virtual environment is active:
# (Ensure your 'gaico-env' is active)
jupyter notebook
# OR
# jupyter lab
New notebooks created in this session should automatically use the gaico-env Python environment. For troubleshooting kernel issues, please see our FAQ document.
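A quick check that can be run in a notebook cell to see which interpreter the kernel is using. This is a generic standard-library sketch, not taken from the FAQ; the helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Environment prefix:", sys.prefix)
print("Running inside a virtual environment:", in_virtualenv())
```

If the interpreter path does not sit inside the gaico-env directory, the notebook is running on a different kernel than the one where GAICo was installed.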
Optional Installations for GAICo
The default installation includes core metrics and is lightweight. For optional features and metrics that have larger dependencies:
- To include the BERTScore metric (which has larger dependencies like PyTorch):
pip install 'gaico[bertscore]'
- To include the CosineSimilarity metric (requires scikit-learn):
pip install 'gaico[cosine]'
- To include the JSDivergence metric (requires SciPy and NLTK):
pip install 'gaico[jsd]'
- To install with all optional features:
pip install 'gaico[bertscore,cosine,jsd]'
(Note: All optional features are also installed if you use the dev extra for development installs.)
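To see which optional dependencies are already present, one option is to probe for their import names without importing them. This sketch assumes the usual import names for the packages named above (`bert_score`, `sklearn`, `scipy`, `nltk`); the mapping `EXTRA_MODULES` and the helper `available_extras` are hypothetical and not part of GAICo.

```python
from importlib.util import find_spec

# Assumed import names for each extra's main dependencies.
EXTRA_MODULES = {
    "bertscore": ["bert_score"],
    "cosine": ["sklearn"],
    "jsd": ["scipy", "nltk"],
}


def available_extras(extra_modules=EXTRA_MODULES):
    """Map each extra to True if all of its modules can be located."""
    return {
        extra: all(find_spec(name) is not None for name in modules)
        for extra, modules in extra_modules.items()
    }


for extra, present in available_extras().items():
    print(f"gaico[{extra}]: {'available' if present else 'missing'}")
```

Using `find_spec` avoids actually importing heavy packages such as PyTorch just to check whether they exist.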
Installation Size Comparison
The following table estimates the disk space required by each installation option. Actual sizes may vary with your operating system, Python version, and already-installed packages; the figures are meant mainly to show the relative impact of the optional dependencies.
Note: Core dependencies include: levenshtein, matplotlib, numpy, pandas, rouge-score, and seaborn.
| Installation Command | Dependencies | Estimated Total Size Impact |
|---|---|---|
| pip install gaico | Core | 210 MB |
| pip install 'gaico[jsd]' | Core + scipy, nltk | 310 MB |
| pip install 'gaico[cosine]' | Core + scikit-learn | 360 MB |
| pip install 'gaico[bertscore]' | Core + bert-score (includes torch, transformers, etc.) | 800 MB |
| pip install 'gaico[bertscore,cosine,jsd]' | Core + all dependencies from above | 950 MB |
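The incremental cost of each extra over the core install follows from the table by subtraction. The short sketch below does that arithmetic using only the estimated figures above; the variable names are illustrative.

```python
CORE_MB = 210

# Estimated totals copied from the table (in MB).
ESTIMATED_TOTAL_MB = {
    "jsd": 310,
    "cosine": 360,
    "bertscore": 800,
    "bertscore,cosine,jsd": 950,
}

# Extra disk space each option adds on top of the core install.
extra_cost_mb = {extra: total - CORE_MB for extra, total in ESTIMATED_TOTAL_MB.items()}
print(extra_cost_mb)
# {'jsd': 100, 'cosine': 150, 'bertscore': 590, 'bertscore,cosine,jsd': 740}

# Summing the three single extras gives more than the combined install.
separate_sum = sum(extra_cost_mb[name] for name in ("jsd", "cosine", "bertscore"))
print(separate_sum - extra_cost_mb["bertscore,cosine,jsd"])  # 100
```

The combined install comes out about 100 MB smaller than the sum of the individual extras, which is consistent with the extras sharing some dependencies, although the table itself does not say which ones.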
For Developers (Installing from source)
If you want to contribute to GAICo or install it from source for development:
- Clone the repository:

      git clone https://github.com/ai4society/GenAIResultsComparator.git
      cd GenAIResultsComparator

- Set up a virtual environment and install dependencies: We recommend using UV for managing environments and dependencies.

      # Create a virtual environment (e.g., Python 3.10-3.12 recommended)
      uv venv
      # Activate the environment
      source .venv/bin/activate  # On Windows: .venv\Scripts\activate
      # Install the package in editable mode with all development dependencies
      # (includes all optional features like bertscore, cosine, jsd)
      uv pip install -e ".[dev]"

  If you prefer not to use uv, you can use pip:

      # Create a virtual environment (e.g., Python 3.10-3.12 recommended)
      python3 -m venv .venv
      # Activate the environment
      source .venv/bin/activate  # On Windows: .venv\Scripts\activate
      # Install the package in editable mode with development extras
      pip install -e ".[dev]"

  (The dev extra installs GAICo with all its optional features, plus dependencies for testing, linting, building, and documentation.)

- Set up pre-commit hooks (optional but recommended for contributors):

      pre-commit install

Citation
If you find GAICo useful in your research or work, please consider citing it:

    @software{AI4Society_GAICo_GenAI_Results,
      author = {Nitin Gupta and Pallav Koppisetti and Biplav Srivastava},
      license = {MIT},
      title = {{GAICo: GenAI Results Comparator}},
      year = {2025},
      url = {https://github.com/ai4society/GenAIResultsComparator}
    }

Download files
File details
Details for the file gaico-0.1.5.tar.gz.
File metadata
- Download URL: gaico-0.1.5.tar.gz
- Upload date:
- Size: 9.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2a58e1fdc4213aa3c99cea5912ed6e7ee3ba7bf32ae5586863d0b0a6edafa5e0 |
| MD5 | b4cbf9d8c330b88db97ffb3aa015a163 |
| BLAKE2b-256 | 65ecf6669839d55ffa688716672c24f50a298fca697b32daacef317756582811 |
File details
Details for the file gaico-0.1.5-py3-none-any.whl.
File metadata
- Download URL: gaico-0.1.5-py3-none-any.whl
- Upload date:
- Size: 29.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8446108116f6f07aadf4b49b1224896da444e3d08cc9fc26750e11f29938c851 |
| MD5 | f9e21a84a8135a14e326ba056d9a7142 |
| BLAKE2b-256 | 056325f67c3f95aca5cd52ee29bef2dfc990a81eeb81e56551b3466fe04f15a0 |
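A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a minimal sketch; the helper names `sha256_of_file` and `matches_digest` are hypothetical, and the demonstration uses a throwaway temporary file rather than a real GAICo download.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demonstration on a temporary file with known content.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name
try:
    demo_digest = sha256_of_file(demo_path)
    demo_ok = matches_digest(demo_path, hashlib.sha256(b"hello").hexdigest())
finally:
    os.remove(demo_path)

print(demo_ok)  # True

# For a real download, pass the file path and the digest from the table, e.g.
# matches_digest("gaico-0.1.5.tar.gz", "2a58e1fd...")  # digest shortened here
```

Reading in chunks keeps memory use flat, which matters little for the 29.7 kB wheel but is a sensible habit for the 9.5 MB source archive and anything larger.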