GenAI Results Comparator (GAICo) is a Python library for comparing, analyzing, and visualizing outputs from Large Language Models (LLMs), typically against a reference text, using an extensible set of metrics drawn from the literature.
GAICo: GenAI Results Comparator
Repository: github.com/ai4society/GenAIResultsComparator
Documentation: ai4society.github.io/projects/GenAIResultsComparator
Overview
GenAI Results Comparator (GAICo) helps you measure the quality of your Generative AI (LLM) outputs. It enables you to compare, analyze, and visualize results across text, images, audio, and structured data, helping you answer the question: "Which model performed better?"
At its core, the library provides a set of metrics for evaluating various types of outputs, from plain text strings to structured data like planning sequences and time-series, and multimedia content such as images and audio. While the Experiment class streamlines evaluation for text-based and structured string outputs, individual metric classes offer direct control for all data types, including binary or array-based multimedia. These metrics produce normalized scores (typically 0 to 1), where 1 indicates a perfect match, enabling robust analysis and visualization of LLM performance.
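If you need direct, per-metric control (for example, for image or audio inputs that the Experiment class does not cover), the individual metric classes can be instantiated on their own. The following is a minimal illustrative sketch only: the gaico.metrics module path, the JaccardSimilarity class name, and the calculate() method are assumptions inferred from the description above, not a verified API; please consult the documentation for the exact names.

# Illustrative sketch only. The import path, class name, and method name
# are assumptions; check the GAICo documentation for the actual API.
from gaico.metrics import JaccardSimilarity  # assumed module and class name

generated = "The capital of France is Paris."
reference = "Paris is the capital of France."

metric = JaccardSimilarity()
score = metric.calculate(generated, reference)  # assumed method name
print(f"Jaccard similarity: {score:.2f}")  # scores are normalized to [0, 1]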
Quickstart
GAICo's Experiment class offers a streamlined workflow for comparing multiple model outputs, applying thresholds, generating plots, and creating CSV reports.
Here's a quick example:
from gaico import Experiment

# Sample LLM responses comparing different models
llm_responses = {
    "Google": "Title: Jimmy Kimmel Reacts to Donald Trump Winning...",
    "Mixtral 8x7b": "I'm an AI and I don't have the ability to predict...",
    "SafeChat": "Sorry, I am designed not to answer such a question.",
}
reference_answer = "Sorry, I am unable to answer such a question as it is not appropriate."

# Initialize and run comparison
exp = Experiment(llm_responses=llm_responses, reference_answer=reference_answer)
results = exp.compare(
    metrics=['Jaccard', 'ROUGE'],
    plot=True,
    output_csv_path="experiment_report.csv",
)
print(results)
For more detailed examples, please refer to our Jupyter Notebooks in the examples/ folder in the repository.
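Because compare() can also write a CSV report (experiment_report.csv in the snippet above), results can be post-processed with standard tooling. A minimal sketch using pandas; it assumes only that the report file exists at the path passed above, since the exact column layout depends on GAICo's report format:

import pandas as pd

# Load the CSV report written by exp.compare(..., output_csv_path=...).
# Column names depend on GAICo's report format, so we only inspect them here.
report = pd.read_csv("experiment_report.csv")
print(report.head())
print(report.columns.tolist())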
Features
- Comprehensive Metric Library
  - Textual similarity: Jaccard, Cosine, Levenshtein, Sequence Matcher
  - N-gram based: BLEU, ROUGE, JS Divergence
  - Semantic similarity: BERTScore
  - Structured data: Planning sequences and time-series metrics
  - Multimedia: Image similarity (SSIM, hash-based) and audio quality metrics
- Streamlined Evaluation Workflow
  - High-level Experiment class for comparing models, applying thresholds, and generating reports
  - summarize() method for aggregated performance overviews (a sketch follows this list)
- Dynamic & Extensible
  - Register custom metrics at runtime
  - Add your own evaluation criteria easily
- Powerful Visualization
  - Generate comparative plots automatically
  - Support for bar charts and radar plots
- Robust & Tested
  - Comprehensive test suite with Pytest
  - Production-ready reliability
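As noted in the list above, an aggregated overview of a comparison can be obtained from the Experiment object. A minimal sketch continuing the quickstart example, assuming summarize() can be called without arguments (see the documentation for its full signature and return type):

# Continuing the quickstart example above.
# Assumes summarize() takes no required arguments; check the docs for details.
summary = exp.summarize()
print(summary)  # aggregated per-model, per-metric overview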
Installation
GAICo can be installed using pip.
Create and activate a virtual environment:
python3 -m venv gaico-env
source gaico-env/bin/activate # On macOS/Linux
# gaico-env\Scripts\activate # On Windows
Install GAICo:
pip install gaico
This installs the core GAICo library with essential metrics.
Optional dependencies for specialized metrics:
pip install 'gaico[audio]' # Audio metrics
pip install 'gaico[bertscore]' # BERTScore metric
pip install 'gaico[cosine]' # Cosine similarity
pip install 'gaico[jsd]' # JS Divergence
pip install 'gaico[audio,bertscore,cosine,jsd]' # All features
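After installing, a quick sanity check that the package imports and reports the expected version can be done with the standard library alone (no GAICo-specific API is assumed here):

# Post-install smoke test using only the Python standard library.
import importlib.metadata

import gaico  # should import without errors after `pip install gaico`

print("GAICo version:", importlib.metadata.version("gaico"))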
Citation
If you find GAICo useful in your research or work, please consider citing it:
@article{Gupta_Koppisetti_Lakkaraju_Srivastava_2026,
title={GAICo: A Deployed and Extensible Framework for Evaluating Diverse and Multimodal Generative AI Outputs},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
author={Gupta, Nitin and Koppisetti, Pallav and Lakkaraju, Kausik and Srivastava, Biplav},
year={2026},
}
File details
Details for the file gaico-0.4.0.tar.gz.
File metadata
- Download URL: gaico-0.4.0.tar.gz
- Upload date:
- Size: 86.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 803ba37e98b652d5e78a4262c087e2adfcc48ad31b2e218d4168c2f7eaff4651 |
| MD5 | e76e4bf9a81cc8e75629c1006b73e2f6 |
| BLAKE2b-256 | b33bb56b53f4a1bdd3e65e477e5f71bbc810ec647fea0cd875d221ded0d08d65 |
File details
Details for the file gaico-0.4.0-py3-none-any.whl.
File metadata
- Download URL: gaico-0.4.0-py3-none-any.whl
- Upload date:
- Size: 49.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 41f47780eb32a37a21112b2be55f3b271861799c4919db8b4441acc6efabe222 |
| MD5 | aad3d19be5dec110a66d36fdbc0bb261 |
| BLAKE2b-256 | 3a9ee473c1544bd8129e281bdf2388aac4f750e9d44a466afe638f036018845c |