Consistency-focused similarity comparison framework for generative large language models
ConSCompF: Consistency-focused Similarity Comparison Framework
A Python implementation of ConSCompF, an LLM similarity comparison framework that accounts for instruction consistency, as proposed in the original paper.
Features
- Generates LLM similarity matrices and compresses them using PCA.
- Can be used in few-shot scenarios.
- Supports multiple input formats including lists, HF datasets, and pandas DataFrames.
- Supports different return types including lists, PyTorch tensors, and pandas DataFrames.
- Supports embedding caching.
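To give a rough idea of what an LLM similarity matrix looks like, here is a minimal standard-library sketch. It is not the ConSCompF algorithm: it skips the consistency weighting and the PCA step described in the paper, and simply compares the mean embedding of each model's answers with cosine similarity. The helper names (`mean_vector`, `cosine`, `similarity_matrix`) and the toy 3-dimensional vectors are made up for illustration; real embeddings would come from an encoder model.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings_by_model):
    """Return model names and a square matrix of pairwise cosine similarities."""
    names = list(embeddings_by_model)
    centroids = [mean_vector(embeddings_by_model[name]) for name in names]
    matrix = [[cosine(a, b) for b in centroids] for a in centroids]
    return names, matrix


# Toy data (assumption): two answer embeddings per model.
toy = {
    "model1": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "model2": [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]],
    "model3": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.2]],
}

names, sim = similarity_matrix(toy)
for name, row in zip(names, sim):
    print(name, [round(value, 3) for value in row])
```

In this toy data, model1 and model2 point in nearly the same direction and so score close to 1, while model3 scores much lower against both.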
Installation
Currently, there is no package available on pip. You can build and install it manually:
git clone https://github.com/alex-karev/conscompf
cd conscompf
python -m build .
pip install .
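After installing, you may want to confirm that the package is visible to your interpreter. The sketch below uses only the standard library; it assumes the distribution is registered under the name `conscompf`, and `installed_version` is a hypothetical helper, not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="conscompf"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("conscompf is not installed in this environment")
else:
    print("conscompf version:", found)
```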
Usage
from conscompf import ConSCompF

conscompf = ConSCompF(quiet=True)

data: list[dict[str, list[str]]] = [
    {
        "model1": [
            "Text 1...",
            "Text 2...",
        ],
        "model2": [
            "Text 1...",
            "Text 2...",
        ],
    }, {
        "model1": [...],
        "model2": [...]
    }, ...
]  # Or use HF dataset with a similar structure

out = conscompf(data, return_type="df")  # Available return types: pt, df, list
print(out["sim_matrix"])
print(out["pca"])
print(out["consistency"])
The same minimal example, run on real data, can be found in examples/simple.py.
More examples are available in the examples directory.
For a full list of available functions and arguments, consult the built-in documentation:
pydoc conscompf.ConSCompF
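If your model outputs are stored as flat records, they need to be regrouped into the list-of-dicts layout used in the usage example. The sketch below shows one way to do that with the standard library. It assumes that each dict in the list corresponds to one instruction and that each list holds the texts a model produced for it; this reading is inferred from the example, so check it against examples/simple.py. `group_responses` and the sample records are hypothetical.

```python
from collections import defaultdict


def group_responses(records):
    """Group (instruction_id, model, text) tuples into a list of
    {model: [texts]} dicts, one dict per instruction, in first-seen order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for instruction_id, model, text in records:
        grouped[instruction_id][model].append(text)
    # Convert the nested defaultdicts to plain dicts.
    return [dict(per_model) for per_model in grouped.values()]


# Hypothetical flat records: (instruction id, model name, generated text).
records = [
    ("q1", "model1", "Answer A"),
    ("q1", "model1", "Answer B"),
    ("q1", "model2", "Answer C"),
    ("q2", "model1", "Answer D"),
    ("q2", "model2", "Answer E"),
]

data = group_responses(records)
print(data)
```

The resulting `data` has the same shape as the list passed to `conscompf(...)` in the usage example.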
Citation
This project is currently maintained by Alexey Karev and Dong Xu from the School of Computer Engineering and Science at Shanghai University.
If you find our work valuable, please cite:
@article{Karev_Xu_2025,
  title={ConSCompF: Consistency-focused Similarity Comparison Framework for Generative Large Language Models},
  volume={82},
  ISSN={1076-9757},
  DOI={10.1613/jair.1.17028},
  journal={Journal of Artificial Intelligence Research},
  author={Karev, Alexey and Xu, Dong},
  year={2025},
  month=mar,
  pages={1325–1347}
}
The original dataset used during the experiments described in the original paper is available here.
Contribution
Feel free to fork this repo and make pull requests.
License
Free to use under the Apache 2.0 license. See LICENSE for more information.
File details
Details for the file conscompf-0.2.tar.gz.
File metadata
- Download URL: conscompf-0.2.tar.gz
- Upload date:
- Size: 94.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aff27a34a7244b68a9e388f4c3575616f396287812f5e89f7468643ae6c6c560 |
| MD5 | d8a516d9eff2d5799c3c922e93fcbdd5 |
| BLAKE2b-256 | 9f2d640bb786d0b8c63d20c9e6571d3e07eba60458ddaf3b4d0a2bf0ea638eec |
File details
Details for the file conscompf-0.2-py3-none-any.whl.
File metadata
- Download URL: conscompf-0.2-py3-none-any.whl
- Upload date:
- Size: 9.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea82a61d61d7f0ebc9cc6927077fb6e34048ab2ebdc0383f985b1eba020c06e2 |
| MD5 | 6c35a2e2af28119091dd885ea094e98e |
| BLAKE2b-256 | 66f934c48329fcc77f603d0bb7ef6c288e76608ea7f793be66da038c8b980b8b |