Tool for analyzing and comparing embedding models through pairwise cosine similarity distributions
Embeddings Evaluator
A Python package for analyzing and comparing embedding models through pairwise cosine similarity distributions.
Features
- Pairwise cosine similarity distribution analysis
- Statistical measures:
  - Mean (μ)
  - Standard deviation (σ)
  - Median (m)
  - Peak location and amplitude
- Multi-model comparison visualization
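The statistics listed above can be reproduced by hand with NumPy. The following is a minimal sketch, not the package's own implementation: the helper name `pairwise_cosine_stats` is hypothetical, and it assumes each unordered pair of documents is counted once (the upper triangle of the similarity matrix, excluding the diagonal).

```python
import numpy as np

def pairwise_cosine_stats(embeddings):
    """Return (similarities, stats) over all unique row pairs (hypothetical helper)."""
    x = np.asarray(embeddings, dtype=np.float64)
    # Scale each row to unit length so a dot product equals cosine similarity.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim_matrix = x @ x.T
    # Keep each unordered pair once and skip the self-similarity diagonal.
    rows, cols = np.triu_indices(len(x), k=1)
    sims = sim_matrix[rows, cols]
    stats = {
        "mean": float(sims.mean()),
        "std": float(sims.std()),
        "median": float(np.median(sims)),
    }
    return sims, stats

# Random data stands in for real embeddings of shape (n_docs, embedding_dim).
rng = np.random.default_rng(0)
demo = rng.normal(size=(100, 32))
sims, stats = pairwise_cosine_stats(demo)
print(len(sims), stats)
```

For n documents this produces n(n-1)/2 similarity values, so memory grows quadratically; for large collections a random sample of pairs may be a more practical choice.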
Installation
pip install -r requirements.txt
Usage
import numpy as np
from embeddings_evaluator import plot_model_comparison
from embeddings_evaluator.comparison import save_comparison_plot
# Load your embeddings into a dictionary
embeddings_dict = {
    "Model A": embeddings_a,  # numpy array of shape (n_docs, embedding_dim)
    "Model B": embeddings_b,
}
# Generate comparison plot
fig = plot_model_comparison(embeddings_dict)
save_comparison_plot(fig, 'comparison.png')
Example with Faiss Indices
import faiss
import numpy as np
from embeddings_evaluator import plot_model_comparison
from embeddings_evaluator.comparison import save_comparison_plot
# Load embeddings from faiss indices
def load_faiss_embeddings(index_path):
    index = faiss.read_index(index_path)
    if isinstance(index, faiss.IndexFlatL2):
        num_vectors = index.ntotal
        dimension = index.d
        embeddings = np.zeros((num_vectors, dimension), dtype=np.float32)
        # Copy each stored vector out of the flat index
        for i in range(num_vectors):
            embeddings[i] = index.reconstruct(i)
        return embeddings
    raise ValueError("Unsupported index type")
# Load multiple models
embeddings_dict = {}
for size in [250, 500, 1000, 2000, 4000]:
    embeddings = load_faiss_embeddings(f"faiss_embeddings/{size}/index.faiss")
    # Normalize for cosine similarity
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1)[:, np.newaxis]
    embeddings_dict[f"Model {size}"] = embeddings
# Generate visualization
fig = plot_model_comparison(embeddings_dict)
save_comparison_plot(fig, 'model_comparison.png')
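The normalization step in the loop above divides each row by its L2 norm, which produces NaN values for any all-zero vector in the index. A hedged variant that guards against this might look like the sketch below; `normalize_rows` and its `eps` parameter are hypothetical names introduced for illustration and are not part of the package.

```python
import numpy as np

def normalize_rows(embeddings, eps=1e-12):
    """Scale rows to unit L2 norm, leaving all-zero rows as zeros (hypothetical helper)."""
    x = np.asarray(embeddings, dtype=np.float32)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Clamp tiny norms so zero vectors stay zero instead of becoming NaN.
    return x / np.maximum(norms, eps)

vectors = np.array([[3.0, 4.0], [0.0, 0.0]])
unit = normalize_rows(vectors)
print(unit)
```

Whether zero vectors should be kept or dropped before analysis depends on the dataset; this sketch only prevents NaN from propagating into the similarity statistics.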
Output
The tool provides:
- Statistical measures for each model:
  - Mean cosine similarity (μ)
  - Standard deviation (σ)
  - Median (m)
  - Peak location and amplitude
- Visualization:
  - Overlaid probability density histograms
  - Statistical annotations
  - Peak coordinates
  - Vertical lines at mean values
  - Cosine similarity axis bounded to [0, 1]
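To illustrate what "peak location and amplitude" could mean for a probability density histogram, the sketch below bins similarity values with `np.histogram(..., density=True)` and reports the center and height of the tallest bin. The function name `histogram_peak`, the bin count, and the default range are assumptions made for this example; the package may define the peak differently (for instance, from a smoothed density estimate).

```python
import numpy as np

def histogram_peak(similarities, bins=50, value_range=(0.0, 1.0)):
    """Return (location, amplitude) of the tallest density bin (hypothetical helper)."""
    density, edges = np.histogram(
        similarities, bins=bins, range=value_range, density=True
    )
    # Use the bin midpoint as the peak's x coordinate.
    centers = 0.5 * (edges[:-1] + edges[1:])
    top = int(np.argmax(density))
    return float(centers[top]), float(density[top])

# Synthetic similarities clustered around 0.3, standing in for real data.
rng = np.random.default_rng(1)
sims = np.clip(rng.normal(loc=0.3, scale=0.05, size=10_000), 0.0, 1.0)
peak_x, peak_y = histogram_peak(sims)
print(f"peak at {peak_x:.2f}, density {peak_y:.2f}")
```

Because the histogram is normalized as a density, the amplitude depends on the bin width, so peak heights are only comparable between models when the same `bins` and `value_range` are used.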
Requirements
- numpy
- pandas
- plotly
- scipy
- faiss-cpu (for faiss index support)