Tool for analyzing and comparing embedding models through pairwise cosine similarity distributions

Embeddings Evaluator

A Python package for analyzing and comparing embedding models through pairwise cosine similarity distributions.

Features

  • Pairwise cosine similarity distribution analysis
  • Statistical measures:
    • Mean (μ)
    • Standard deviation (σ)
    • Median (m)
    • Peak location and amplitude
  • Multi-model comparison visualization
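To make the listed measures concrete, here is a minimal NumPy sketch of how a pairwise cosine similarity distribution and its basic statistics could be computed. The function names `pairwise_cosine_similarities` and `summarize` are illustrative assumptions, not part of the package's API, and the package's internal implementation may differ.

```python
import numpy as np


def pairwise_cosine_similarities(embeddings: np.ndarray) -> np.ndarray:
    """Return the cosine similarity of every unordered pair of rows.

    Hypothetical helper for illustration; not the package's own function.
    """
    # L2-normalize each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    sim_matrix = unit @ unit.T
    # Keep the strict upper triangle: each pair once, no self-pairs.
    rows, cols = np.triu_indices(len(unit), k=1)
    return sim_matrix[rows, cols]


def summarize(similarities: np.ndarray) -> dict:
    """Compute the mean, standard deviation, and median of the distribution."""
    return {
        "mean": float(np.mean(similarities)),
        "std": float(np.std(similarities)),
        "median": float(np.median(similarities)),
    }


# Toy data: three 2-dimensional "embeddings".
toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sims = pairwise_cosine_similarities(toy)
stats = summarize(sims)
print(sims, stats)
```

For n documents this produces n(n-1)/2 similarities, so the full similarity matrix may be too large to hold in memory for big collections; sampling pairs is one possible workaround.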

Installation

pip install embeddings_evaluator

Or, from a source checkout, install the dependencies:

pip install -r requirements.txt

Usage

import numpy as np
from embeddings_evaluator import plot_model_comparison
from embeddings_evaluator.comparison import save_comparison_plot

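# Load your embeddings: each value is a numpy array of shape (n_docs, embedding_dim).
# The .npy file names below are placeholders; substitute your own data.
embeddings_a = np.load("model_a_embeddings.npy")
embeddings_b = np.load("model_b_embeddings.npy")

embeddings_dict = {
    "Model A": embeddings_a,
    "Model B": embeddings_b,
}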

# Generate comparison plot
fig = plot_model_comparison(embeddings_dict)
save_comparison_plot(fig, 'comparison.png')

Example with Faiss Indices

import faiss
import numpy as np
from embeddings_evaluator import plot_model_comparison
from embeddings_evaluator.comparison import save_comparison_plot
# Load embeddings from faiss indices
def load_faiss_embeddings(index_path):
    index = faiss.read_index(index_path)
    if isinstance(index, faiss.IndexFlatL2):
        num_vectors = index.ntotal
        dimension = index.d
        embeddings = np.zeros((num_vectors, dimension), dtype=np.float32)
        for i in range(num_vectors):
            embeddings[i] = index.reconstruct(i)
        return embeddings
    raise ValueError("Unsupported index type")

# Load multiple models
embeddings_dict = {}
for size in [250, 500, 1000, 2000, 4000]:
    embeddings = load_faiss_embeddings(f"faiss_embeddings/{size}/index.faiss")
    # Normalize for cosine similarity
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1)[:, np.newaxis]
    embeddings_dict[f"Model {size}"] = embeddings

# Generate visualization
fig = plot_model_comparison(embeddings_dict)
save_comparison_plot(fig, 'model_comparison.png')
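The normalization step in the example above divides each row by its L2 norm, which yields NaN values if an index happens to contain an all-zero vector. A minimal sketch of a more defensive variant follows; `normalize_rows` and its `eps` parameter are hypothetical names introduced here, not part of the package.

```python
import numpy as np


def normalize_rows(embeddings: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """L2-normalize each row, leaving all-zero rows as zeros.

    Hypothetical helper for illustration; not the package's own function.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Clamp tiny norms so zero vectors stay zero instead of becoming NaN.
    return embeddings / np.maximum(norms, eps)


vectors = np.array([[3.0, 4.0], [0.0, 0.0]], dtype=np.float32)
unit_vectors = normalize_rows(vectors)
print(unit_vectors)
```

Whether zero vectors should be kept or dropped before analysis depends on the data; this sketch only avoids the division by zero.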

Output

The tool provides:

  1. Statistical measures for each model:
    • Mean cosine similarity (μ)
    • Standard deviation (σ)
    • Median (m)
    • Peak location and amplitude
  2. Visualization:
    • Overlaid probability density histograms
    • Statistical annotations
    • Peak coordinates
    • Vertical lines at mean values
    • Cosine similarity axis bounded to [0, 1]
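As an illustration of what "peak location and amplitude" can mean for a probability density histogram, the sketch below takes the center and height of the tallest bin. This is an assumption about one reasonable definition; the package may compute the peak differently (for example, from a smoothed density estimate). The name `histogram_peak` and the default bin count are hypothetical.

```python
import numpy as np


def histogram_peak(similarities, bins=50, value_range=(0.0, 1.0)):
    """Return (location, amplitude) of the tallest bin of a density histogram.

    Hypothetical helper for illustration; not the package's own function.
    """
    density, edges = np.histogram(
        similarities, bins=bins, range=value_range, density=True
    )
    peak_bin = int(np.argmax(density))
    # Use the bin center as the peak location.
    location = 0.5 * (edges[peak_bin] + edges[peak_bin + 1])
    return float(location), float(density[peak_bin])


# Three of the five values fall in the [0.5, 0.6) bin.
sample = np.array([0.1, 0.52, 0.55, 0.58, 0.9])
peak_x, peak_y = histogram_peak(sample, bins=10)
print(peak_x, peak_y)
```

Note that the result depends on the bin count: coarser bins give a smoother but less precise peak location.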

Requirements

  • numpy
  • pandas
  • plotly
  • scipy
  • faiss-cpu (for faiss index support)
