A package for evaluating and comparing text embeddings
Embeddings Evaluator
Embeddings Evaluator is a Python package designed to help users evaluate and compare embeddings of text documents using various numeric metrics. It is particularly useful for tasks involving information retrieval and Retrieval-Augmented Generation (RAG). The package provides an automated way to assess the quality of embeddings and compare multiple embeddings based on key metrics.
Features
- Mean Pairwise Distance: Measures the average distance between all pairs of embeddings.
- Variance of Pairwise Distance: Indicates the spread or variability in the distances between pairs of embeddings.
- Mean Cosine Similarity: Assesses the average cosine similarity between all pairs of embeddings, indicating how similar the embeddings are to each other.
- Variance of Cosine Similarity: Provides insight into the variability of cosine similarities, which can indicate clustering tendencies.
- Entropy of Embedding Distribution: Evaluates the diversity of the embeddings in the vector space.
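To make the metrics above concrete, here is a minimal NumPy sketch of how such quantities could be computed. It is not the package's own implementation: the function names `pairwise_metrics` and `histogram_entropy` are hypothetical, Euclidean distance is assumed for the pairwise distance, and the entropy estimator (a histogram over all coordinate values) is only one of several reasonable choices. Rows are assumed to be non-zero vectors.

```python
import numpy as np


def pairwise_metrics(embeddings: np.ndarray) -> dict:
    """Illustrative pairwise statistics; not the package's own implementation."""
    x = np.asarray(embeddings, dtype=float)
    i, j = np.triu_indices(x.shape[0], k=1)  # each unordered pair once (i < j)

    # Euclidean distance for every pair of rows (assumed distance measure)
    dists = np.linalg.norm(x[i] - x[j], axis=1)

    # Cosine similarity for every pair: dot product of unit-length rows
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    cos = np.sum(unit[i] * unit[j], axis=1)

    return {
        "mean_pairwise_distance": float(dists.mean()),
        "variance_pairwise_distance": float(dists.var()),
        "mean_cosine_similarity": float(cos.mean()),
        "variance_cosine_similarity": float(cos.var()),
    }


def histogram_entropy(embeddings: np.ndarray, bins: int = 10) -> float:
    """Shannon entropy (in nats) of a histogram over all coordinate values.

    Assumption: one possible estimator, chosen here for brevity.
    """
    values = np.asarray(embeddings, dtype=float).ravel()
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()  # drop empty bins before taking logs
    return float(-np.sum(p * np.log(p)))


# Tiny hand-checkable input: three 2-D vectors
toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
metrics = pairwise_metrics(toy)
```

For the three toy vectors, the pairs are at distances √2, 1, and 1, so the mean pairwise distance is (2 + √2) / 3. The values returned by the package itself may differ if it uses another distance measure or entropy estimator.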
Installation
To install the package, use the following command:
pip install embeddings_evaluator
Usage
Here's how you can use the Embeddings Evaluator package to compare and plot metrics for multiple embeddings:
import numpy as np
from embeddings_evaluator import compare_embeddings, plot_metrics
# Example embeddings: random placeholder arrays of shape (100, 300); substitute your own
embeddings1 = np.random.rand(100, 300)
embeddings2 = np.random.rand(100, 300)
embeddings3 = np.random.rand(100, 300)
embeddings4 = np.random.rand(100, 300)
embeddings5 = np.random.rand(100, 300)
# List of embeddings and corresponding labels
embeddings_list = [embeddings1, embeddings2, embeddings3, embeddings4, embeddings5]
labels = ['250', '500', '1000', '2000', '4000']
# Generate the comparison DataFrame
df = compare_embeddings(embeddings_list, labels)
# Display the DataFrame
print(df)
# Plot all metrics
plot_metrics(df)
How It Helps
The Embeddings Evaluator package provides a simple and effective way to quantitatively assess and compare different embeddings. By using this package, you can:
- Understand the distribution of your embeddings in the vector space.
- Identify which embeddings are most distinct and which are more similar.
- Compare different embeddings across a range of metrics to determine the best option for your specific retrieval or RAG tasks.
The automated nature of these evaluations means you can quickly gain insights without manual intervention, making it an ideal tool for embedding evaluation workflows.
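As a rough illustration of how these numbers separate "similar" from "distinct" embedding sets, the sketch below builds two synthetic sets with NumPy alone and compares their mean cosine similarity. The helper `mean_cosine_similarity` and the synthetic data are assumptions made for this example; they are not part of the package.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the comparison is repeatable


def mean_cosine_similarity(x: np.ndarray) -> float:
    """Average cosine similarity over all unordered pairs of rows."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = unit @ unit.T
    upper = np.triu_indices(len(x), k=1)  # skip the diagonal and duplicate pairs
    return float(sims[upper].mean())


# "Tight" set: small noise around one shared direction
center = rng.normal(size=300)
tight = center + 0.1 * rng.normal(size=(100, 300))

# "Spread" set: independent Gaussian vectors with no shared direction
spread = rng.normal(size=(100, 300))

tight_score = mean_cosine_similarity(tight)
spread_score = mean_cosine_similarity(spread)
print(f"tight: {tight_score:.3f}, spread: {spread_score:.3f}")
```

The tight set should score close to 1 and the spread set close to 0. Note that `np.random.rand`, as used in the usage example, draws only non-negative values, so its vectors all point into the same orthant and will show a fairly high mean cosine similarity even though they are random.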