A package for comparing trained embedding models.
repcomp
repcomp (short for representation comparison) is a package for comparing trained embedding models. You can use it to compare Deep Neural Networks, Matrix Factorization models, Graph Embeddings, Word Embeddings, etc.
repcomp supports the following embedding comparison approaches:
- Nearest Neighbors: Fetch the nearest neighbor set of each entity according to embedding distances, and compare model A's neighbor sets to model B's neighbor sets.
- Canonical Correlation: Treat embedding components as observations of random variables and compute the canonical correlations between model A and model B.
- Unit Match: Form a unit-to-unit matching between model A's embedding components and model B's embedding components and measure the correlations of the matched units.
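To make the unit-matching idea concrete, here is a minimal NumPy sketch. It is not repcomp's implementation: the function name `unit_match_similarity` is hypothetical, and it uses a simple greedy one-to-one matching on absolute Pearson correlation, which is only one of several possible matching strategies. It assumes both matrices have the same rows (entities) and no constant columns.

```python
import numpy as np

def unit_match_similarity(a, b):
    """Greedy unit matching (hypothetical helper, not part of repcomp).

    a, b: arrays of shape (num_samples, num_components) over the same entities.
    Returns the mean absolute correlation of the matched units and the matches.
    """
    n_a = a.shape[1]
    # Cross-correlation block: rows are units of a, columns are units of b
    corr = np.abs(np.corrcoef(a, b, rowvar=False)[:n_a, n_a:])
    remaining = corr.copy()
    matches = []
    for _ in range(min(corr.shape)):
        # Pick the strongest remaining pair, then retire both units
        i, j = np.unravel_index(np.argmax(remaining), remaining.shape)
        matches.append((int(i), int(j), float(corr[i, j])))
        remaining[i, :] = -1.0
        remaining[:, j] = -1.0
    similarity = float(np.mean([c for _, _, c in matches]))
    return similarity, matches

# Usage: b is a column-permuted copy of a, so every unit has a perfect match
rng = np.random.default_rng(0)
a = rng.random((200, 5))
perm = [2, 0, 4, 1, 3]
b = a[:, perm]
sim, matches = unit_match_similarity(a, b)
```

A greedy matching is easy to read but is not guaranteed to maximize the total correlation; an optimal assignment solver could be swapped in if that matters.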
A simple example comparing random embeddings:
from repcomp.comparison import CCAComparison
import numpy as np
# Generate random embedding matrices
num_samples = 100
num_components = 10
embedding_1 = np.random.random((num_samples, num_components))
embedding_2 = embedding_1 + 0.5 * np.random.random((num_samples, num_components))
# Run the comparison
comparator = CCAComparison()
sim = comparator.run_comparison(embedding_1, embedding_2)
print("The canonical correlation similarity is {}".format(sim["similarity"]))
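For intuition about what a canonical correlation similarity measures, the following is a rough NumPy-only sketch. It is an assumption-laden illustration, not repcomp's code: the function name `canonical_correlations` is hypothetical, it uses the QR-plus-SVD formulation, and it assumes more samples than components and full column rank. Taking the mean of the correlations is one common way to summarize them as a single score.

```python
import numpy as np

def canonical_correlations(x, y):
    """Canonical correlations between two embedding matrices (hypothetical helper).

    x, y: arrays of shape (num_samples, num_components) over the same entities.
    """
    # Center each component so that correlations are not affected by offsets
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Orthonormal bases for the column spaces of x and y
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    # Singular values of qx^T qy are the canonical correlations
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.random((100, 10))
# An invertible linear transform of x spans the same subspace as x
linear_copy = x @ rng.random((10, 10))
unrelated = rng.random((100, 10))

corr_linear = canonical_correlations(x, linear_copy)
corr_unrelated = canonical_correlations(x, unrelated)
print("Mean correlation, linear copy:", corr_linear.mean())
print("Mean correlation, unrelated:", corr_unrelated.mean())
```

Because the score depends only on the subspaces spanned by the components, it is unchanged by invertible linear transformations of either embedding.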
A more involved example comparing word embeddings:
import gensim.downloader as api
import numpy as np
from repcomp.comparison import NeighborsComparison
# Load word vectors from gensim
glove_wiki_50 = api.load("glove-wiki-gigaword-50")
glove_twitter_50 = api.load("glove-twitter-50")
# Build the embedding matrices over the shared vocabularies
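# Note: gensim 4.x exposes the vocabulary as key_to_index (older releases used .vocab).
# Sorting gives the rows of both matrices a fixed, reproducible order.
shared_vocab = sorted(set(glove_wiki_50.key_to_index).intersection(
    set(glove_twitter_50.key_to_index)))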
glove_wiki_50_vectors = np.vstack([glove_wiki_50.get_vector(word) for word in shared_vocab])
glove_twitter_50_vectors = np.vstack([glove_twitter_50.get_vector(word) for word in shared_vocab])
# Run the comparison
comparator = NeighborsComparison()
print("The neighbors similarity between glove-wiki-gigaword-50 and glove-twitter-50 is {}".format(
comparator.run_comparison(glove_wiki_50_vectors, glove_twitter_50_vectors)["similarity"]))
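The neighbor-set comparison can be sketched in plain NumPy as follows. This is a brute-force illustration under stated assumptions rather than repcomp's implementation: the names `neighbor_sets` and `neighbors_similarity` are hypothetical, neighbors are ranked by cosine similarity, the sets are compared with the Jaccard index, and no row is the zero vector. The full pairwise similarity matrix makes it suitable only for small vocabularies.

```python
import numpy as np

def neighbor_sets(embedding, k):
    """Return the set of k nearest neighbors (by cosine similarity) of each row."""
    normed = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
    sims = normed @ normed.T
    # An entity should not count as its own neighbor
    np.fill_diagonal(sims, -np.inf)
    # The first k entries after partitioning are the k most similar rows
    idx = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    return [set(row.tolist()) for row in idx]

def neighbors_similarity(a, b, k=10):
    """Mean Jaccard overlap between the neighbor sets of model A and model B."""
    sets_a = neighbor_sets(a, k)
    sets_b = neighbor_sets(b, k)
    jaccards = [len(sa & sb) / len(sa | sb) for sa, sb in zip(sets_a, sets_b)]
    return float(np.mean(jaccards))

rng = np.random.default_rng(0)
model_a = rng.normal(size=(200, 16))
model_b = rng.normal(size=(200, 16))

same = neighbors_similarity(model_a, model_a, k=5)
different = neighbors_similarity(model_a, model_b, k=5)
```

Unlike the component-wise approaches, this comparison only needs the two models to embed the same entities; the embedding dimensions may differ.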