A package for comparing trained embedding models.
repcomp (short for representation comparison) is a package for comparing trained embedding models. You can use it to compare deep neural networks, matrix factorization models, graph embeddings, word embeddings, and similar models.
repcomp supports the following embedding comparison approaches:
- Nearest Neighbors: Fetch the nearest neighbor set of each entity according to embedding distances, and compare model A's neighbor sets to model B's neighbor sets.
- Canonical Correlation: Treat embedding components as observations of random variables and compute the canonical correlations between model A and model B.
- Unit Match: Form a unit-to-unit matching between model A's embedding components and model B's embedding components and measure the correlations of the matched units.
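To make the nearest-neighbors idea concrete, here is a minimal NumPy sketch. It is not repcomp's implementation: the function names `neighbor_sets` and `neighbor_overlap`, the use of cosine similarity, the neighborhood size `k`, and the Jaccard overlap score are all assumptions chosen for illustration.

```python
import numpy as np


def neighbor_sets(embedding, k):
    """Return, for each row, the indices of its k nearest rows by cosine similarity."""
    # Normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    unit = embedding / np.clip(norms, 1e-12, None)
    sims = unit @ unit.T
    # Exclude each entity from its own neighbor set.
    np.fill_diagonal(sims, -np.inf)
    # argpartition places the k largest similarities in the last k columns.
    return np.argpartition(sims, -k, axis=1)[:, -k:]


def neighbor_overlap(embedding_a, embedding_b, k=10):
    """Mean Jaccard overlap between the two models' k-nearest-neighbor sets."""
    nn_a = neighbor_sets(embedding_a, k)
    nn_b = neighbor_sets(embedding_b, k)
    scores = []
    for row_a, row_b in zip(nn_a, nn_b):
        set_a, set_b = set(row_a.tolist()), set(row_b.tolist())
        scores.append(len(set_a & set_b) / len(set_a | set_b))
    return float(np.mean(scores))


rng = np.random.default_rng(0)
base = rng.random((100, 10))
noisy = base + 0.05 * rng.random((100, 10))
print(neighbor_overlap(base, base))   # identical embeddings share every neighbor set
print(neighbor_overlap(base, noisy))  # added noise changes some of the neighbor sets
```

Because the score depends only on which entities are close to each other, it does not require the two models to have the same number of embedding components.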
A simple example comparing random embeddings:
from repcomp.comparison import CCAComparison
import numpy as np
# Generate random embedding matrices
num_samples = 100
num_components = 10
embedding_1 = np.random.random((num_samples, num_components))
embedding_2 = embedding_1 + 0.5 * np.random.random((num_samples, num_components))
# Run the comparison
comparator = CCAComparison()
sim = comparator.run_comparison(embedding_1, embedding_2)
print("The canonical correlation similarity is {}".format(sim["similarity"]))
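For intuition about what a canonical correlation score measures, the following is a minimal NumPy sketch of the standard QR-plus-SVD computation. It is an illustration, not repcomp's code: the helper name `canonical_correlations` is hypothetical, it assumes both matrices have full column rank, and averaging the correlations into one number is an assumption about how a single similarity could be formed.

```python
import numpy as np


def canonical_correlations(x, y):
    """Canonical correlations between the columns of x and y (rows are samples)."""
    # Center each component so correlations are measured around the mean.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Orthonormal bases for the two column spaces (assumes full column rank).
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    # The singular values of qx^T qy are the canonical correlations.
    corrs = np.linalg.svd(qx.T @ qy, compute_uv=False)
    # Clip away tiny floating-point overshoot outside [0, 1].
    return np.clip(corrs, 0.0, 1.0)


rng = np.random.default_rng(0)
emb = rng.random((100, 10))
mixing = rng.random((10, 10))          # an arbitrary linear remix of the components
unrelated = rng.random((100, 10))

print(canonical_correlations(emb, emb @ mixing).mean())  # close to 1
print(canonical_correlations(emb, unrelated).mean())     # noticeably lower
```

The first result illustrates why this family of measures suits embedding comparison: an invertible linear remix of the components leaves the canonical correlations unchanged, so two models that encode the same information in different coordinates still score as similar.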
A more involved example comparing word embeddings:
import gensim.downloader as api
import numpy as np
from repcomp.comparison import NeighborsComparison
# Load word vectors from gensim
glove_wiki_50 = api.load("glove-wiki-gigaword-50")
glove_twitter_50 = api.load("glove-twitter-50")
# Build the embedding matrices over the shared vocabulary
# (gensim 4.0 and later expose the vocabulary as key_to_index; earlier releases used .vocab)
shared_vocab = sorted(set(glove_wiki_50.key_to_index).intersection(
    glove_twitter_50.key_to_index))
glove_wiki_50_vectors = np.vstack([glove_wiki_50.get_vector(word) for word in shared_vocab])
glove_twitter_50_vectors = np.vstack([glove_twitter_50.get_vector(word) for word in shared_vocab])
# Run the comparison
comparator = NeighborsComparison()
print("The neighbors similarity between glove-wiki-gigaword-50 and glove-twitter-50 is {}".format(
    comparator.run_comparison(glove_wiki_50_vectors, glove_twitter_50_vectors)["similarity"]))
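The unit match approach listed earlier can also be sketched in a few lines. This is an illustrative version under stated assumptions rather than repcomp's implementation: the function name `unit_match_similarity` is hypothetical, the matching is solved with SciPy's `linear_sum_assignment` (the `maximize` argument needs SciPy 1.4 or later), and the score is taken to be the mean absolute Pearson correlation of the matched components.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def unit_match_similarity(embedding_a, embedding_b):
    """Mean absolute correlation of one-to-one matched embedding components."""
    num_a = embedding_a.shape[1]
    # With rowvar=False each column is a variable; the off-diagonal block holds
    # the correlations between A's components and B's components.
    corr = np.corrcoef(embedding_a, embedding_b, rowvar=False)[:num_a, num_a:]
    strength = np.abs(corr)
    # Choose the one-to-one matching with the largest total correlation strength.
    rows, cols = linear_sum_assignment(strength, maximize=True)
    return float(strength[rows, cols].mean()), cols


rng = np.random.default_rng(0)
emb_a = rng.random((100, 10))
perm = rng.permutation(10)
emb_b = emb_a[:, perm]                 # same components, shuffled order

similarity, matched = unit_match_similarity(emb_a, emb_b)
print(similarity)                      # close to 1: every unit has an exact partner
print(matched)                         # component of B matched to each component of A
```

Unlike canonical correlation, this score rewards models whose individual components line up one to one, so it drops when the same information is spread across components differently.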