Compute Word Mover's Distance using any type of Word Embedding model
Word Mover's Distance
This package provides an implementation of Word Mover's Distance for a generic Word Embeddings model.
I largely reused code available in the gensim library, in particular the wmdistance function, making it more general so that it can be used with other Word Embeddings models, such as GloVe.
You can find a real-world use of this package in my news summariser repository, where I use Word Mover's Distance to find the most similar sentences in a given news article.
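The sentence-matching use case mentioned above can be sketched as follows. This is a minimal illustration, not the code from the news summariser repository: `most_similar_pair` is a hypothetical helper that accepts any distance callable over two token lists, and `jaccard_distance` is only a stand-in so the sketch runs without an embedding model. In practice you would pass a Word Mover's Distance function instead.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]


def most_similar_pair(
    sentences: Sequence[Tokens],
    distance: Callable[[Tokens, Tokens], float],
) -> Tuple[int, int, float]:
    """Return (i, j, d) for the two sentences with the smallest distance d."""
    if len(sentences) < 2:
        raise ValueError("need at least two sentences")
    best = None
    # Compare every unordered pair of sentences exactly once.
    for i, j in combinations(range(len(sentences)), 2):
        d = distance(sentences[i], sentences[j])
        if best is None or d < best[2]:
            best = (i, j, d)
    return best


def jaccard_distance(a: Tokens, b: Tokens) -> float:
    """Stand-in distance for illustration only; NOT Word Mover's Distance."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


sentences = [
    "obama speaks to the media".split(),
    "the president spoke to the press".split(),
    "obama speaks to the press".split(),
]
i, j, d = most_similar_pair(sentences, jaccard_distance)
```

Taking the distance as a parameter keeps the ranking logic independent of the embedding model, so the stand-in can be swapped for a real distance without changing the helper.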
Basic usage
Import the library:
import word_embedding.model as model
Initialise a Word Embedding object
You can pass the path where the model is stored:
model = model.WordEmbedding(model_fn="/path/where/my/model/is/stored.txt")
or you can pass a model that you have already loaded, assuming it is a dictionary whose keys are words and whose values are the corresponding word vectors:
model = model.WordEmbedding(model=my_word_embedding_model)
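If your embeddings live in a plain-text file with one `word v1 v2 ... vn` entry per line (the layout used by the pre-trained GloVe files), a dictionary of this shape could be built roughly as below. This is a sketch under assumptions: `load_text_embeddings` is a hypothetical helper, not part of this package, and it stores each vector as a list of floats. The package may expect NumPy arrays instead, in which case you would convert the values before passing the dictionary in.

```python
import io
from typing import Dict, Iterable, List


def load_text_embeddings(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse 'word v1 v2 ... vn' lines into a word -> vector dictionary."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Tiny in-memory stand-in for a real embeddings file (values are made up).
sample = io.StringIO("obama 0.1 0.2 0.3\npresident 0.2 0.1 0.4\n")
my_word_embedding_model = load_text_embeddings(sample)
```

For a real file, replace the `io.StringIO` object with an open file handle, for example `with open(path, encoding="utf-8") as f: my_word_embedding_model = load_text_embeddings(f)`.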
Compute Word Mover's distance
s1 = 'Obama speaks to the media in Chicago'.lower().split()
s2 = 'The president spoke to the press in Chicago'.lower().split()
wmdistance = model.wmdistance(s1, s2)
1.8119693993679309
Remember that the wmdistance(s1, s2) method expects two List[str] as input!
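One way to avoid passing a raw string by mistake is to tokenise through a small helper and check the type before calling the method. The two functions below are hypothetical conveniences, not part of this package. `tokenize` simply mirrors the `.lower().split()` calls from the example above, which is an assumption about the preprocessing you want rather than a requirement of the library.

```python
from typing import Iterable, List


def tokenize(sentence: str) -> List[str]:
    """Lowercase and split on whitespace, as in the usage example."""
    return sentence.lower().split()


def ensure_tokens(tokens: Iterable[str]) -> List[str]:
    """Reject a bare string, which Python would otherwise iterate character by character."""
    if isinstance(tokens, str):
        raise TypeError("expected a list of tokens, got a string; call tokenize() first")
    return list(tokens)


s1 = ensure_tokens(tokenize("Obama speaks to the media in Chicago"))
s2 = ensure_tokens(tokenize("The president spoke to the press in Chicago"))
```

The resulting `s1` and `s2` are plain lists of strings and can be passed to `wmdistance` as shown earlier.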
Hashes for word_mover_distance-0.0.1.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | 0af799615efad01cbce6feb2d9288564ea34614e4176d842b57cd69d0b43e32c |
MD5 | 653e88e3d664275230e6a2432040092f |
BLAKE2b-256 | bed7f606dc274ce6f02e053996e5f293d3f643a8a739727c7892a148b55946e2 |
Hashes for word_mover_distance-0.0.1-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 9923e1c3572a90f910bdc5c18cefe2925ea963568b3df75f170d7c5fcbd0c1af |
MD5 | fa3a78cf1f5a1b7057d6154bcf434f14 |
BLAKE2b-256 | 9ace8c1b16368d7b32fc9cdcc6aade3eabac5a7176774ff7e3fdaac687d1583c |