A light-weight package for working with pre-trained word embeddings
Project description
reach is a lightweight package for working with pre-trained word embeddings. It is useful for preparing input to neural networks and for experimenting with compositional semantics.
reach can read word vectors in word2vec or GloVe format without any preprocessing.
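The two formats differ mainly in their first line: a word2vec-style file starts with a header giving the vocabulary size and the number of dimensions, while a GloVe-style file starts directly with the first word. The sketch below illustrates that difference in plain Python. It is independent of reach's own loader, and the helper `parse_embeddings` is a hypothetical name used only for this illustration.

```python
import io


def parse_embeddings(handle, header):
    """Parse space-separated word vectors from an open text file.

    Hypothetical helper for illustration; not part of reach.
    """
    if header:
        # word2vec style: the first line is "<vocab_size> <dimensions>".
        vocab_size, dims = (int(x) for x in handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if header:
        # Check the parsed data against what the header promised.
        if len(vectors) != vocab_size:
            raise ValueError("vocabulary size does not match header")
        if any(len(v) != dims for v in vectors.values()):
            raise ValueError("vector length does not match header")
    return vectors


word2vec_text = "2 3\ncat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n"
glove_text = "cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n"

w2v = parse_embeddings(io.StringIO(word2vec_text), header=True)
glove = parse_embeddings(io.StringIO(glove_text), header=False)
```

Both calls produce the same word-to-vector mapping; only the handling of the first line differs, which is what the `header` argument controls.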
The idea behind reach is a no-hassle approach to featurization. The vectorize and bow methods handle out-of-vocabulary (OOV) words themselves, so you do not have to deal with them in your own code.
Similarly, reach contains OOV and PAD vectors, so you do not need to account for unknown words or padding yourself.
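One common way to get this behaviour is to reserve two entries in the vocabulary, one for padding and one for unknown words, and to fall back to the unknown-word index on lookup. The sketch below shows that idea in plain Python. The names `PAD`, `OOV`, `to_indices` and the index layout are assumptions made for illustration, not reach's actual internals.

```python
PAD, OOV = "<PAD>", "<OOV>"

# Toy vocabulary: the first two entries are reserved (assumed layout).
words = [PAD, OOV, "a", "dog", "is", "the", "best"]
word_to_index = {w: i for i, w in enumerate(words)}


def to_indices(tokens, remove_oov=False, length=None):
    """Map tokens to indices, handling unknown words and optional padding."""
    if remove_oov:
        # Drop unknown tokens entirely.
        indices = [word_to_index[t] for t in tokens if t in word_to_index]
    else:
        # Replace unknown tokens with the reserved OOV index.
        indices = [word_to_index.get(t, word_to_index[OOV]) for t in tokens]
    if length is not None:
        # Truncate, then right-pad with the PAD index up to a fixed length.
        indices = indices[:length]
        indices += [word_to_index[PAD]] * (length - len(indices))
    return indices


sentence = "a dog is the best creature alive".split()
print(to_indices(sentence))                   # [2, 3, 4, 5, 6, 1, 1]
print(to_indices(sentence, remove_oov=True))  # [2, 3, 4, 5, 6]
print(to_indices(["a", "dog"], length=4))     # [2, 3, 0, 0]
```

Because unknown words and padding each map to a fixed index, the calling code never needs a special case for them.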
reach also includes nearest neighbour calculation for arbitrary vectors, allowing you to experiment with compositional operators.
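To make the idea concrete, here is a minimal sketch of a nearest-neighbour search for an arbitrary vector, using cosine similarity over a toy vocabulary. It is written in plain Python and does not use reach. The vectors are made up, and `cosine` and `nearest` are hypothetical helper names.

```python
import math

# Toy 2-dimensional embeddings, made up for illustration.
vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.8, 0.6],
    "car": [0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, num=1):
    """Return the `num` words whose vectors are most similar to `query`."""
    ranked = sorted(vectors, key=lambda w: cosine(query, vectors[w]), reverse=True)
    return ranked[:num]


# A simple compositional operator: element-wise addition of two word vectors.
composed = [x + y for x, y in zip(vectors["cat"], vectors["car"])]
print(nearest(composed, num=1))  # ['dog']
```

The query here is not the vector of any single word, which is the point: any composed vector can be compared against the vocabulary in the same way.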
Example
from reach import Reach
# Word2vec style: with header.
r = Reach("path/to/embeddings", header=True)
# Glove style: without header.
r = Reach("path/to/embeddings", header=False)
# Get vectors through indexing.
# Raises a KeyError if a word is not present.
vector = r['cat']
# Compare two words.
similarity = r.similarity('cat', 'dog')
# Find most similar.
similarities = r.most_similar('cat', 5)
sentence = 'a dog is the best creature alive'.split()
corpus = [sentence, sentence, sentence]
# bow representation, consistent with word vectors, for input into neural network.
bow = r.bow(sentence)
# vectorized representation.
vectorized = r.vectorize(sentence)
# can remove OOV words automatically.
vectorized = r.vectorize(sentence, remove_oov=True)
# vectorize corpus.
transformed = r.transform(corpus)
File details
Details for the file reach-0.0.3.tar.gz.
File metadata
- Download URL: reach-0.0.3.tar.gz
- Upload date:
- Size: 3.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | f3dfb2e0b67ba06ff422c63ca75a823f4aeb8deeb906c96560f773bf8eabc50e
MD5 | 7e91002edcf19df08176c2d830a65c39
BLAKE2b-256 | 82e3e4316df135fcaa3c5427d6df41d48d8085d9ece7ec1d94d0064f70685dfd
File details
Details for the file reach-0.0.3-py3-none-any.whl.
File metadata
- Download URL: reach-0.0.3-py3-none-any.whl
- Upload date:
- Size: 4.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 56d593c2abe83df46a39ed6aa6a8ed7d7021271e2af488e138c37786ddb515df
MD5 | ba447a462f21a04a645b8c2ba9bc93e2
BLAKE2b-256 | cc08c24bc6406200852e07782b36d4c9e8fa3abfcef5eb9e61055dec1b942c34
File details
Details for the file reach-0.0.3-py2-none-any.whl.
File metadata
- Download URL: reach-0.0.3-py2-none-any.whl
- Upload date:
- Size: 4.6 kB
- Tags: Python 2
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4b832072259f8c9fc6d9b5bf0e670e526975015ed713cad95b1edbb5791b7846
MD5 | 51997cfb7e1426919051b9da2d8d9873
BLAKE2b-256 | 4cd4a09ca2717746d1fda64a0e827157006b26a0f3ed65891f8405d9fead06c3