A light-weight package for working with pre-trained word embeddings

Project description

A lightweight package for working with pre-trained word embeddings. It is useful for preparing input to neural networks and for experimenting with compositional semantics.

reach can read word vectors in word2vec or GloVe format without any preprocessing.
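To make the difference between the two formats concrete, here is a minimal sketch of a parser for the plain-text variants. It assumes the common layout in which a word2vec text file starts with a "<vocabulary size> <dimensions>" header line and a GloVe file does not. The function read_vectors is hypothetical and is not reach's own loader, which may behave differently.

```python
import io


def read_vectors(handle, header):
    """Parse 'word v1 v2 ...' lines into a dict of word -> list of floats.

    Hypothetical helper for illustration only, not part of reach.
    """
    if header:
        # word2vec text files start with "<vocab size> <dimensions>"; skip it.
        next(handle)
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            # Ignore blank or malformed lines.
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# The same two vectors, once with a header and once without.
word2vec_text = "2 3\ncat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n"
glove_text = "cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n"

w2v = read_vectors(io.StringIO(word2vec_text), header=True)
glove = read_vectors(io.StringIO(glove_text), header=False)
```

Apart from the header line, both inputs yield the same mapping, which is why a single header flag is enough to switch between them.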

The idea behind reach is a no-hassle approach to featurization: the vectorization and bag-of-words (bow) methods handle out-of-vocabulary (OOV) words for you, so you do not have to deal with them in your own code.

Similarly, reach contains OOV and PAD vectors, so you do not have to account for unknown words or padding in your own code.
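The following sketch illustrates the general idea of reserving entries for padding and unknown words. The index values (PAD = 0, OOV = 1) and the helpers build_index, to_indices and pad are assumptions made for this example; reach's internal layout may differ.

```python
# Hypothetical reserved indices; not necessarily the ones reach uses.
PAD, OOV = 0, 1


def build_index(words):
    """Map each known word to an index, leaving 0 and 1 for PAD and OOV."""
    return {word: offset + 2 for offset, word in enumerate(words)}


def to_indices(tokens, index, remove_oov=False):
    """Convert tokens to indices, mapping unknown words to OOV or dropping them."""
    if remove_oov:
        return [index[t] for t in tokens if t in index]
    return [index.get(t, OOV) for t in tokens]


def pad(indices, length):
    """Right-pad with PAD (or truncate) so the result has exactly `length` items."""
    return (indices + [PAD] * length)[:length]


index = build_index(["a", "dog"])
with_oov = to_indices("a dog barks".split(), index)
without_oov = to_indices("a dog barks".split(), index, remove_oov=True)
padded = pad(with_oov, 5)
```

Because the unknown word maps to a fixed index, downstream code can look up a vector for every position without special cases.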

reach also includes nearest neighbour calculation for arbitrary vectors, allowing you to experiment with compositional operators.
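As an illustration of what a nearest neighbour search over an arbitrary vector involves, here is a small pure-Python sketch that uses cosine similarity and vector addition as the compositional operator. The toy vectors and the helpers cosine and nearest are made up for this example and are not reach's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query, vectors, num=1):
    """Return the `num` (word, similarity) pairs most similar to `query`."""
    scored = sorted(
        ((cosine(query, vec), word) for word, vec in vectors.items()),
        reverse=True,
    )
    return [(word, score) for score, word in scored[:num]]


# Toy two-dimensional embeddings, invented for this example.
vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "pet": [1.0, 1.0],
}

# Compose two word vectors by addition, then search with the result.
composed = [c + d for c, d in zip(vectors["cat"], vectors["dog"])]
neighbours = nearest(composed, vectors, num=1)
```

The query does not have to correspond to any word in the vocabulary, so the same search works for sums, averages, or any other composed vector.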

Example

from reach import Reach

# Word2vec style: with header.
r = Reach("path/to/embeddings", header=True)

# Glove style: without header.
r = Reach("path/to/embeddings", header=False)

# Get vectors through indexing.
# Raises a KeyError if a word is not present.
vector = r['cat']

# Compare two words.
similarity = r.similarity('cat', 'dog')

# Find most similar.
similarities = r.most_similar('cat', 5)

sentence = 'a dog is the best creature alive'.split()
corpus = [sentence, sentence, sentence]

# bow representation, consistent with the word vectors, for input into a neural network.
bow = r.bow(sentence)

# vectorized representation.
vectorized = r.vectorize(sentence)

# can remove OOV words automatically.
vectorized = r.vectorize(sentence, remove_oov=True)

# vectorize corpus.
transformed = r.transform(corpus)
