A light-weight package for working with pre-trained word embeddings

Project description

reach is a lightweight package for working with pre-trained word embeddings. It is useful for preparing input to neural networks and for experimenting with compositional semantics.

reach can read word vectors in word2vec or GloVe text format without any preprocessing.
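To make the difference between the two formats concrete, here is a minimal sketch of a reader for both. It is not reach's own loader: `read_vectors` is a hypothetical helper, and it assumes the plain-text variants of the formats, where a word2vec file starts with a "vocabulary size, dimensions" header line and a GloVe file does not.

```python
import io


def read_vectors(handle, header):
    """Parse space-separated text embeddings into a dict of word -> list of floats.

    Hypothetical helper for illustration; not part of reach.
    """
    if header:
        # word2vec text format: the first line is "<vocab size> <dimensions>".
        next(handle)
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            # Skip empty or malformed lines.
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Toy data standing in for real embedding files.
word2vec_text = "2 3\ncat 0.1 0.2 0.3\ndog 0.2 0.1 0.3\n"
glove_text = "cat 0.1 0.2 0.3\ndog 0.2 0.1 0.3\n"

w2v = read_vectors(io.StringIO(word2vec_text), header=True)
glove = read_vectors(io.StringIO(glove_text), header=False)
```

With the header skipped, both inputs yield the same word-to-vector mapping, which is why a single `header` flag is enough to tell the two formats apart.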

reach is designed to make featurization hassle-free. Its vectorization and bow methods handle out-of-vocabulary (OOV) words for you, so you do not have to deal with them in your own code.

Similarly, reach includes OOV and PAD vectors, so you do not need to account for them in your own code.
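The idea of reserved OOV and PAD entries can be sketched in plain Python as follows. This illustrates the general technique, not reach's internals: the names `build_index`, `to_indices` and `pad`, and the choice of index 0 for PAD and 1 for OOV, are assumptions made for the example.

```python
PAD, OOV = "<PAD>", "<OOV>"


def build_index(words):
    """Map words to integer ids, reserving 0 for PAD and 1 for OOV (assumed layout)."""
    index = {PAD: 0, OOV: 1}
    for word in words:
        index.setdefault(word, len(index))
    return index


def to_indices(tokens, index, remove_oov=False):
    """Convert tokens to ids; unknown tokens map to the OOV id or are dropped."""
    if remove_oov:
        tokens = [t for t in tokens if t in index]
    return [index.get(t, index[OOV]) for t in tokens]


def pad(indices, length, index):
    """Truncate or right-pad a list of ids to a fixed length using the PAD id."""
    return indices[:length] + [index[PAD]] * max(0, length - len(indices))


index = build_index(["a", "dog", "is", "the", "best"])
sentence = "a dog is the best creature alive".split()

ids = to_indices(sentence, index)                       # unknown words become OOV
ids_no_oov = to_indices(sentence, index, remove_oov=True)  # unknown words are dropped
padded = pad(ids_no_oov, 8, index)                      # fixed length for batching
```

Because unknown words and padding each have their own id, every sentence can be turned into a fixed-length list of integers without special cases in the calling code.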

reach also includes nearest neighbour calculation for arbitrary vectors, allowing you to experiment with compositional operators.
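As a rough illustration of what nearest neighbour search over an arbitrary vector involves, the sketch below composes a vector by addition and subtraction and ranks a toy vocabulary by cosine similarity. It does not use reach; `cosine`, `nearest` and the toy vectors are hypothetical, and cosine similarity is assumed as the distance measure.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(vector, vectors, num=1):
    """Return the `num` words whose vectors are most similar to `vector`."""
    scored = [(cosine(vector, v), word) for word, v in vectors.items()]
    scored.sort(reverse=True)
    return [(word, score) for score, word in scored[:num]]


# Hand-made toy vectors, chosen so the composition works out exactly.
toy = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
}

# A simple compositional operator: king - man + woman.
composed = [k - m + w for k, m, w in zip(toy["king"], toy["man"], toy["woman"])]
result = nearest(composed, toy, num=2)
```

The composed vector does not have to belong to any word in the vocabulary, which is what makes this kind of lookup useful for trying out different compositional operators.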

Example

from reach import Reach

# Word2vec style: with header.
r = Reach("path/to/embeddings", header=True)

# Glove style: without header.
r = Reach("path/to/embeddings", header=False)

# Get vectors through indexing.
# Raises a KeyError if a word is not present.
vector = r['cat']

# Compare two words.
similarity = r.similarity('cat', 'dog')

# Find most similar.
similarities = r.most_similar('cat', 5)

sentence = 'a dog is the best creature alive'.split()
corpus = [sentence, sentence, sentence]

# Bag-of-words representation, consistent with the word vectors, for input into a neural network.
bow = r.bow(sentence)

# vectorized representation.
vectorized = r.vectorize(sentence)

# can remove OOV words automatically.
vectorized = r.vectorize(sentence, remove_oov=True)

# vectorize corpus.
transformed = r.transform(corpus)
