
A light-weight package for working with pre-trained word embeddings

Project description

A lightweight package for working with pre-trained word embeddings. It is useful for preparing input to neural networks or for experimenting with compositional semantics.

reach can read word vectors in word2vec or GloVe format without any preprocessing.
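The two text formats differ mainly in their first line. A word2vec text file starts with a header giving the vocabulary size and the dimensionality. A GloVe file starts directly with the first word and its values, and this is what the `header` argument in the example below distinguishes.

Below is a minimal, stdlib-only sketch of a parser that handles both formats. It assumes whitespace-separated text files. `read_embeddings` is a hypothetical helper, not reach's own loader.

```python
import io
from typing import Dict, List, Optional, TextIO


def read_embeddings(handle: TextIO, header: bool) -> Dict[str, List[float]]:
    """Hypothetical parser for word2vec (header=True) or GloVe (header=False) text files."""
    vectors: Dict[str, List[float]] = {}
    dims: Optional[int] = None
    if header:
        # word2vec: the first line is "<vocab_size> <dimensions>".
        _, dims = (int(x) for x in handle.readline().split())
    for line in handle:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], [float(v) for v in parts[1:]]
        if dims is None:
            dims = len(values)  # GloVe: infer dimensionality from the first row
        if len(values) != dims:
            raise ValueError(f"{word!r} has {len(values)} values, expected {dims}")
        vectors[word] = values
    return vectors


w2v = io.StringIO("2 3\ncat 0.1 0.2 0.3\ndog 0.2 0.1 0.0\n")
glove = io.StringIO("cat 0.1 0.2 0.3\ndog 0.2 0.1 0.0\n")
print(read_embeddings(w2v, header=True) == read_embeddings(glove, header=False))  # True
```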

reach is built around a no-hassle approach to featurization. Its vectorization and bag-of-words (bow) methods know how to deal with out-of-vocabulary (OOV) words, so you do not have to handle them in your own code.

Similarly, reach provides OOV and PAD vectors, so your own code does not need to account for them.
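The OOV and PAD handling described above can be pictured as follows. An unknown word either keeps its position and receives a dedicated OOV vector, or it is dropped (compare `remove_oov=True` in the example below). Sequences can also be padded with a PAD vector to a fixed length.

This is a stdlib-only sketch with a hypothetical toy vocabulary and placeholder vector values. The `pad_to` parameter is for illustration only and is not claimed to exist in reach.

```python
from typing import Dict, List, Optional

Vector = List[float]

# Hypothetical toy embeddings; real ones would be read from a file.
EMBEDDINGS: Dict[str, Vector] = {
    "dog": [1.0, 0.0],
    "cat": [0.9, 0.1],
    "best": [0.0, 1.0],
}
# Placeholder values; reach's actual OOV and PAD vectors may differ.
OOV_VECTOR: Vector = [0.5, 0.5]
PAD_VECTOR: Vector = [0.0, 0.0]


def vectorize(tokens: List[str], remove_oov: bool = False,
              pad_to: Optional[int] = None) -> List[Vector]:
    """Map tokens to vectors; `pad_to` is a hypothetical fixed-length option."""
    rows: List[Vector] = []
    for token in tokens:
        if token in EMBEDDINGS:
            rows.append(EMBEDDINGS[token])
        elif not remove_oov:
            rows.append(OOV_VECTOR)  # keep the position, fill it with the OOV vector
    if pad_to is not None:
        rows = rows[:pad_to]  # truncate long sequences
        rows += [PAD_VECTOR] * (pad_to - len(rows))  # pad short ones
    return rows


sentence = "a dog is the best creature alive".split()
print(len(vectorize(sentence)))                        # 7: OOV words keep their slots
print(vectorize(sentence, remove_oov=True))            # [[1.0, 0.0], [0.0, 1.0]]
print(vectorize(sentence, remove_oov=True, pad_to=4))  # two PAD rows appended
```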

reach also includes nearest neighbour calculation for arbitrary vectors, allowing you to experiment with compositional operators.
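Nearest-neighbour search over arbitrary vectors means the query does not have to be a word in the vocabulary. It can be the result of a compositional operator such as vector addition and subtraction.

Here is a minimal sketch that assumes cosine similarity as the measure. The vectors and the helper names `cosine` and `nearest_neighbours` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbours(query: Vector, vocab: Dict[str, Vector], n: int = 5,
                       exclude: Tuple[str, ...] = ()) -> List[Tuple[str, float]]:
    """Rank every vocabulary word by similarity to an arbitrary query vector."""
    scored = [(word, cosine(query, vec)) for word, vec in vocab.items()
              if word not in exclude]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical toy 3-dimensional vectors.
vocab = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.5, 0.5],
}
# Compose a query that is not itself a word: king - man + woman.
query = [k - m + w for k, m, w in zip(vocab["king"], vocab["man"], vocab["woman"])]
top = nearest_neighbours(query, vocab, n=1, exclude=("king", "man", "woman"))
print([word for word, _ in top])  # ['queen']
```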

Example

from reach import Reach

# Word2vec style: with header.
r = Reach("path/to/embeddings", header=True)

# Glove style: without header.
r = Reach("path/to/embeddings", header=False)

# Get vectors through indexing.
# Raises a KeyError if a word is not present.
vector = r['cat']

# Compare two words.
similarity = r.similarity('cat', 'dog')

# Find most similar.
similarities = r.most_similar('cat', 5)

sentence = 'a dog is the best creature alive'.split()
corpus = [sentence, sentence, sentence]

# bow representation, consistent with word vectors, for input into a neural network.
bow = r.bow(sentence)

# vectorized representation.
vectorized = r.vectorize(sentence)

# can remove OOV words automatically.
vectorized = r.vectorize(sentence, remove_oov=True)

# vectorize corpus.
transformed = r.transform(corpus)
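The comment on `bow` says the representation is consistent with the word vectors. One way to read that, assumed here rather than taken from reach's source, is that `bow` yields row indices into the embedding matrix. An embedding layer initialised with that matrix then reproduces `vectorize`.

Below is a stdlib-only sketch with a hypothetical toy vocabulary. Here `transform` simply applies `vectorize` to each sentence.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical vocabulary: row i of MATRIX is the vector for WORDS[i].
WORDS = ["<PAD>", "<UNK>", "dog", "best"]
MATRIX: List[Vector] = [[0.0, 0.0], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
INDEX: Dict[str, int] = {word: i for i, word in enumerate(WORDS)}
UNK_ID = INDEX["<UNK>"]


def bow(tokens: List[str], remove_oov: bool = False) -> List[int]:
    """Map tokens to row indices; unknown tokens map to UNK_ID unless removed."""
    ids: List[int] = []
    for token in tokens:
        if token in INDEX:
            ids.append(INDEX[token])
        elif not remove_oov:
            ids.append(UNK_ID)
    return ids


def vectorize(tokens: List[str], remove_oov: bool = False) -> List[Vector]:
    # Looking up the bow indices in MATRIX is what an embedding layer does.
    return [MATRIX[i] for i in bow(tokens, remove_oov)]


def transform(corpus: List[List[str]], remove_oov: bool = False) -> List[List[Vector]]:
    # Vectorize each sentence of the corpus independently.
    return [vectorize(sentence, remove_oov) for sentence in corpus]


sentence = "a dog is the best creature alive".split()
print(bow(sentence))                    # [1, 2, 1, 1, 3, 1, 1]
print(bow(sentence, remove_oov=True))   # [2, 3]
print(len(transform([sentence] * 3)))   # 3
```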


Download files


Source Distribution

reach-0.0.3.tar.gz (3.9 kB)

Uploaded Source

Built Distributions

reach-0.0.3-py3-none-any.whl (4.6 kB)

Uploaded Python 3

reach-0.0.3-py2-none-any.whl (4.6 kB)

Uploaded Python 2

File details

Details for the file reach-0.0.3.tar.gz.

File metadata

  • Download URL: reach-0.0.3.tar.gz
  • Upload date:
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for reach-0.0.3.tar.gz
Algorithm Hash digest
SHA256 f3dfb2e0b67ba06ff422c63ca75a823f4aeb8deeb906c96560f773bf8eabc50e
MD5 7e91002edcf19df08176c2d830a65c39
BLAKE2b-256 82e3e4316df135fcaa3c5427d6df41d48d8085d9ece7ec1d94d0064f70685dfd
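To check a download against the SHA256 digest listed above, you can hash the file locally and compare the result. This minimal sketch uses Python's standard `hashlib`. The local file name and location are assumptions.

```python
import hashlib
from pathlib import Path
from typing import BinaryIO

EXPECTED_SHA256 = "f3dfb2e0b67ba06ff422c63ca75a823f4aeb8deeb906c96560f773bf8eabc50e"


def sha256_hex(handle: BinaryIO, chunk_size: int = 65536) -> str:
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Assumed location of the downloaded source distribution.
archive = Path("reach-0.0.3.tar.gz")
if archive.exists():
    with archive.open("rb") as handle:
        ok = sha256_hex(handle) == EXPECTED_SHA256
    print("hash OK" if ok else "hash MISMATCH")
```

pip can enforce the same check in hash-checking mode, using a requirements line of the form `reach==0.0.3 --hash=sha256:<digest>`.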


File details

Details for the file reach-0.0.3-py3-none-any.whl.


File hashes

Hashes for reach-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 56d593c2abe83df46a39ed6aa6a8ed7d7021271e2af488e138c37786ddb515df
MD5 ba447a462f21a04a645b8c2ba9bc93e2
BLAKE2b-256 cc08c24bc6406200852e07782b36d4c9e8fa3abfcef5eb9e61055dec1b942c34


File details

Details for the file reach-0.0.3-py2-none-any.whl.


File hashes

Hashes for reach-0.0.3-py2-none-any.whl
Algorithm Hash digest
SHA256 4b832072259f8c9fc6d9b5bf0e670e526975015ed713cad95b1edbb5791b7846
MD5 51997cfb7e1426919051b9da2d8d9873
BLAKE2b-256 4cd4a09ca2717746d1fda64a0e827157006b26a0f3ed65891f8405d9fead06c3

