Package for working with word vector embeddings

Word Vector Embedding Package

wordvecpy is a library for processing text data, tokenizing it, and building word vector dictionaries and whole word vector embeddings from the corpus text.

TextProcessor takes a corpus of unprocessed text and processes it for use with word vectors. Punctuation, stopwords, substitutions, contractions, and lemmatization can all be customized.
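The kind of processing described above can be sketched in plain Python. This is an illustration only and does not use the wordvecpy API: the function name `process_text`, the `CONTRACTIONS` table, and the `STOPWORDS` set are all hypothetical, and lemmatization is left out to keep the example dependency-free.

```python
import string

# Hypothetical lookup tables, for illustration only
CONTRACTIONS = {"don't": "do not", "it's": "it is"}
STOPWORDS = {"the", "a", "is", "it"}

def process_text(text, contractions=CONTRACTIONS, stopwords=STOPWORDS):
    """Lowercase, strip punctuation, expand contractions, drop stopwords."""
    tokens = []
    for raw in text.lower().split():
        # Strip punctuation from both ends; inner apostrophes are kept
        word = raw.strip(string.punctuation)
        # Expand a contraction into its component words if one is known
        expanded = contractions.get(word, word)
        for token in expanded.split():
            if token and token not in stopwords:
                tokens.append(token)
    return tokens

tokens = process_text("It's a test. Don't panic!")
```

Passing the lookup tables as parameters mirrors the idea that each processing step can be customized.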

VectorDictionary loads pretrained word embeddings from .txt files so they can be used with other classes. Every class in this package that requires a vector dictionary can take a pymagnitude vector or a VectorDictionary object.
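For reference, pretrained embeddings distributed as .txt files commonly use one line per word: the word followed by its space-separated vector components. A minimal sketch of reading that layout is shown here. It assumes this file format, is not VectorDictionary's actual implementation, and the helper name `load_text_vectors` is hypothetical.

```python
import io

def load_text_vectors(lines):
    """Parse 'word v1 v2 ... vn' lines into a dict of word -> list of floats."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        vectors[word] = [float(v) for v in values]
    return vectors

# In-memory stand-in for a pretrained .txt file
sample = io.StringIO("cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")
vectors = load_text_vectors(sample)
```

With a real file, the same function could be given an open file handle instead of the `StringIO` object.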

Vectokenizer and FastVectokenizer both convert a processed text corpus into integer embeddings and create vector dictionaries for those integer embeddings. The two classes do the same thing; the only difference is that FastVectokenizer requires Keras to create the integer embeddings, while Vectokenizer does not.
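To make the idea of integer embeddings concrete, here is a small sketch in plain Python. It is not the Vectokenizer API: all function names are hypothetical, and reserving index 0 for padding and giving unknown words a zero vector are assumptions made for the example.

```python
def build_word_index(tokenized_docs):
    """Assign each distinct token an integer id, reserving 0 for padding."""
    word_index = {}
    for doc in tokenized_docs:
        for token in doc:
            if token not in word_index:
                word_index[token] = len(word_index) + 1
    return word_index

def to_integer_sequences(tokenized_docs, word_index):
    """Replace every token with its integer id."""
    return [[word_index[t] for t in doc] for doc in tokenized_docs]

def build_vector_table(word_index, vectors, dim):
    """Row i holds the vector for the word with id i; padding and
    words missing from `vectors` keep an all-zero row."""
    table = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        if word in vectors:
            table[idx] = list(vectors[word])
    return table

docs = [["the", "cat", "sat"], ["the", "dog"]]
vectors = {"cat": [0.1, 0.2], "dog": [0.3, 0.4]}
word_index = build_word_index(docs)
sequences = to_integer_sequences(docs, word_index)
table = build_vector_table(word_index, vectors, dim=2)
```

The integer sequences stay small, and the vector table only needs rows for words that actually occur in the corpus.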

EmbeddedCorpus generates (and saves, if needed) complete word vector embeddings, and LoadEmbeddedCorpus loads them. Because these embeddings can quickly take up a huge amount of memory, the data can be split and saved in slices. This is most useful when working with word vector embeddings in raw form.
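The slicing idea can be illustrated with the standard library alone. This sketch is not how EmbeddedCorpus or LoadEmbeddedCorpus store data: the use of pickle, the file naming scheme, and the helper names `save_in_slices` and `load_slices` are all assumptions.

```python
import os
import pickle
import tempfile

def save_in_slices(items, slice_size, directory, prefix="slice"):
    """Write `items` to disk in chunks of `slice_size`; return the file paths."""
    paths = []
    for start in range(0, len(items), slice_size):
        path = os.path.join(directory, f"{prefix}_{start // slice_size}.pkl")
        with open(path, "wb") as fh:
            pickle.dump(items[start:start + slice_size], fh)
        paths.append(path)
    return paths

def load_slices(paths):
    """Yield one slice at a time so only that slice is held in memory."""
    for path in paths:
        with open(path, "rb") as fh:
            yield pickle.load(fh)

# Three tiny "documents", each a list of word vectors
embedded = [[[0.1, 0.2]], [[0.3, 0.4]], [[0.5, 0.6]]]
with tempfile.TemporaryDirectory() as tmp:
    paths = save_in_slices(embedded, slice_size=2, directory=tmp)
    restored = [doc for chunk in load_slices(paths) for doc in chunk]
```

Loading through a generator means a training loop could consume one slice, discard it, and move on to the next.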

ELMOEmbeddedCorpus does the same thing as EmbeddedCorpus, but only for ELMo vectors. Because the ELMo embedding of a word differs depending on the sentence it appears in, the conversion method had to be changed. Embeddings are still loaded with LoadEmbeddedCorpus. ELMOEmbeddedCorpus is currently only available when pymagnitude is used to access the ELMo embeddings.

Current version is 0.6.

Installation

Use the package manager pip to install wordvecpy.

pip install wordvecpy

Usage

Future Plans

Currently working on functionality to reduce the size of integer embeddings by clustering words based on their vector representations. This will hopefully allow smaller vector dictionaries to be used while maintaining good functionality.

License

None brah

