Skip to main content

Character-based word embeddings model based on RNN

Project description

Chars2vec library could be very useful if you are dealing with the texts containing abbreviations, slang, typos, or some other specific textual dataset. Chars2vec language model is based on the symbolic representation of words – the model maps each word to a vector of a fixed length. These vector representations are obtained with a custom neural netowrk while the latter is being trained on pairs of similar and non-similar words. This custom neural net includes LSTM, reading sequences of characters in words, as its part. The model maps similarly written words to proximal vectors. This approach enables creation of an embedding in vector space for any sequence of characters. Chars2vec models does not keep any dictionary of embeddings, but generates embedding vectors inplace using pretrained model. There are pretrained models of dimensions 50, 100, 150, 200 and 300 for the English language. The library provides convenient user API to train a model for an arbitrary set of characters.

Project details

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for chars2vec, version 0.1.7
Filename, size File type Python version Upload date Hashes
Filename, size chars2vec-0.1.7.tar.gz (8.1 MB) File type Source Python version None Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page