Character-based word embeddings model based on RNN
Project description
Chars2vec library could be very useful if you are dealing with the texts containing abbreviations, slang, typos, or some other specific textual dataset. Chars2vec language model is based on the symbolic representation of words – the model maps each word to a vector of a fixed length. These vector representations are obtained with a custom neural netowrk while the latter is being trained on pairs of similar and non-similar words. This custom neural net includes LSTM, reading sequences of characters in words, as its part. The model maps similarly written words to proximal vectors. This approach enables creation of an embedding in vector space for any sequence of characters. Chars2vec models does not keep any dictionary of embeddings, but generates embedding vectors inplace using pretrained model. There are pretrained models of dimensions 50, 100, 150, 200 and 300 for the English language. The library provides convenient user API to train a model for an arbitrary set of characters.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file chars2vec-0.1.7.tar.gz
.
File metadata
- Download URL: chars2vec-0.1.7.tar.gz
- Upload date:
- Size: 8.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.0.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | df05b0765e406e1bdb9f9537adf7fccb24db4a81956aea5c25a707a03a6a8094 |
|
MD5 | d065305a0497dae89dd5e6f2d760eafd |
|
BLAKE2b-256 | 040a8c327aae23e0532d239ec7b30446aca765eb5d9547b4c4b09cdd82e49797 |