Character-based word embeddings model based on RNN

Project description

The chars2vec library is useful when you work with texts that contain abbreviations, slang, typos, or other non-standard vocabulary. The chars2vec language model is based on a character-level (symbolic) representation of words: it maps each word to a vector of fixed length. These vectors are produced by a custom neural network trained on pairs of similar and non-similar words. The network includes an LSTM that reads each word as a sequence of characters. Because the model maps similarly written words to nearby vectors, it can embed any sequence of characters in the vector space. Chars2vec models do not keep a dictionary of embeddings; they generate embedding vectors on the fly using a pretrained model. Pretrained models of dimensions 50, 100, 150, 200 and 300 are available for the English language. The library also provides a convenient API for training a model on an arbitrary set of characters.
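The dictionary-free idea described above can be illustrated without the library. The sketch below is a toy stand-in, not the chars2vec model: instead of a trained LSTM it uses hashed character-bigram counts, and the names `toy_embed`, `cosine` and `DIM` are hypothetical. It only shows the two properties the description relies on: any character sequence gets a fixed-length vector, and similarly written words land close together.

```python
import math

DIM = 64  # toy vector length (assumption); the pretrained models use 50 to 300


def toy_embed(word, dim=DIM):
    """Map any string to a fixed-length vector of hashed character-bigram counts.

    This is NOT the chars2vec network; it is a simple stand-in that needs
    no vocabulary and no training.
    """
    vec = [0.0] * dim
    # Count each adjacent character pair in a bucket chosen by a tiny hash.
    for a, b in zip(word, word[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    # Scale to unit length so that a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(u, v):
    """Cosine similarity of two vectors already scaled to unit length."""
    return sum(x * y for x, y in zip(u, v))


# No lookup table: vectors are computed on demand, even for misspellings.
words = ["embedding", "embeding", "zebra"]
vectors = {w: toy_embed(w) for w in words}

typo_sim = cosine(vectors["embedding"], vectors["embeding"])
other_sim = cosine(vectors["embedding"], vectors["zebra"])
print(f"embedding vs embeding: {typo_sim:.3f}")
print(f"embedding vs zebra:    {other_sim:.3f}")
```

The misspelled pair scores much higher than the unrelated pair. With the real library, the equivalent step is, as far as the project's README indicates, loading a pretrained model with `chars2vec.load_model('eng_50')` and calling `vectorize_words` on a list of words; treat those names as something to confirm against the installed version.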

Download files

Source Distribution

chars2vec-0.1.7.tar.gz (8.1 MB)

Uploaded Source

File details

Details for the file chars2vec-0.1.7.tar.gz.

File metadata

  • Download URL: chars2vec-0.1.7.tar.gz
  • Upload date:
  • Size: 8.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.0.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.6

File hashes

Hashes for chars2vec-0.1.7.tar.gz

Algorithm     Hash digest
SHA256        df05b0765e406e1bdb9f9537adf7fccb24db4a81956aea5c25a707a03a6a8094
MD5           d065305a0497dae89dd5e6f2d760eafd
BLAKE2b-256   040a8c327aae23e0532d239ec7b30446aca765eb5d9547b4c4b09cdd82e49797
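
One way to use the digests listed above is to recompute the SHA256 of the downloaded archive and compare it with the published value. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest published for chars2vec-0.1.7.tar.gz in the listing above.
EXPECTED_SHA256 = "df05b0765e406e1bdb9f9537adf7fccb24db4a81956aea5c25a707a03a6a8094"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 equals the expected digest (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected("chars2vec-0.1.7.tar.gz")` on the downloaded file should return `True` for an intact copy; a `False` result suggests a corrupted or altered download.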
