Skip to main content

A word hashing method based on vectors of letter n-grams. Currently transforms text into sequences of numbers.

Project description


> A word hashing method to reduce the dimensionality of the bag-of-words term vectors. It is based on letter n-gram. Given a word (e.g. good), it first adds word starting and ending marks to the word (e.g. #good#). Then, breaks the word into letter n-grams (e.g. letter trigrams: #go, goo, ood, od#). Finally, the word is represented using a vector of letter n-grams.

[Huang et al.2013, Learning Deep Structured Semantic Models for Web Search using Clickthrough Data]


This implementation supports the transformation from **text into sequences of numbers**, with the numbers indicating the descending word frequency.

For example:

*Lorem ipsum dolor sit amet, consectetuer adipiscing elit...* is transformed into *23, 1, 80, 86, 47, 50001, 21, 59, 83, 93, 14, 50003, 4, 7*

Also, after each word flags indicating lower case, upper case, mixed case or initial capitalization are added.

### To do

There will be an implementation supporting the transformation from **text into bag-of-word vectors**.


pip install l3wtransformer


from l3wtransformer import L3wTransformer

l3wt = L3wTransformer()

l3wt.fit_on_texts(['First example.', 'And one more!'])
l3wt.texts_to_sequences(['One example', '2nd exa.'])

# [[5, 18, 17, 50001, 2, 10, 24, 6, 15, 20, 50003], [16, 50003, 2, 10, 50003]]

Project details

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for l3wtransformer, version 0.3.0
Filename, size File type Python version Upload date Hashes
Filename, size l3wtransformer-0.3.0-py2.py3-none-any.whl (4.6 kB) File type Wheel Python version py2.py3 Upload date Hashes View

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page