
A word hashing method based on vectors of letter n-grams. Currently transforms text into sequences of numbers.


l3wtransformer
==============

> A word hashing method to reduce the dimensionality of the bag-of-words term vectors. It is based on letter n-gram. Given a word (e.g. good), it first adds word starting and ending marks to the word (e.g. #good#). Then, breaks the word into letter n-grams (e.g. letter trigrams: #go, goo, ood, od#). Finally, the word is represented using a vector of letter n-grams.

[Huang et al., 2013, Learning Deep Structured Semantic Models for Web Search using Clickthrough Data]
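The decomposition described in the quote can be sketched in a few lines of Python. This is only an illustration of the technique, not the package's internal code; the function name `letter_ngrams` and its parameters are hypothetical.

```python
def letter_ngrams(word, n=3, mark="#"):
    """Split a word into letter n-grams after adding boundary marks.

    Hypothetical helper for illustration; not part of l3wtransformer.
    """
    marked = mark + word + mark
    # Slide a window of width n over the marked word.
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


print(letter_ngrams("good"))  # ['#go', 'goo', 'ood', 'od#']
```

Because the vocabulary of letter trigrams is far smaller than the vocabulary of words, representing words this way reduces the dimensionality of the resulting vectors.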

---

This implementation transforms **text into sequences of numbers**, where the numbers are assigned in order of descending word frequency.

For example:

*Lorem ipsum dolor sit amet, consectetuer adipiscing elit...* is transformed into *23, 1, 80, 86, 47, 50001, 21, 59, 83, 93, 14, 50003, 4, 7*

In addition, a flag is appended after each word to indicate whether it is lower case, upper case, mixed case, or has an initial capital.
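The following is a rough sketch of how such a frequency-ranked encoding with case flags could work. It is not the package's actual code and rests on several assumptions: ranks are taken over lower-cased letter trigrams, text is split on whitespace only, trigrams not seen during fitting are skipped, and the flags are shown as string labels rather than numbers. All function names and labels here are hypothetical.

```python
from collections import Counter


def trigrams(word):
    """Letter trigrams of a lower-cased word with '#' boundary marks."""
    marked = "#" + word.lower() + "#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]


def build_index(texts):
    """Map each trigram to its rank by descending frequency (1 = most frequent)."""
    counts = Counter(
        gram for text in texts for word in text.split() for gram in trigrams(word)
    )
    return {gram: rank for rank, (gram, _) in enumerate(counts.most_common(), start=1)}


def case_flag(word):
    """Classify the casing of a word (labels are illustrative assumptions)."""
    if word == word.lower():
        return "lower"
    if word == word.upper():
        return "upper"
    if word[0].isupper() and word[1:] == word[1:].lower():
        return "initial_cap"
    return "mixed"


def encode(text, index):
    """Encode text as trigram ranks, with a case flag after each word."""
    sequence = []
    for word in text.split():
        # Assumption: trigrams that were not seen during fitting are dropped.
        sequence.extend(index[g] for g in trigrams(word) if g in index)
        sequence.append(case_flag(word))
    return sequence


index = build_index(["aa aa ab"])
print(encode("Aa ab", index))
```

In the example output above, the flags appear as numbers (such as 50001 and 50003) rather than labels, which keeps them in the same integer sequence as the trigram indices.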

### To do

A future version will support the transformation from **text into bag-of-words vectors**.

Install
-------

```
pip install l3wtransformer
```

Usage
-----

```
from l3wtransformer import L3wTransformer

l3wt = L3wTransformer()

l3wt.fit_on_texts(['First example.', 'And one more!'])
l3wt.texts_to_sequences(['One example', '2nd exa.'])

# [[5, 18, 17, 50001, 2, 10, 24, 6, 15, 20, 50003], [16, 50003, 2, 10, 50003]]
```

