English lengthened expression normalizer (e.g., coooolllll!!! -> cool!)
Udon is a text normalizer for lengthened English expressions, i.e., words written with repeated letters.
(e.g., Udon converts “cooooooooooooooollllllllllllll” to “cool”)
This module is based on the following paper:
Samuel Brody and Nicholas Diakopoulos. Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs. In EMNLP 2011, pp. 562-570, 2011.
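The idea behind this kind of normalization can be sketched roughly as follows: collapse every run of repeated letters to get a "condensed" key, group the words seen in a corpus by that key, and treat the most frequent spelling in each group as the canonical form. This is a minimal illustration of that idea, not Udon's actual implementation; the names `condense`, `build_canonical_map`, and `normalize`, as well as the toy corpus, are made up for this example.

```python
import re
from collections import Counter, defaultdict


def condense(word):
    """Collapse every run of a repeated character to a single character."""
    return re.sub(r'(.)\1+', r'\1', word)


def build_canonical_map(tokens):
    """Map each condensed key to the most frequent spelling that produces it."""
    groups = defaultdict(Counter)
    for token in tokens:
        groups[condense(token)][token] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in groups.items()}


def normalize(word, canonical):
    """Return the canonical spelling for word, or word itself if the key is unknown."""
    return canonical.get(condense(word), word)


# Toy corpus (hypothetical data): 'cool' and 'okay' are the most frequent spellings.
corpus = ['cool', 'cool', 'cool', 'coool', 'cooolll', 'okay', 'okay', 'okayyy']
canonical = build_canonical_map(corpus)

print(normalize('cooooolllll', canonical))
print(normalize('okayyyyy', canonical))
```

With this toy corpus, both lengthened words map back to their most frequent spelling. A word whose condensed key never appeared in the corpus is returned unchanged.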
$ pip install udon
>>> import udon
>>> udon.normalize_sentence('you are coooolll!!!')
you are cool!
>>> udon.normalize_word('okayyyyy')
okay
>>> udon.cut_repeat('mamisaaaaaan', 1)
mamisan
>>> udon.cut_repeat('okayyyyy', 2)
okayy
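Judging from the two `cut_repeat` examples, the second argument appears to be the maximum number of times a letter may repeat in a row. The sketch below reproduces that behavior with a regular expression. It is an assumption about the semantics, not Udon's source code, and `cut_repeat_sketch` is a hypothetical name.

```python
import re


def cut_repeat_sketch(text, max_repeat):
    """Shorten every run of one repeated character to at most max_repeat characters."""
    if max_repeat < 1:
        raise ValueError('max_repeat must be at least 1')
    # (.) captures one character; \1{n,} matches n or more further copies of it,
    # so the whole pattern matches runs longer than max_repeat.
    pattern = re.compile(r'(.)\1{%d,}' % max_repeat)
    return pattern.sub(lambda m: m.group(1) * max_repeat, text)


print(cut_repeat_sketch('mamisaaaaaan', 1))  # mamisan
print(cut_repeat_sketch('okayyyyy', 2))      # okayy
```

Runs that are already short enough are left alone, so `cut_repeat_sketch('cool', 2)` returns `'cool'` unchanged.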
Contributions are welcome!
Available on Python 3.x