Corrects English spelling mistakes and normalizes lengthened words (e.g., "cooooooooooooooollllllllllllll" to "cool").
pytypo corrects English spelling mistakes. This feature is based on the TYPO CORPUS (http://luululu.com/tweet/) and Wikipedia's list of common misspellings (https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines).
This module also normalizes lengthened English expressions that contain repeated letters (e.g., it converts “cooooooooooooooollllllllllllll” to “cool”).
That feature is based on the following paper: Samuel Brody and Nicholas Diakopoulos. Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! using word lengthening to detect sentiment in microblogs. In EMNLP2011, pp. 562-570, 2011. http://aclweb.org/anthology//D/D11/D11-1052.pdf
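As a rough illustration of the idea behind lengthened-word normalization, the sketch below groups words by a "condensed" form (every run of a repeated letter collapsed to one letter) and maps each variant to the most frequent word in its group. This is only a minimal sketch inspired by the paper, not pytypo's actual implementation; the names `condense` and `build_canonical_table` and the word counts are hypothetical.

```python
import re
from collections import Counter, defaultdict


def condense(word):
    """Collapse every run of a repeated character to a single character."""
    return re.sub(r'(.)\1+', r'\1', word)


def build_canonical_table(word_counts):
    """Map each lengthened variant to the most frequent word sharing its condensed form.

    Hypothetical helper for illustration; not part of pytypo.
    """
    groups = defaultdict(list)
    for word, count in word_counts.items():
        groups[condense(word)].append((count, word))

    table = {}
    for forms in groups.values():
        # Pick the highest count; ties fall back to lexicographic order.
        canonical = max(forms)[1]
        for _, word in forms:
            if word != canonical:
                table[word] = canonical
    return table


# Made-up frequencies, purely for demonstration.
counts = Counter({'cool': 50, 'coool': 3, 'cooooolll': 1, 'okay': 20, 'okayyy': 2})
table = build_canonical_table(counts)
```

With these made-up counts, `table` maps 'coool' and 'cooooolll' to 'cool' and 'okayyy' to 'okay', while the canonical words themselves are left out of the table.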
Contributions are welcome!
$ pip install pytypo
>>> import pytypo
>>> pytypo.correct_sentence('you are coooolll!!!')
you are cool!
>>> pytypo.correct('okayyyyy')
okay
Shorten repeated substrings down to the threshold, without using a dictionary
>>> pytypo.cut_repeat('mamisaaaaaan', 1)
mamisan
>>> pytypo.cut_repeat('okayyyyy', 2)
okayy
- cut_repeat(str, threshould)
- Note that this method does not use the lengthened-expression normalization table (e.g., cooll -> cool). To normalize such expressions, use correct() or correct_sentence() instead.
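To make the dictionary-free behavior concrete, here is a minimal sketch of how such truncation could be done with a regular expression. It is an assumption for illustration, not pytypo's own code: the name `cut_repeat_sketch` is hypothetical, and it only handles runs of a single repeated character.

```python
import re


def cut_repeat_sketch(text, threshold):
    """Trim any run of one repeated character to at most `threshold` copies.

    Hypothetical stand-in for illustration; pytypo.cut_repeat may differ.
    """
    if threshold < 1:
        raise ValueError('threshold must be at least 1')
    # Match a character followed by `threshold` or more copies of itself,
    # i.e. a run strictly longer than the threshold.
    pattern = re.compile(r'(.)\1{%d,}' % threshold)
    return pattern.sub(lambda m: m.group(1) * threshold, text)


print(cut_repeat_sketch('mamisaaaaaan', 1))  # mamisan
print(cut_repeat_sketch('okayyyyy', 2))      # okayy
```

Because no lookup table is involved, 'okayyyyy' with a threshold of 2 stays as 'okayy' rather than becoming 'okay'.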
- This module is licensed under the MIT License.
Add many cases from Wikipedia
Add many cases