Micro-library to normalize text strings
Normality is a Python micro-package that contains a small set of text normalization functions for easier re-use. These functions accept a snippet of Unicode or UTF-8 encoded text and remove various classes of characters, such as diacritics and punctuation. This is useful as preparation for further text analysis.
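To make the idea concrete, here is a minimal sketch of this kind of normalization using only the standard library. It is not normality's implementation: `rough_normalize` is a hypothetical name, and the choice of which character classes to drop is an assumption made for illustration.

```python
import unicodedata


def rough_normalize(text):
    """Hypothetical helper: lowercase, strip diacritics and punctuation."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    chars = []
    for ch in decomposed:
        category = unicodedata.category(ch)
        if category.startswith("M"):
            # Combining marks (the diacritics) are dropped entirely.
            continue
        if category[0] in ("P", "S", "Z", "C"):
            # Punctuation, symbols, separators and control characters
            # are replaced by a space (an assumption of this sketch).
            chars.append(" ")
        else:
            chars.append(ch)
    # Lowercase, then collapse any run of whitespace into a single space.
    return " ".join("".join(chars).lower().split())


print(rough_normalize('Nie wieder "Grüne Süppchen" kochen!'))
```

A decomposition-based approach like this only handles characters that Unicode can split into a base letter and a mark; it does not transliterate other scripts into Latin.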
WARNING: This library works much better when used in combination with
pyicu, a Python binding for the International Components for
Unicode (ICU) C library. ICU provides much better text transliteration than
the fallback used when it is not installed.
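The following sketch shows one way to use ICU transliteration when pyicu is importable and to degrade to a standard-library approach otherwise. It does not show how normality itself selects a backend; `to_ascii` is a hypothetical helper, and the transform ID `Any-Latin; Latin-ASCII` is chosen here for illustration.

```python
import unicodedata

try:
    import icu  # provided by the pyicu package, if installed
except ImportError:
    icu = None


def to_ascii(text):
    """Hypothetical helper: transliterate with ICU, else strip diacritics."""
    if icu is not None:
        # Convert any script to Latin, then reduce Latin to plain ASCII.
        transliterator = icu.Transliterator.createInstance(
            "Any-Latin; Latin-ASCII"
        )
        return transliterator.transliterate(text)
    # Fallback: decompose and drop combining marks. This leaves
    # non-Latin scripts untouched, which is why ICU is preferable.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(to_ascii("Grüne Süppchen"))
```

For Latin text with diacritics both paths give the same result; the difference shows up with other scripts such as Cyrillic or Greek, which only the ICU path converts.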
    # coding: utf-8
    from normality import normalize, slugify, collapse_spaces

    text = normalize('Nie wieder "Grüne Süppchen" kochen!')
    assert text == 'nie wieder grune suppchen kochen'

    slug = slugify('My first blog post!')
    assert slug == 'my-first-blog-post'

    text = 'this \n\n\r\nhas\tlots of \nodd spacing.'
    assert collapse_spaces(text) == 'this has lots of odd spacing.'
normality is open source, licensed under a standard MIT license
(included in this repository).
Download the file for your platform.

| Filename | Size | File type | Python version |
|---|---|---|---|
| normality-2.2.3-py2.py3-none-any.whl | 12.4 kB | Wheel | py2.py3 |
| normality-2.2.3.tar.gz | 10.3 kB | Source | None |