Micro-library to normalize text strings
normality
Normality is a Python micro-package that contains a small set of text normalization functions for easier re-use. These functions accept a snippet of Unicode or UTF-8 encoded text and remove various classes of characters, such as diacritics and punctuation. This is useful as preparation for further text analysis.
WARNING: This library works much better when used in combination with pyicu, a Python binding for the International Components for Unicode C library. ICU provides much better text transliteration than the default text-unidecode.
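To see which transliteration backend is likely to be picked up in a given environment, a quick check like the following may help. It is only a sketch: `icu_available` is a hypothetical helper name, not part of normality, and it relies solely on the fact that the pyicu binding is imported under the module name `icu`.

```python
import importlib.util


def icu_available() -> bool:
    """Return True if the pyicu binding (module name `icu`) can be found.

    Hypothetical helper for illustration; not part of normality.
    """
    return importlib.util.find_spec("icu") is not None


# Report which backend the warning above suggests will be used.
backend = "ICU (pyicu)" if icu_available() else "text-unidecode fallback"
print("Transliteration backend likely in use:", backend)
```

The check only looks up the module and does not import it, so it is cheap to run at startup.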
Example
# coding: utf-8
from normality import normalize, slugify, collapse_spaces
text = normalize('Nie wieder "Grüne Süppchen" kochen!')
assert text == 'nie wieder grune suppchen kochen'
slug = slugify('My first blog post!')
assert slug == 'my-first-blog-post'
text = 'this \n\n\r\nhas\tlots of \nodd spacing.'
assert collapse_spaces(text) == 'this has lots of odd spacing.'
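For intuition about what the `normalize` call in the example does, the result can be roughly approximated with the standard library alone. The sketch below is not normality's implementation. The function names are hypothetical, and it assumes that stripping combining marks after NFKD decomposition is an acceptable stand-in for real transliteration.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def replace_punctuation(text: str, replacement: str = " ") -> str:
    """Replace every character in a Unicode punctuation category (P*)."""
    return "".join(
        replacement if unicodedata.category(ch).startswith("P") else ch
        for ch in text
    )


def simple_normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    cleaned = replace_punctuation(strip_diacritics(text)).lower()
    # str.split() with no arguments splits on any run of whitespace.
    return " ".join(cleaned.split())


print(simple_normalize('Nie wieder "Grüne Süppchen" kochen!'))
```

This only covers characters that decompose into a base letter plus combining marks. Letters such as `ß`, and non-Latin scripts in general, pass through unchanged, which is the gap that ICU-based transliteration is meant to fill.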
License
normality is open source, licensed under a standard MIT license (included in this repository as LICENSE).
Hashes for normality-2.3.1-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 2c717633c8f147d9b8331229a879f6c598c0d135e041562b9d9ca3d7b26f85d9 |
MD5 | 77e73f1ca16fb9f509695fef3c2b6c0c |
BLAKE2b-256 | c4af189b33e4b554e4f84f4d0c8e0ade4c9eaa5b18c298db3f2420289245e2f5 |