Fast C Unicode to 8-bit charset transliteration codec
Project description
This package contains codecs for transliterating ISO 10646 texts into best-effort representations using smaller coded character sets (ASCII, ISO 8859, etc.). It is a C reimplementation of the [translitcodec][1] module by Jason Kirtland, as well as a drop-in replacement.
The translation tables used by the codecs are from the [transtab][2] collection by [Prof. Markus Kuhn][3], with some extensions by [Fazal Majid][4].
Three types of transliterating codecs are provided:
- “long”, using as many characters as needed to make a natural
replacement. For example, u00e4 LATIN SMALL LETTER A WITH DIAERESIS ä will be replaced with ae.
“short”, using the minimum number of characters to make a replacement. For example, u00e4 LATIN SMALL LETTER A WITH DIAERESIS ä will be replaced with a.
“one”, only performing single character replacements. Characters that can not be transliterated with a single character are passed through unchanged. For example, u2639 WHITE FROWNING FACE ☹ will be passed through unchanged.
Using the codecs is simple:
>>> import ctranslitcodec >>> import codecs >>> codecs.encode(u'fácil € ☺', 'ctranslit/long') u'facil EUR :-)' >>> codecs.encode(u'fácil € ☺', 'ctranslit/short') u'facil E :-)'
[1]: https://pypi.org/project/translitcodec/ [2]: https://www.cl.cam.ac.uk/~mgk25/unicode.html#libs [3]: https://www.cl.cam.ac.uk/~mgk25/ [4]: https://majid.info/
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.