Fast C Unicode to 8-bit charset transliteration codec

Project description

This package contains codecs for transliterating ISO 10646 (Unicode) text into best-effort representations using smaller coded character sets (ASCII, ISO 8859, etc.). It is a C reimplementation of, and a drop-in replacement for, the [translitcodec][1] module by Jason Kirtland.

The translation tables used by the codecs are from the [transtab][2] collection by [Prof. Markus Kuhn][3], with some extensions by [Fazal Majid][4].

Three types of transliterating codecs are provided:

“long”, using as many characters as needed to make a natural replacement. For example, U+00E4 LATIN SMALL LETTER A WITH DIAERESIS (ä) is replaced with ae.

“short”, using the minimum number of characters to make a replacement. For example, U+00E4 LATIN SMALL LETTER A WITH DIAERESIS (ä) is replaced with a.

“one”, only performing single-character replacements. Characters that cannot be transliterated with a single character are passed through unchanged. For example, U+2639 WHITE FROWNING FACE (☹) is passed through unchanged.
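The difference between the three modes can be illustrated with a small pure-Python sketch. This is not the package's implementation: the two tables below are hypothetical, hand-written stand-ins for the much larger transtab data, and `transliterate` is an illustrative helper name. It also assumes that “one” is derived from the “short” table by keeping only single-character replacements.

```python
# Hypothetical toy tables; the real codecs use the transtab collection.
LONG = {"ä": "ae", "€": "EUR", "☹": ":-("}
SHORT = {"ä": "a", "€": "E", "☹": ":-("}


def transliterate(text, mode):
    """Transliterate text using the toy tables; mode is 'long', 'short' or 'one'."""
    table = LONG if mode == "long" else SHORT
    out = []
    for ch in text:
        repl = table.get(ch, ch)  # unknown characters pass through
        if mode == "one" and len(repl) != 1:
            repl = ch  # no single-character replacement: keep the original
        out.append(repl)
    return "".join(out)


print(transliterate("ä ☹", "long"))   # ae :-(
print(transliterate("ä ☹", "short"))  # a :-(
print(transliterate("ä ☹", "one"))    # a ☹
```

In this sketch the “one” mode never changes the length of the text, which is the property that makes it useful when character offsets must be preserved.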

Using the codecs is simple:

>>> import ctranslitcodec
>>> import codecs
>>> codecs.encode(u'fácil € ☺', 'ctranslit/long')
u'facil EUR :-)'
>>> codecs.encode(u'fácil € ☺', 'ctranslit/short')
u'facil E :-)'

[1]: https://pypi.org/project/translitcodec/
[2]: https://www.cl.cam.ac.uk/~mgk25/unicode.html#libs
[3]: https://www.cl.cam.ac.uk/~mgk25/
[4]: https://majid.info/


Source Distribution

ctranslitcodec-0.2.1.tar.gz (54.0 kB view details)

Uploaded Source

File details

Details for the file ctranslitcodec-0.2.1.tar.gz.

File metadata

  • Download URL: ctranslitcodec-0.2.1.tar.gz
  • Upload date:
  • Size: 54.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for ctranslitcodec-0.2.1.tar.gz
Algorithm Hash digest
SHA256 fc494143e20e06f1aa46903519eed967ca20efdc1301a367ebc085bec0d1c325
MD5 99a807d2a02b765059fe173e15c0f36f
BLAKE2b-256 09b26cdb296b1cc5eda89722f93c7e77633aaf3b38d854dda3677f15cd53baed
