Unicode to 8-bit charset transliteration codec

This package contains codecs for transliterating ISO 10646 texts into
best-effort representations using smaller coded character sets (ASCII,
ISO 8859, etc.). The translation tables used by the codecs are from
the ``transtab`` collection by Markus Kuhn.

Three types of transliterating codecs are provided:

- "long", using as many characters as needed to make a natural
  replacement. For example, U+00E4 LATIN SMALL LETTER A WITH
  DIAERESIS ``ä`` will be replaced with ``ae``.

- "short", using the minimum number of characters to make a
  replacement. For example, U+00E4 LATIN SMALL LETTER A WITH
  DIAERESIS ``ä`` will be replaced with ``a``.

- "one", only performing single-character replacements. Characters
  that cannot be transliterated with a single character are passed
  through unchanged. For example, U+2639 WHITE FROWNING FACE ``☹``
  will be passed through unchanged.
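
The difference between the three variants can be sketched with a few
hand-written entries. This is only an illustration: ``LONG``, ``SHORT``,
``ONE`` and ``transliterate`` are hypothetical names, the tables are tiny
stand-ins for the ``transtab`` data, and deriving "one" by filtering the
"short" table is an assumption rather than a description of the package's
internals.

```python
# Hypothetical mini tables; the real package uses the much larger transtab data.
LONG = {"\u00e4": "ae", "\u20ac": "EUR", "\u263a": ":-)"}
SHORT = {"\u00e4": "a", "\u20ac": "E", "\u263a": ":-)"}

# Assumption: "one" keeps only replacements that are exactly one character long.
ONE = {src: dst for src, dst in SHORT.items() if len(dst) == 1}


def transliterate(text, table):
    """Replace each character found in table; pass everything else through."""
    return "".join(table.get(ch, ch) for ch in text)


sample = "\u00e4 \u20ac \u263a"  # "ä € ☺"
long_form = transliterate(sample, LONG)    # multi-character replacements allowed
short_form = transliterate(sample, SHORT)  # shortest available replacement
one_form = transliterate(sample, ONE)      # the smiley has no 1-char form, so it stays
```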

Using the codecs is simple::

    >>> import translitcodec
    >>> u'fácil € ☺'.encode('translit/long')
    u'facil EUR :-)'
    >>> u'fácil € ☺'.encode('translit/short')
    u'facil E :-)'

The codecs return Unicode by default. To receive a bytestring back,
either chain the output of encode() to another codec, or append the
name of the desired byte encoding to the codec name::

    >>> u'fácil € ☺'.encode('translit/one').encode('ascii', 'replace')
    'facil E ?'
    >>> u'fácil € ☺'.encode('translit/one/ascii', 'replace')
    'facil E ?'
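
To show how a codec name with an appended byte encoding could be resolved,
here is a minimal Python 3 sketch built on the standard ``codecs.register``
hook. It is not the package's implementation: the codec name
``demo_translit``, the underscore-separated suffix, and the two-entry table
are all made up for illustration. It calls ``codecs.encode`` because a codec
that returns text is looked up through that function in this sketch.

```python
import codecs

# Hypothetical one-character table; not the package's real data.
TABLE = {"\u00e1": "a", "\u20ac": "E"}
PREFIX = "demo_translit"  # made-up codec name for this sketch


def _transliterate(text):
    """Replace characters found in TABLE; pass everything else through."""
    return "".join(TABLE.get(ch, ch) for ch in text)


def _no_decode(data, errors="strict"):
    raise NotImplementedError("this sketch only encodes")


def _search(name):
    """Codec search function: resolve PREFIX or PREFIX_<byte encoding>."""
    if name != PREFIX and not name.startswith(PREFIX + "_"):
        return None
    target = name[len(PREFIX) + 1:]  # "" when no byte encoding was appended

    def encode(text, errors="strict"):
        out = _transliterate(text)
        if target:
            # Hand the transliterated text to the requested byte encoding.
            out = out.encode(target, errors)
        return out, len(text)

    return codecs.CodecInfo(encode=encode, decode=_no_decode, name=name)


codecs.register(_search)

text = "f\u00e1cil \u20ac \u263a"  # "fácil € ☺"
as_text = codecs.encode(text, "demo_translit")       # still a str
chained = as_text.encode("ascii", "replace")         # explicit second step
direct = codecs.encode(text, "demo_translit_ascii", "replace")  # one step
```

Both routes apply the ``'replace'`` error handler only in the final byte
encoding step, which is why the untransliterated smiley ends up as ``?``.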

The package also supplies a 'transliterate' codec, an alias for
'translit/long'.
