Unicode to 8-bit charset transliteration codec

Project description

This package contains codecs for transliterating ISO 10646 texts into best-effort representations using smaller coded character sets (ASCII, ISO 8859, etc.). The translation tables used by the codecs are from the transtab collection by Markus Kuhn.

Three types of transliterating codecs are provided:

  • “long”, using as many characters as needed to make a natural replacement. For example, \u00e4 LATIN SMALL LETTER A WITH DIAERESIS (ä) is replaced with ae.

  • “short”, using the minimum number of characters to make a replacement. For example, \u00e4 LATIN SMALL LETTER A WITH DIAERESIS (ä) is replaced with a.

  • “one”, performing only single-character replacements. Characters that cannot be transliterated with a single character are passed through unchanged. For example, \u2639 WHITE FROWNING FACE (☹) is passed through unchanged.
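The difference between the three types can be sketched with plain dictionaries and str.translate(). This is a minimal Python 3 illustration, not the package's implementation: the tables below are tiny excerpts taken from the examples in this description, the names LONG, SHORT, ONE and transliterate are hypothetical, and deriving the “one” table by filtering the “short” table is an assumption.

```python
# Illustrative excerpts only; the real transtab-derived tables are much larger.
LONG = {ord("ä"): "ae", ord("á"): "a", ord("€"): "EUR", ord("☺"): ":-)"}
SHORT = {ord("ä"): "a", ord("á"): "a", ord("€"): "E", ord("☺"): ":-)"}

# Assumption: "one" keeps only the replacements that are exactly one
# character long, so everything else falls through unchanged.
ONE = {cp: repl for cp, repl in SHORT.items() if len(repl) == 1}


def transliterate(text, table):
    """Replace each character found in `table`; leave all others as they are."""
    return text.translate(table)


sample = "ä € ☺"
print(transliterate(sample, LONG))   # as many characters as needed
print(transliterate(sample, SHORT))  # as few characters as possible
print(transliterate(sample, ONE))    # single-character replacements only
```

With these excerpts, ☺ survives the “one” pass because its only replacement, :-), is three characters long.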

Using the codecs is simple:

>>> import translitcodec
>>> u'fácil € ☺'.encode('translit/long')
u'facil EUR :-)'
>>> u'fácil € ☺'.encode('translit/short')
u'facil E :-)'

The codecs return Unicode by default. To receive a bytestring back, either chain the output of encode() to another codec, or append the name of the desired byte encoding to the codec name:

>>> u'fácil € ☺'.encode('translit/one').encode('ascii', 'replace')
'facil E ?'
>>> u'fácil € ☺'.encode('translit/one/ascii', 'replace')
'facil E ?'
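The two-step idea above (transliterate first, then encode to bytes with an error handler) can be reproduced with the standard library alone. The following Python 3 sketch assumes a hypothetical two-entry table named ONE_EXCERPT and a hypothetical helper to_ascii_bytes; it does not use the package itself.

```python
# Illustrative excerpt of a single-character table; ☺ has no entry here,
# so it passes through the first step unchanged.
ONE_EXCERPT = {ord("á"): "a", ord("€"): "E"}


def to_ascii_bytes(text):
    """Transliterate what the table covers, then let 'replace' handle the rest."""
    transliterated = text.translate(ONE_EXCERPT)
    # Anything still outside ASCII becomes "?" under the 'replace' handler.
    return transliterated.encode("ascii", "replace")


print(to_ascii_bytes("fácil € ☺"))
```

The error handler only ever sees the characters the table could not cover, which is why ☺ ends up as a question mark while á and € do not.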

The package also supplies a ‘transliterate’ codec, an alias for ‘translit/long’.
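For readers curious how a codec name and an alias can resolve to the same encoder, here is a minimal Python 3 sketch built on the standard codecs.register() hook. The codec names demotranslit and demotransliterate, the two-entry table, and the function names are all hypothetical; the package's actual registration code may differ. The sketch calls codecs.encode() because, in Python 3, str.encode() rejects encoders that return str instead of bytes.

```python
import codecs

# Hypothetical two-entry table for the demo; not the real transtab data.
DEMO_TABLE = {ord("ä"): "ae", ord("€"): "EUR"}


def demo_encode(text, errors="strict"):
    # Codec encoders return a pair: (output, number of input items consumed).
    return text.translate(DEMO_TABLE), len(text)


def demo_search(name):
    # The registry passes the requested name lower-cased.
    # Answering for two names makes the second one behave as an alias.
    if name in ("demotranslit", "demotransliterate"):
        return codecs.CodecInfo(encode=demo_encode, decode=None, name="demotranslit")
    return None  # not ours; let other search functions try


codecs.register(demo_search)

print(codecs.encode("ä €", "demotranslit"))
print(codecs.encode("ä €", "demotransliterate"))
```

Both calls go through the same encoder, so the alias needs no table of its own.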

translitcodec Changes

0.3

Released on February 14, 2011

  • Fixes to the transtab table rebuilding tool.

  • Added translitcodec.__version__

0.2

Released on January 27, 2011

  • Resolves issue of “TypeError: character mapping must return integer, None or unicode” when a blank value (e.g. \N{ZERO WIDTH SPACE}, \u200B) was encoded. Unicode blanks are now returned.

  • Characters in the ASCII range are no longer included in the translation tables.

0.1

Released on December 28, 2008

  • Initial packaged release.

Download files

Source Distribution

translitcodec-0.3.tar.gz (48.4 kB)

File hashes

Hashes for translitcodec-0.3.tar.gz
Algorithm    Hash digest
SHA256       7a088bc99a0654d5061416e61650c194472ea8d7e9f7395f2772dfa5a7f28499
MD5          2227e9f79fd3c570d49f8c5cc5db5812
BLAKE2b-256  9c9aff88e7aad6062b9988911b383143f332687fef45f033fea35bff9ced5360
