translitcodec

Unicode to 8-bit charset transliteration codec

Project description

This package contains codecs for transliterating ISO 10646 (Unicode) texts into best-effort representations using smaller coded character sets (ASCII, ISO 8859, etc.). The translation tables used by the codecs are from the transtab collection by Markus Kuhn.

Three types of transliterating codecs are provided:

“long”, using as many characters as needed to make a natural replacement. For example, U+00E4 LATIN SMALL LETTER A WITH DIAERESIS (ä) will be replaced with ae.

“short”, using the minimum number of characters to make a replacement. For example, U+00E4 LATIN SMALL LETTER A WITH DIAERESIS (ä) will be replaced with a.

“one”, only performing single-character replacements. Characters that cannot be transliterated with a single character are passed through unchanged. For example, U+2639 WHITE FROWNING FACE (☹) will be passed through unchanged.
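The difference between the three modes comes down to which replacement table is consulted. The sketch below illustrates the idea with Python's built-in `str.translate`. The tables are tiny hypothetical samples, not the transtab data, and deriving the "one" table by filtering out multi-character replacements is an assumption made for illustration, not necessarily how translitcodec builds its tables.

```python
# Hypothetical sample entries for illustration only; the real tables are
# generated from Markus Kuhn's transtab collection and are far larger.
LONG = {"ä": "ae", "á": "a", "€": "EUR", "☺": ":-)"}
SHORT = {"ä": "a", "á": "a", "€": "E", "☺": ":-)"}
# Assumption: a "one" table keeps only single-character replacements,
# so anything else (like ☺) is passed through unchanged.
ONE = {ch: rep for ch, rep in SHORT.items() if len(rep) == 1}

# str.maketrans turns {char: replacement} into the {ordinal: replacement}
# mapping that str.translate expects.
TABLES = {mode: str.maketrans(table)
          for mode, table in {"long": LONG, "short": SHORT, "one": ONE}.items()}


def transliterate(text: str, mode: str = "long") -> str:
    """Replace every character found in the chosen table; leave the rest as-is."""
    return text.translate(TABLES[mode])


for mode in ("long", "short", "one"):
    print(mode, repr(transliterate("fácil € ☺ ä", mode)))
```

Run as-is, it should print 'facil EUR :-) ae' for long, 'facil E :-) a' for short and 'facil E ☺ a' for one, mirroring the list above: ☺ has no single-character replacement, so the "one" mode leaves it alone.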

Using the codecs is simple:

>>> import translitcodec
>>> import codecs
>>> codecs.encode('fácil € ☺', 'translit/long')
'facil EUR :-)'
>>> codecs.encode('fácil € ☺', 'translit/short')
'facil E :-)'

The codecs return Unicode strings (str) by default. To get bytes back instead, either encode the result again with a byte encoding, or append the name of the desired byte encoding to the codec name:

>>> codecs.encode('fácil € ☺', 'translit/one').encode('ascii', 'replace')
b'facil E ?'
>>> 'fácil € ☺'.encode('translit/one/ascii', 'replace')
b'facil E ?'
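Codec names such as ‘translit/one/ascii’ can work because Python's codec registry lets a package register a search function that receives the requested name and returns a `codecs.CodecInfo`. The sketch below shows one way such a suffix could be handled: transliterate, then encode with the named byte codec. It is an illustration under assumptions, not translitcodec's actual code; the ‘demotranslit’ family name, the `ONE_SAMPLE` table and the `_search` function are all hypothetical. (Importing translitcodec presumably performs a similar `codecs.register` call, which would explain why the examples import it without using it directly.)

```python
import codecs
import re

# Hypothetical sample table holding only single-character replacements,
# in the spirit of the "one" mode.
ONE_SAMPLE = {"á": "a", "€": "E"}
_ONE_TABLE = str.maketrans(ONE_SAMPLE)


def _search(name):
    """Resolve 'demotranslit/one' and 'demotranslit/one/<byte codec>'."""
    # Accept '_' as well as '/' between the first parts, in case the
    # interpreter's name normalization rewrites punctuation.
    parts = re.split(r"[/_]", name, maxsplit=2)
    if parts[:2] != ["demotranslit", "one"]:
        return None  # not ours: let other search functions try
    byte_codec = parts[2] if len(parts) == 3 and parts[2] else None

    def encode(text, errors="strict"):
        result = text.translate(_ONE_TABLE)
        if byte_codec is not None:
            # Chain to a real byte encoding; `errors` applies to this step only.
            result = result.encode(byte_codec, errors)
        return result, len(text)

    def decode(data, errors="strict"):
        # Transliteration is lossy, so this sketch refuses to decode.
        raise UnicodeError(f"{name} is encode-only")

    return codecs.CodecInfo(encode, decode, name=name)


codecs.register(_search)

print(repr(codecs.encode("fácil € ☺", "demotranslit/one")))
print(repr("fácil € ☺".encode("demotranslit/one/ascii", "replace")))
```

With the search function registered, the first call should return the str 'facil E ☺' and the second b'facil E ?'. Because the suffixed variant returns bytes, it can also be used with `str.encode()`, which rejects codecs that return str.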

The package also supplies a ‘transliterate’ codec, an alias for ‘translit/long’.

Another way to use the library is through its error handlers. The following error handlers are available:

‘strict/translit/long’, ‘strict/translit/short’, ‘strict/translit/one’ - similar to ‘strict’

‘ignore/translit/long’, ‘ignore/translit/short’, ‘ignore/translit/one’ - similar to ‘ignore’

‘replace/translit/long’, ‘replace/translit/short’, ‘replace/translit/one’ - similar to ‘replace’

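These error handlers work like Python’s built-in handlers of the same name, except that transliteration is attempted first; only characters that cannot be transliterated are handled the way the built-in handler would handle them.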

>>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'replace/translit/long').decode('ISO-8859-2')
'Zażółć gęślą jaźń EUR :-)?!@#'
>>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'replace/translit/short').decode('ISO-8859-2')
'Zażółć gęślą jaźń E :-)?!@#'
>>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'replace/translit/one').decode('ISO-8859-2')
'Zażółć gęślą jaźń E ??!@#'
>>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'ignore/translit/long').decode('ISO-8859-2')
'Zażółć gęślą jaźń EUR :-)!@#'
>>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'ignore/translit/short').decode('ISO-8859-2')
'Zażółć gęślą jaźń E :-)!@#'
>>> codecs.encode('Zażółć gęślą jaźń € ☺另!@#', 'ISO-8859-2', 'ignore/translit/one').decode('ISO-8859-2')
'Zażółć gęślą jaźń E !@#'
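Error handlers like these plug into Python's standard `codecs.register_error` mechanism. Below is a minimal sketch of a "transliterate first, then fall back" handler, assuming a tiny hypothetical `LONG_SAMPLE` table and hypothetical handler names (`demo_replace_translit_long`, `demo_ignore_translit_long`) chosen so as not to collide with the real ones. It is not translitcodec's implementation.

```python
import codecs

# Hypothetical sample entries; a real table would cover far more characters.
# Assumption: every replacement is plain ASCII, which ISO 8859 encodings
# (and any other ASCII-compatible target) can always represent.
LONG_SAMPLE = {"€": "EUR", "☺": ":-)"}


def make_translit_handler(table, fallback):
    """Build an encode-error handler that transliterates first.

    Characters missing from `table` get `fallback`: "?" mimics the built-in
    'replace' handler, "" mimics 'ignore'.
    """
    def handler(exc):
        if not isinstance(exc, UnicodeEncodeError):
            raise exc  # this sketch only handles encoding errors
        chunk = exc.object[exc.start:exc.end]
        replacement = "".join(table.get(ch, fallback) for ch in chunk)
        return replacement, exc.end  # resume encoding after the bad run
    return handler


codecs.register_error("demo_replace_translit_long",
                      make_translit_handler(LONG_SAMPLE, "?"))
codecs.register_error("demo_ignore_translit_long",
                      make_translit_handler(LONG_SAMPLE, ""))

text = "Zażółć gęślą jaźń € ☺另!@#"
for handler_name in ("demo_replace_translit_long", "demo_ignore_translit_long"):
    print(text.encode("iso-8859-2", handler_name).decode("iso-8859-2"))
```

The two lines printed should match the corresponding ‘replace/translit/long’ and ‘ignore/translit/long’ examples above, since € and ☺ are in the sample table and 另 is not. A production handler would also need to check that each replacement is representable in the target encoding; the sketch sidesteps that by assuming ASCII-only replacements.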

translitcodec Changes

0.7.0

Released on May 8, 2021

Added support for error handlers
Fixed conversion of the German eszett (ß)

0.6.0

Released on December 13, 2020

Add support for Python 3.9

0.5.2

Released on January 19, 2020

Install package with setuptools

0.5.1

Released on January 19, 2020

Add python_requires to prevent installation under Python 2

0.5

Released on January 18, 2020

Complete coverage of the Vietnamese alphabet
Removed Python 2 support

0.4

Released on May 11, 2015

Added Python 3 compatibility

0.3

Released on February 14, 2011

Fixes to the transtab table rebuilding tool.
Added translitcodec.__version__

0.2

Released on January 27, 2011

Resolved the “TypeError: character mapping must return integer, None or unicode” raised when a blank value (e.g. \N{ZERO WIDTH SPACE}, U+200B) was encoded. Unicode blanks are now returned.
Characters in the ASCII range are no longer included in the translation tables.

0.1

Released on December 28, 2008

Initial packaged release.

Project details

Release history

0.7.0 (this version): May 8, 2021
0.6.0: Dec 13, 2020
0.5.2: Jan 19, 2020
0.4.0: May 11, 2015
0.3: Feb 13, 2012
0.2: Jan 28, 2011
0.1: Dec 28, 2008

Download files

Source Distribution

translitcodec-0.7.0.tar.gz (52.4 kB), uploaded May 8, 2021

File details

Details for the file translitcodec-0.7.0.tar.gz.

File metadata

Download URL: translitcodec-0.7.0.tar.gz
Upload date: May 8, 2021
Size: 52.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.2

File hashes

Hashes for translitcodec-0.7.0.tar.gz:

Algorithm      Hash digest
SHA256         `3be7975c630ec0f1dd5b3712160c991a9776132985aed2588cba083ba00fa3c8`
MD5            `a9699192bcc25bc5248e375703e28309`
BLAKE2b-256    `f128c0dbcc80648da7639b1dd3df95e8cc84c40407093d84a28234950dc740c4`
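To check a downloaded archive against the SHA256 digest listed in the file hashes, the standard library's `hashlib` is enough. What follows is a minimal sketch; the function names are illustrative, and the usage below assumes the archive was saved under its original file name.

```python
import hashlib

# Digest copied from the SHA256 row of the file hashes for translitcodec-0.7.0.tar.gz.
EXPECTED_SHA256 = "3be7975c630ec0f1dd5b3712160c991a9776132985aed2588cba083ba00fa3c8"


def sha256_of(fileobj, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of a binary file object, read in chunks."""
    digest = hashlib.sha256()
    for block in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(block)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """True if the file at `path` matches the expected SHA-256 digest."""
    with open(path, "rb") as fh:
        return sha256_of(fh) == expected.lower()
```

`verify_download("translitcodec-0.7.0.tar.gz")` should return True for an intact download. pip can also enforce the digest itself in hash-checking mode, via a requirements-file line such as `translitcodec==0.7.0 --hash=sha256:3be7975c630ec0f1dd5b3712160c991a9776132985aed2588cba083ba00fa3c8`.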
