National characters transcription module.
Project description
This module translates national characters into similar sounding latin characters (transliteration). At the moment, Czech, Greek, Latvian, Polish, Turkish, Russian, Ukrainian alphabets are supported (it covers 99% of needs).
Simple usage
It’s very easy to use
>>> # coding: utf-8 >>> import trans >>> u'Hello World!'.encode('trans') u'Hello World!' >>> u'Привет, Мир!'.encode('trans') u'Privet, Mir!'
Work only with unicode strings
>>> 'Hello World!'.encode('trans') Traceback (most recent call last): ... TypeError: trans codec support only unicode string, <type 'str'> given.
This is readability
>>> s = u'''\ ... -- Раскудрить твою через коромысло в бога душу мать ... триста тысяч раз едрену вошь тебе в крыло ... и кактус в глотку! -- взревел разъяренный Никодим. ... -- Аминь, -- робко добавил из склепа папа Пий. ... (c) Г. Л. Олди, "Сказки дедушки вампира".''' >>> >>> print s.encode('trans') -- Raskudrit tvoyu cherez koromyslo v boga dushu mat trista tysyach raz edrenu vosh tebe v krylo i kaktus v glotku! -- vzrevel razyarennyy Nikodim. -- Amin, -- robko dobavil iz sklepa papa Piy. (c) G. L. Oldi, "Skazki dedushki vampira".
Table “id”
Use the table “id”, leaving only the Latin characters, digits and underscores:
>>> print u'1 2 3 4 5 \n6 7 8 9 0'.encode('trans') 1 2 3 4 5 6 7 8 9 0 >>> print u'1 2 3 4 5 \n6 7 8 9 0'.encode('trans/id') 1_2_3_4_5__6_7_8_9_0 >>> s.encode('trans/id')[-42:-1] u'_c__G__L__Oldi___Skazki_dedushki_vampira_'
Define user tables
Simple variant
>>> u'1 2 3 4 5 6 7 8 9 0'.encode('trans/my') Traceback (most recent call last): ... ValueError: Table "my" not found in tables! >>> trans.tables['my'] = {u'1': u'A', u'2': u'B'}; >>> u'1 2 3 4 5 6 7 8 9 0'.encode('trans/my') u'A_B________________' >>>
A little harder
Table can consist of two parts - the map of diphthongs and the map of characters. Diphthongs are processed first by simple replacement in the substring. Then each character of the received string is replaced according to the map of characters. If character is absent in the map of characters, key None are checked. If key None is not present, the default character u’_’ is used.
>>> diphthongs = {u'11': u'AA', u'22': u'BB'} >>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-', ... u'A': u'A', u'B': u'B'} # See below... >>> trans.tables['test'] = (diphthongs, characters) >>> u'11abc22cbaCC'.encode('trans/test') u'AAzyxBBxyz--'
The characters are created by processing of diphthongs also processed by the map of the symbols:
>>> diphthongs = {u'11': u'AA', u'22': u'BB'} >>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-'} >>> trans.tables['test'] = (diphthongs, characters) >>> u'11abc22cbaCC'.encode('trans/test') u'--zyx--xyz--'
Without the diphthongs
These two tables are equivalent:
>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-'} >>> trans.tables['t1'] = characters >>> trans.tables['t2'] = ({}, characters) >>> u'11abc22cbaCC'.encode('trans/t1') == u'11abc22cbaCC'.encode('trans/t2') True
Finally
- Special thanks to Yuri Yurevich aka j2a for the kick in the right direction.
I ask forgiveness for my bad English. I promise to be corrected.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file trans-1.2.tar.bz2
.
File metadata
- Download URL: trans-1.2.tar.bz2
- Upload date:
- Size: 9.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | e6e17d4f217fbd626b9ab4e1d938755a9fabf4402fe3ee6a7cc63ec9ff3137b4 |
|
MD5 | 9541c56b3ffe27adff3c93072caf9d81 |
|
BLAKE2b-256 | 27138189dc06fa2e8d6f8a535ad73c9b3baa4a6c46dc4a4e64ea0d98be62b9ad |