Skip to main content

National characters transcription module.

Project description

This module translates national characters into similar sounding latin characters (transliteration). At the moment, Czech, Greek, Latvian, Polish, Turkish, Russian, Ukrainian alphabets are supported (it covers 99% of needs).

Simple usage

It’s very easy to use

>>> # coding: utf-8
>>> import trans
>>> u'Hello World!'.encode('trans')
u'Hello World!'
>>> u'Привет, Мир!'.encode('trans')
u'Privet, Mir!'

Work only with unicode strings

>>> 'Hello World!'.encode('trans')
Traceback (most recent call last):
    ...
TypeError: trans codec support only unicode string, <type 'str'> given.

This is readability

>>> s = u'''\
...    -- Раскудрить твою через коромысло в бога душу мать
...             триста тысяч раз едрену вошь тебе в крыло
...             и кактус в глотку! -- взревел разъяренный Никодим.
...    -- Аминь, -- робко добавил из склепа папа Пий.
...                 (c) Г. Л. Олди, "Сказки дедушки вампира".'''
>>>
>>> print s.encode('trans')
   -- Raskudrit tvoyu cherez koromyslo v boga dushu mat
            trista tysyach raz edrenu vosh tebe v krylo
            i kaktus v glotku! -- vzrevel razyarennyy Nikodim.
   -- Amin, -- robko dobavil iz sklepa papa Piy.
                (c) G. L. Oldi, "Skazki dedushki vampira".

Table “slug

Use the table “slug”, leaving only the Latin characters, digits and underscores:

>>> print u'1 2 3 4 5 \n6 7 8 9 0'.encode('trans')
1 2 3 4 5
6 7 8 9 0
>>> print u'1 2 3 4 5 \n6 7 8 9 0'.encode('trans/slug')
1_2_3_4_5__6_7_8_9_0
>>> s.encode('trans/slug')[-42:-1]
u'_c__G__L__Oldi___Skazki_dedushki_vampira_'

Table “id

Table id is deprecated and renamed to slug. Old name also available, but not recommended.

Define user tables

Simple variant

>>> u'1 2 3 4 5 6 7 8 9 0'.encode('trans/my')
Traceback (most recent call last):
    ...
ValueError: Table "my" not found in tables!
>>> trans.tables['my'] = {u'1': u'A', u'2': u'B'};
>>> u'1 2 3 4 5 6 7 8 9 0'.encode('trans/my')
u'A_B________________'
>>>

A little harder

Table can consist of two parts - the map of diphthongs and the map of characters. Diphthongs are processed first by simple replacement in the substring. Then each character of the received string is replaced according to the map of characters. If character is absent in the map of characters, key None are checked. If key None is not present, the default character u’_’ is used.

>>> diphthongs = {u'11': u'AA', u'22': u'BB'}
>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-',
...               u'A': u'A', u'B': u'B'}  # See below...
>>> trans.tables['test'] = (diphthongs, characters)
>>> u'11abc22cbaCC'.encode('trans/test')
u'AAzyxBBxyz--'

The characters are created by processing of diphthongs also processed by the map of the symbols:

>>> diphthongs = {u'11': u'AA', u'22': u'BB'}
>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-'}
>>> trans.tables['test'] = (diphthongs, characters)
>>> u'11abc22cbaCC'.encode('trans/test')
u'--zyx--xyz--'

Without the diphthongs

These two tables are equivalent:

>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-'}
>>> trans.tables['t1'] = characters
>>> trans.tables['t2'] = ({}, characters)
>>> u'11abc22cbaCC'.encode('trans/t1') == u'11abc22cbaCC'.encode('trans/t2')
True

ChangeLog

1.3 - 2010-05-18

  • Table “id” renamed to “slug”. Old name also available.

  • Some speed optimizations (thx to AndyLegkiy <andy.legkiy at gmail.com>).

1.2 - 2010-01-10

  • First public release.

  • Translate documentation to English.

Finally

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

trans-1.3.tar.bz2 (9.3 kB view details)

Uploaded Source

File details

Details for the file trans-1.3.tar.bz2.

File metadata

  • Download URL: trans-1.3.tar.bz2
  • Upload date:
  • Size: 9.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for trans-1.3.tar.bz2
Algorithm Hash digest
SHA256 b8eb3479248d4349713b592292b4be5ed4f15ef27a2dabbb3d7b32687fc3b69a
MD5 e9216b17487222de89572d7664e5408e
BLAKE2b-256 68eaf2683bec9657091a4bf368e10e8c680b9530deb3fe9e0dfc56cedf81ef63

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page