Skip to main content

National characters transcription module.

Project description

This module translates national characters into similar sounding latin characters (transliteration). At the moment, Czech, Greek, Latvian, Polish, Turkish, Russian, Ukrainian, Kazakh and Farsi alphabets are supported (it covers 99% of needs).

Simple usage

It’s very easy to use

Python 3:

>>> from trans import trans
>>> trans('Привет, Мир!')

Python 2:

>>> import trans
>>> u'Привет, Мир!'.encode('trans')
u'Privet, Mir!'
>>> trans.trans(u'Привет, Мир!')
u'Privet, Mir!'

Work only with unicode strings

>>> 'Hello World!'.encode('trans')
Traceback (most recent call last):
    ...
TypeError: trans codec support only unicode string, <type 'str'> given.

This is readability

>>> s = u'''\
...    -- Раскудрить твою через коромысло в бога душу мать
...             триста тысяч раз едрену вошь тебе в крыло
...             и кактус в глотку! -- взревел разъяренный Никодим.
...    -- Аминь, -- робко добавил из склепа папа Пий.
...                 (c) Г. Л. Олди, "Сказки дедушки вампира".'''
>>>
>>> print s.encode('trans')
   -- Raskudrit tvoyu cherez koromyslo v boga dushu mat
            trista tysyach raz edrenu vosh tebe v krylo
            i kaktus v glotku! -- vzrevel razyarennyy Nikodim.
   -- Amin, -- robko dobavil iz sklepa papa Piy.
                (c) G. L. Oldi, "Skazki dedushki vampira".

Table “slug

Use the table “slug”, leaving only the Latin characters, digits and underscores:

>>> print u'1 2 3 4 5 \n6 7 8 9 0'.encode('trans')
1 2 3 4 5
6 7 8 9 0
>>> print u'1 2 3 4 5 \n6 7 8 9 0'.encode('trans/slug')
1_2_3_4_5__6_7_8_9_0
>>> s.encode('trans/slug')[-42:-1]
u'_c__G__L__Oldi___Skazki_dedushki_vampira_'

Table “id

Table id is deprecated and renamed to slug. Old name also available, but not recommended.

Define user tables

Simple variant

>>> u'1 2 3 4 5 6 7 8 9 0'.encode('trans/my')
Traceback (most recent call last):
    ...
ValueError: Table "my" not found in tables!
>>> trans.tables['my'] = {u'1': u'A', u'2': u'B'};
>>> u'1 2 3 4 5 6 7 8 9 0'.encode('trans/my')
u'A_B________________'
>>>

A little harder

Table can consist of two parts - the map of diphthongs and the map of characters. Diphthongs are processed first by simple replacement in the substring. Then each character of the received string is replaced according to the map of characters. If character is absent in the map of characters, key None are checked. If key None is not present, the default character u’_’ is used.

>>> diphthongs = {u'11': u'AA', u'22': u'BB'}
>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-',
...               u'A': u'A', u'B': u'B'}  # See below...
>>> trans.tables['test'] = (diphthongs, characters)
>>> u'11abc22cbaCC'.encode('trans/test')
u'AAzyxBBxyz--'

The characters are created by processing of diphthongs also processed by the map of the symbols:

>>> diphthongs = {u'11': u'AA', u'22': u'BB'}
>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-'}
>>> trans.tables['test'] = (diphthongs, characters)
>>> u'11abc22cbaCC'.encode('trans/test')
u'--zyx--xyz--'

Without the diphthongs

These two tables are equivalent:

>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-'}
>>> trans.tables['t1'] = characters
>>> trans.tables['t2'] = ({}, characters)
>>> u'11abc22cbaCC'.encode('trans/t1') == u'11abc22cbaCC'.encode('trans/t2')
True

ChangeLog

2.1 2016-09-19

  • Add Farsi alphabet (thx rodgar-nvkz)

  • Use pytest

  • Some code style refactoring

2.0 2013-04-01

  • Python 3 support

  • class Trans for create different tables spaces

1.5 2012-09-12

  • Add support of kazakh alphabet.

1.4 2011-11-29

  • Change license to BSD.

1.3 2010-05-18

  • Table “id” renamed to “slug”. Old name also available.

  • Some speed optimizations (thx to AndyLegkiy <andy.legkiy at gmail.com>).

1.2 2010-01-10

  • First public release.

  • Translate documentation to English.

Finally

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

trans_rebuild-2.1.0b0.tar.gz (9.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

trans_rebuild-2.1.0b0-py3-none-any.whl (9.0 kB view details)

Uploaded Python 3

File details

Details for the file trans_rebuild-2.1.0b0.tar.gz.

File metadata

  • Download URL: trans_rebuild-2.1.0b0.tar.gz
  • Upload date:
  • Size: 9.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for trans_rebuild-2.1.0b0.tar.gz
Algorithm Hash digest
SHA256 a41efcf1cdf46d5125a84156fb13945f69a488bfe88dc0b53fd9f94d55eabc93
MD5 d3f939da97afd546d6adc04e519954a2
BLAKE2b-256 35f032331775010ff2f843c136946df7ea810996c8bafc631cf56a226a61deb4

See more details on using hashes here.

File details

Details for the file trans_rebuild-2.1.0b0-py3-none-any.whl.

File metadata

File hashes

Hashes for trans_rebuild-2.1.0b0-py3-none-any.whl
Algorithm Hash digest
SHA256 2be50be89f803f473f8f2c4617d566b5eb240f9ace5acf41208eb57dfe620127
MD5 02659552458f229faa2bf7058972b2bb
BLAKE2b-256 664b4b48a1a3c9598c942482e24cd610e1452c8f55a277896574186931ade5a7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page