Bi-directional Cyrillic transliteration. Transliterate Cyrillic script to Latin script and vice versa. Supports transliteration for Belarusian, Bulgarian, Greek, Montenegrin, Macedonian, Mongolian, Russian, Serbian, Tajik, and Ukrainian.

What is CyrTranslit?

A Python package for bi-directional transliteration of Cyrillic script to Latin script and vice versa.

By default, CyrTranslit transliterates Serbian. A language flag can be set to transliterate to and from Belarusian, Bulgarian, Greek, Montenegrin, Macedonian, Mongolian, Russian, Serbian, Tajik, and Ukrainian.

Note: Greek is also supported. While Greek uses its own alphabet and is not Cyrillic, it has been included due to user demand and shared transliteration needs.

What is transliteration?

Transliteration is the conversion of a text from one script to another. For instance, a Latin alphabet transliteration of the Serbian phrase "Мој ховеркрафт је пун јегуља" is "Moj hoverkraft je pun jegulja".
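
At its simplest, transliteration can be modelled as a per-character table lookup. The sketch below is an illustration only, not CyrTranslit's implementation: the SR_CYR_TO_LAT_SKETCH table and the to_latin_sketch function are hypothetical names, and the table covers only the letters needed for the phrase above.

```python
# Illustrative only: a tiny, hypothetical subset of a Serbian
# Cyrillic-to-Latin table (just enough for the sample phrase).
SR_CYR_TO_LAT_SKETCH = {
    "М": "M", "а": "a", "в": "v", "г": "g", "е": "e", "к": "k",
    "н": "n", "о": "o", "п": "p", "р": "r", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ј": "j", "љ": "lj",
}


def to_latin_sketch(text: str) -> str:
    """Replace each mapped character; pass everything else through unchanged."""
    return "".join(SR_CYR_TO_LAT_SKETCH.get(ch, ch) for ch in text)


print(to_latin_sketch("Мој ховеркрафт је пун јегуља"))
```

Note that a single Cyrillic letter (љ) can map to two Latin letters (lj), which is why the reverse direction needs more care than a plain lookup.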

Citation

A citation would be much appreciated if you use CyrTranslit in a research publication:

Georges Labrèche. (2025). CyrTranslit (1.2.0). Zenodo. https://doi.org/10.5281/zenodo.17663256

BibTeX entry:

@software{georges_labreche_nov2025,
  author       = {Georges Labrèche},
  title        = {CyrTranslit},
  month        = nov,
  year         = 2025,
  note         = {{A Python package for bi-directional 
                   transliteration of Cyrillic script to Latin script
                   and vice versa. Supports transliteration for Belarusian, 
                   Bulgarian, Greek, Montenegrin, Macedonian, Mongolian,
                   Russian, Serbian, Tajik, and Ukrainian.}},
  publisher    = {Zenodo},
  version      = {1.2.0},
  doi          = {10.5281/zenodo.17663256},
  url          = {https://doi.org/10.5281/zenodo.17663256}
}

Advancing research

CyrTranslit is actively used as a reliable tool to advance research! Here's an incomplete list of publications for research projects that have relied on CyrTranslit:

Text Normalization, Unicode Perturbations & Robustness

Low-Resource NLP & Machine Translation

Serbian Language NLP (Topic Modeling, Sentiment, Lexicons, QA, Abuse Detection)

NLP Applications for Society, Government, and Political Analysis

Engineering, Software Systems, and Backend Development

Proceedings, Collections, and Meta-Documents

Addresses, Geocoding, and NLP

How do I install this?

CyrTranslit is hosted in the Python Package Index (PyPI) so it can be installed using pip:

python3 -m pip install cyrtranslit         # latest version
python3 -m pip install cyrtranslit==1.2.0  # specific version
python3 -m pip install "cyrtranslit>=1.2.0"  # minimum version (quoted so the shell does not treat > as a redirect)

What languages are supported?

CyrTranslit currently supports bi-directional transliteration of Belarusian, Bulgarian, Greek, Montenegrin, Macedonian, Mongolian, Russian, Serbian, Tajik, and Ukrainian.

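Language codes are short lowercase codes. Some are ISO 639-1 language codes (for example el and sr) and others are ISO 3166-1 country codes (for example by and ua). For Serbian, both sr (ISO 639-1 language code) and rs (ISO 3166-1 country code) are accepted: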

>>> import cyrtranslit
>>> cyrtranslit.supported()
['bg', 'by', 'el', 'me', 'mk', 'mn', 'rs', 'ru', 'sr', 'tj', 'ua']
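
Several of the codes listed above (by, me, rs, tj, ua) are country codes, so callers that hold ISO 639 language codes may want to map them first. The helper below is a hypothetical sketch: ISO_639_ALIASES and normalize_lang_code are not part of CyrTranslit, and the SUPPORTED list is copied from the output above rather than queried from the library.

```python
# Codes copied from the cyrtranslit.supported() output shown above.
SUPPORTED = ["bg", "by", "el", "me", "mk", "mn", "rs", "ru", "sr", "tj", "ua"]

# Hypothetical alias table: ISO 639 language codes -> codes listed above.
ISO_639_ALIASES = {
    "be": "by",   # Belarusian
    "uk": "ua",   # Ukrainian
    "tg": "tj",   # Tajik
    "cnr": "me",  # Montenegrin (ISO 639-2/3; it has no ISO 639-1 code)
}


def normalize_lang_code(code: str) -> str:
    """Return a code from SUPPORTED, or raise ValueError if none matches."""
    key = code.strip().lower()
    key = ISO_639_ALIASES.get(key, key)
    if key not in SUPPORTED:
        raise ValueError(f"unsupported language code: {code!r}")
    return key


print(normalize_lang_code("UK"))  # ua
```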

How do I use this?

CyrTranslit can be used both programmatically and via the command-line interface.

Programmatically

Belarusian

>>> import cyrtranslit
>>> cyrtranslit.to_latin("Прывітанне, свет!", "by")
"Pryvitanne, svet!"
>>> cyrtranslit.to_cyrillic("Pryvitanne, svet!", "by")
"Прывітанне, свет!"

Bulgarian

>>> import cyrtranslit
>>> cyrtranslit.to_latin("Съединението прави силата!", "bg")
"Săedinenieto pravi silata!"
>>> cyrtranslit.to_cyrillic("Săedinenieto pravi silata!", "bg")
"Съединението прави силата!"

Greek

>>> import cyrtranslit
>>> cyrtranslit.to_latin("Το χόβερκραφτ μου είναι γεμάτο χέλια", "el")
"To choverkraft moy einai gemato chelia"
>>> cyrtranslit.to_cyrillic("To choverkraft moy einai gemato chelia", "el")
"Το χόβερκραφτ μου είναι γεμάτο χέλια"

Montenegrin

>>> import cyrtranslit
>>> cyrtranslit.to_latin("Република", "me")
"Republika"
>>> cyrtranslit.to_cyrillic("Republika", "me")
"Република"

Macedonian

>>> import cyrtranslit
>>> cyrtranslit.to_latin("Моето летачко возило е полно со јагули", "mk")
"Moeto letačko vozilo e polno so jaguli"
>>> cyrtranslit.to_cyrillic("Moeto letačko vozilo e polno so jaguli", "mk")
"Моето летачко возило е полно со јагули"

Mongolian

>>> import cyrtranslit
>>> cyrtranslit.to_latin("Амрагаа Сүнжидмаагаа гэсээр ирлээ дээ хө-хө-хө", "mn")
"Amragaa Sünjidmaagaa geseer irlee dee khö-khö-khö"
>>> cyrtranslit.to_cyrillic("Amragaa Sünjidmaagaa geseer irlee dee khö-khö-khö", "mn")
"Амрагаа Сүнжидмаагаа гэсээр ирлээ дээ хө-хө-хө"

Russian

>>> import cyrtranslit
>>> cyrtranslit.to_latin("Моё судно на воздушной подушке полно угрей", "ru")
"Moyo sudno na vozdushnoj podushke polno ugrej"
>>> cyrtranslit.to_cyrillic("Moyo sudno na vozdushnoj podushke polno ugrej", "ru")
"Моё судно на воздушной подушке полно угрей"

Serbian

>>> import cyrtranslit
>>> cyrtranslit.to_latin("Мој ховеркрафт је пун јегуља")
"Moj hoverkraft je pun jegulja"
>>> cyrtranslit.to_cyrillic("Moj hoverkraft je pun jegulja")
"Мој ховеркрафт је пун јегуља"

Tajik

>>> import cyrtranslit
>>> cyrtranslit.to_latin("Ман мактуб навишта истодам", "tj")
"Man maktub navišta istodam"
>>> cyrtranslit.to_cyrillic("Man maktub navišta istodam", "tj")
"Ман мактуб навишта истодам"

Ukrainian

>>> import cyrtranslit
>>> cyrtranslit.to_latin("Під лежачий камінь вода не тече", "ua")
"Pid ležačyj kamin' voda ne teče"
>>> cyrtranslit.to_cyrillic("Pid ležačyj kamin' voda ne teče", "ua")
"Під лежачий камінь вода не тече"

Accented Characters (Macedonian & Bulgarian)

CyrTranslit supports Cyrillic characters with grave accents used in Macedonian and Bulgarian for homograph disambiguation and stress marking. By default, accents are stripped during transliteration for cleaner output. Use the preserve_accents parameter to preserve them.

Supported Accented Characters

Macedonian:

  • Ѐ/ѐ (U+0400/U+0450) - Cyrillic IE with grave

    • Purpose: Distinguishes homographs (e.g., нѐ "us" vs не "no", сѐ "everything" vs се "reflexive pronoun")
    • Standard: ISO 9:1968/1995, adopted by Macedonian Academy of Arts and Sciences (1970)
  • Ѝ/ѝ (U+040D/U+045D) - Cyrillic I with grave

    • Purpose: Distinguishes homographs (e.g., ѝ "her" vs и "and")
    • Standard: ISO 9:1968/1995

Bulgarian:

  • Ѝ/ѝ (U+040D/U+045D) - Cyrillic I with grave
    • Purpose: Stress marking and homograph disambiguation (e.g., ѝ "her" vs и "and")
    • Standard: ISO 9:1995
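
All four code points listed above have canonical Unicode decompositions into a base letter followed by U+0300 (combining grave accent), so the idea of accent stripping can be illustrated with the standard library alone. The sketch below shows that idea in isolation. strip_grave is a hypothetical helper, and this is not necessarily how CyrTranslit implements preserve_accents.

```python
import unicodedata

COMBINING_GRAVE = "\u0300"


def strip_grave(text: str) -> str:
    """Drop combining grave accents, leaving other diacritics intact."""
    decomposed = unicodedata.normalize("NFD", text)
    without_grave = decomposed.replace(COMBINING_GRAVE, "")
    # Recompose so letters such as й (и + breve) return to a single code point.
    return unicodedata.normalize("NFC", without_grave)


print(strip_grave("ѝ је"))    # и је
print(strip_grave("нѐ сме"))  # не сме
```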

Usage Examples

Default behavior (accents stripped):

>>> import cyrtranslit
>>> cyrtranslit.to_latin("ѝ је", "mk")
"i je"
>>> cyrtranslit.to_latin("нѐ сме", "mk")
"ne sme"
>>> cyrtranslit.to_cyrillic("i je", "mk")
"и је"

With accents preserved:

>>> import cyrtranslit
>>> cyrtranslit.to_latin("ѝ је", "mk", preserve_accents=True)
"ì je"
>>> cyrtranslit.to_latin("нѐ сме", "mk", preserve_accents=True)
"nè sme"
>>> cyrtranslit.to_cyrillic("ì je", "mk", preserve_accents=True)
"ѝ је"
>>> cyrtranslit.to_cyrillic("nè sme", "mk", preserve_accents=True)
"нѐ сме"

Command-line usage:

# Default (accents stripped)
$ echo "ѝ је" | cyrtranslit -l mk
i je

# Preserve accents
$ echo "ѝ је" | cyrtranslit -l mk --preserve-accents
ì je

Command Line Interface

Sample command line call to transliterate a Russian text file:

$ cyrtranslit -l RU -i tests/ru.txt -o tests/output.txt

Use the -c argument to accomplish the reverse, that is, to input Latin characters and output Cyrillic.

Use the -h argument for help.

You can also omit the input and output files and use standard input and output:

$ echo 'Мој ховеркрафт је пун јегуља' | cyrtranslit -l sr
Moj hoverkraft je pun jegulja
$ echo 'Moj hoverkraft je pun jegulja' | cyrtranslit -l sr -c
Мој ховеркрафт је пун јегуља

File Encodings

By default, input files are expected to be UTF-8. For files with different encodings, use the -e/--encoding parameter:

$ cyrtranslit -l BG -i file.txt -e windows-1251

If no encoding is specified and decoding with the default UTF-8 fails, CyrTranslit automatically tries the following common Cyrillic encodings: windows-1251, iso-8859-5, koi8-r, and cp866.
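
The fallback behaviour described above can be sketched with the standard library. This is an illustration under stated assumptions, not CyrTranslit's actual code: read_text_with_fallback and FALLBACK_ENCODINGS are hypothetical names, and the order of the fallback list is simply copied from the paragraph above.

```python
import os
import tempfile

# Fallback order copied from the paragraph above.
FALLBACK_ENCODINGS = ["windows-1251", "iso-8859-5", "koi8-r", "cp866"]


def read_text_with_fallback(path, encoding=None):
    """Return (text, encoding_used) for the file at path."""
    with open(path, "rb") as handle:
        raw = handle.read()
    if encoding is not None:
        # An explicit encoding is used as-is; errors propagate to the caller.
        return raw.decode(encoding), encoding
    for candidate in ["utf-8", *FALLBACK_ENCODINGS]:
        try:
            return raw.decode(candidate), candidate
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode {path!r} with any known encoding")


# Usage: write a windows-1251 file, then read it back without naming the encoding.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("Привет".encode("windows-1251"))
text, used = read_text_with_fallback(tmp.name)
os.remove(tmp.name)
print(text, used)
```

In a scheme like this, a successful decode does not prove the guess was right: single-byte code pages accept almost any byte sequence, so the first fallback usually wins. Passing the encoding explicitly remains the safer choice when it is known.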

You can also try the command-line entry point from an interactive Python session, e.g.:

>>> import sys
>>> import cyrtranslit.cyrtranslit
>>> sys.argv.extend(['-l', 'UA'])
>>> sys.argv.extend(['-i', 'tests/ua.txt'])
>>> sys.argv.extend(['-o', 'tests/output.txt'])
>>> cyrtranslit.cyrtranslit.main()
>>> exit()

How can I contribute?

Include support for other Cyrillic script alphabets. Follow these steps in order to do so:

  1. Create a new transliteration mapping file in the mapping/ directory (using the language code as the filename, e.g., xx.py) and reference it in the TRANSLIT_DICT dictionary in mapping/__init__.py. If the language uses accented characters (like Macedonian and Bulgarian), create separate accented dictionaries (e.g., XX_CYR_TO_LAT_ACCENTED_DICT) following the pattern in mk.py or bg.py.
  2. Watch out for cases where two consecutive Latin alphabet letters are meant to transliterate into a single Cyrillic script letter. These cases need to be explicitly checked for inside the to_cyrillic() function in __init__.py.
  3. Add test cases inside of tests.py.
  4. Add test CLI input files in the tests directory.
  5. Update the documentation in the README.md.
  6. List yourself as one of the contributors.
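
Step 2 matters because a naive character-by-character reverse lookup would split digraphs such as lj into two separate letters. One common way to handle this, sketched below for a Serbian-like mapping, is to try two-letter matches before single letters. The LAT_TO_CYR_SKETCH table and to_cyrillic_sketch function are hypothetical and cover only a few letters. CyrTranslit's own to_cyrillic() may handle these cases differently.

```python
# Illustrative, hypothetical subset of a Serbian Latin-to-Cyrillic table.
LAT_TO_CYR_SKETCH = {
    "lj": "љ", "nj": "њ",
    "a": "а", "e": "е", "g": "г", "j": "ј", "l": "л", "n": "н", "u": "у",
}


def to_cyrillic_sketch(text: str) -> str:
    """Greedy longest match: try a two-letter key before a one-letter key."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in LAT_TO_CYR_SKETCH:
            out.append(LAT_TO_CYR_SKETCH[pair])
            i += 2
        else:
            # Unmapped characters pass through unchanged.
            out.append(LAT_TO_CYR_SKETCH.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(to_cyrillic_sketch("jegulja"))  # јегуља
```

Greedy matching is a simplification: words in which the two Latin letters are genuinely separate sounds would need extra handling, which is one reason test cases for each new language are worth adding.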

Before tagging a release version and deploying to PyPI:

  1. Update the version and download_url properties in setup.py.
  2. Reserve a Zenodo DOI for the release and update this readme's Zenodo badge and citation instructions.

A big thank you to everyone who contributed.

