Skip to main content

Indic-Xlit: Transliteration library for Indic Languages. Conversion of text from English to 21 languages of South Asia.

Project description

AI4Bharat Indic-Transliteration

An AI-based transliteration engine for 21 major languages of the Indian subcontinent.

This package provides support for:

  1. Python Library for transliteration from Roman to Native script
  2. HTTP API server that can be hosted for interaction with web applications

About

This library is based on our research work called Indic-Xlit to build tools that can translit text between Indic languages and colloquially-typed content (in English alphabet). We support both Roman-to-Native back-transliteration (English script to Indic language conversion), as well as Native-to-Roman transliteration (Indic to English alphabet conversion).

An online demo is available here: https://xlit.ai4bharat.org

Languages Supported

ISO 639 code Language
as Assamese - অসমীয়া
bn Bangla - বাংলা
brx Boro - बड़ो
gu Gujarati - ગુજરાતી
hi Hindi - हिंदी
kn Kannada - ಕನ್ನಡ
ks Kashmiri - كٲشُر
gom Konkani Goan - कोंकणी
mai Maithili - मैथिली
ml Malayalam - മലയാളം
mni Manipuri - ꯃꯤꯇꯩꯂꯣꯟ
mr Marathi - मराठी
ne Nepali - नेपाली
or Oriya - ଓଡ଼ିଆ
pa Panjabi - ਪੰਜਾਬੀ
sa Sanskrit - संस्कृतम्
sd Sindhi - سنڌي
si Sinhala - සිංහල
ta Tamil - தமிழ்
te Telugu - తెలుగు
ur Urdu - اُردُو

Usage

Python Library

Import the wrapper for transliteration engine by:

from ai4bharat.transliteration import XlitEngine

Example 1 : Using word Transliteration

e = XlitEngine("hi", beam_width=10, rescore=True)
out = e.translit_word("namasthe", topk=5)
print(out)
# output: {'hi': ['नमस्ते', 'नमस्थे', 'नामस्थे', 'नमास्थे', 'नमस्थें']}

Arguments:

  • beam_width increases search size, resulting in improved accuracy but increases time/compute. (Default: 4)
  • topk returns only specified number of top results. (Default: 4)
  • rescore returns the reranked suggestions after using a dictionary. (Default: True)

Romanization:

  • By default, XlitEngine will load English-to-Indic model (default: src_script_type="roman")
  • To load Indic-to-English model, use src_script_type="indic"

For example: (also applicable for all other examples below)

e = XlitEngine(src_script_type="indic", beam_width=10, rescore=False)
out = e.translit_word("नमस्ते", lang_code="hi", topk=5)
print(out)
# output: ['namaste', 'namastey', 'namasthe', 'namastay', 'namste']

Example 2 : word Transliteration without rescoring

e = XlitEngine("hi", beam_width=10, rescore=False)
out = e.translit_word("namasthe", topk=5)
print(out)
# output: {'hi': ['नमस्थे', 'नामस्थे', 'नमास्थे', 'नमस्थें', 'नमस्ते']}

Example 3 : Using Sentence Transliteration

e = XlitEngine("ta", beam_width=10)
out = e.translit_sentence("vanakkam ulagam")
print(out)
# output: {'ta': 'வணக்கம் உலகம்'}

Note:

  • Only single top most prediction is returned for each word in sentence.

Example 4 : Using Multiple language Transliteration

e = XlitEngine(["ta", "ml"], beam_width=6)
# leave empty or use "all" to load all available languages
# e = XlitEngine("all)

out = e.translit_word("amma", topk=3)
print(out)
# output: {'ml': ['അമ്മ', 'എമ്മ', 'അമ'], 'ta': ['அம்மா', 'அம்ம', 'அம்மை']}

out = e.translit_sentence("vandhe maatharam")
print(out)
# output: {'ml': 'വന്ധേ മാതരം', 'ta': 'வந்தே மாதரம்'}

## Specify language name to get only specific language result
out = e.translit_word("amma", lang_code = "ml", topk=5)
print(out)
# output: ['അമ്മ', 'എമ്മ', 'അമ', 'എഎമ്മ', 'അഎമ്മ']

Example 5 : Transliteration for all available languages

e = XlitEngine(beam_width=10)
out = e.translit_sentence("namaskaar bharat")
print(out)
# sample output: {'bn': 'নমস্কার ভারত', 'gu': 'નમસ્કાર ભારત', 'hi': 'नमस्कार भारत', 'kn': 'ನಮಸ್ಕಾರ್ ಭಾರತ್', 'ml': 'നമസ്കാർ ഭാരത്', 'pa': 'ਨਮਸਕਾਰ ਭਾਰਤ', 'si': 'නමස්කාර් භාරත්', 'ta': 'நமஸ்கார் பாரத்', 'te': 'నమస్కార్ భారత్', 'ur': 'نمسکار بھارت'}

Web API Server

Running a flask server using a 3-line script:

from ai4bharat.transliteration import xlit_server
app, engine = xlit_server.get_app()
app.run(host='0.0.0.0', port=8000)

Then on browser (or) curl, use link as http://{IP-address}:{port}/tl/{lang-id}/{word_in_eng_script}

Example: http://localhost:8000/tl/ta/amma http://localhost:8000/languages


Debugging errors

If you face any of the following errors:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject ValueError: Please build (or rebuild) Cython components with python setup.py build_ext --inplace.

Run: pip install --upgrade numpy


Release Notes

This package contains applications built around the Transliteration engine. The contents of this package can also be downloaded from our GitHub repo.

All the NN models of Indic-Xlit are released under MIT License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ansutr-transliteration-1.1.3.tar.gz (30.4 kB view details)

Uploaded Source

Built Distribution

ansutr_transliteration-1.1.3-py3-none-any.whl (31.7 kB view details)

Uploaded Python 3

File details

Details for the file ansutr-transliteration-1.1.3.tar.gz.

File metadata

File hashes

Hashes for ansutr-transliteration-1.1.3.tar.gz
Algorithm Hash digest
SHA256 5109ace9d53900c8c4b241e79603522bdcfd9ed7060a05bbcc366836b187b3af
MD5 618ea0295900c0cf9a028511817cc970
BLAKE2b-256 8ba104613b94431f6e7598f99c500761a680507b8c500c504d668839dfeb4513

See more details on using hashes here.

File details

Details for the file ansutr_transliteration-1.1.3-py3-none-any.whl.

File metadata

File hashes

Hashes for ansutr_transliteration-1.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 b6e0beae2a209b07370d21fc927b213c073d4d948542ecf98acea597cd0ff8d4
MD5 de57b57d93643400fdf763d60a636138
BLAKE2b-256 df9f6b5ba399971d30b269e13cb858ecf3c99eca177d5e1c3bc5c4d35fbcacd7

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page