Skip to main content

Indic-Xlit: Transliteration library for Indic Languages. Conversion of text from English to 21 languages of South Asia.

Project description

AI4Bharat Transliteration Application

An AI-based transliteration engine for 21 major languages of the Indian subcontinent.

This package provides support for:

  1. Python Library for transliteration from Roman to Native script
  2. HTTP API server that can be hosted for interaction with web applications

About

This library is based on our research work called Indic-Xlit to build tools that can translit text to Indic languages from colloquially-typed content (in English alphabet), precisely called as Roman-to-Native back-transliteration. Note that currently we do not support Indic to English conversion (Native-to-Roman transliteration).

  • Example Input: namaste bhai
  • Example Output: नमस्ते भाई

An online demo is available here: https://xlit.ai4bharat.org

Languages Supported

ISO 639 code Language
as Assamese - অসমীয়া
bn Bengali - বাংলা
brx Boro - बड़ो
gu Gujarati - ગુજરાતી
hi Hindi - हिंदी
kn Kannada - ಕನ್ನಡ
ks Kashmiri - كٲشُر
gom Konkani Goan - कोंकणी
mai Maithili - मैथिली
ml Malayalam - മലയാളം
mni Manipuri - ꯃꯤꯇꯩꯂꯣꯟ
mr Marathi - मराठी
ne Nepali - नेपाली
or Oriya - ଓଡ଼ିଆ
pa Panjabi - ਪੰਜਾਬੀ
sa Sanskrit - संस्कृतम्
sd Sindhi - سنڌي
si Sinhala - සිංහල
ta Tamil - தமிழ்
te Telugu - తెలుగు
ur Urdu - اُردُو

Usage

Python Library

Import the wrapper for transliteration engine by:

from ai4bharat.transliteration import XlitEngine

Example 1 : Using word Transliteration

e = XlitEngine("hi", beam_width=10, rescore=True)
out = e.translit_word("computer", topk=5)
print(out)
# output:{'hi': ['कंप्यूटर', 'कम्प्यूटर', 'कॉम्प्यूटर', 'कम्प्युटर', 'कंप्युटर']}

Note:

  • beam_width increases beam search size, resulting in improved accuracy but increases time/compute. (Default: 4)
  • topk returns only specified number of top results. (Default: 4)
  • rescore returns the reranked suggestions after using a dictionary. (Default: True)

Example 2 : word Transliteration without rescoring

e = XlitEngine("hi", beam_width=10, rescore=False)
out = e.translit_word("computer", topk=5)
print(out)
# output:{'hi': ['कम्प्यूटर', 'कंप्यूटर', 'कॉम्प्यूटर', 'कम्प्युटर', 'कंप्युटर']}

Example 3 : Using Sentence Transliteration

e = XlitEngine("ta", beam_width=10)
out = e.translit_sentence("vanakkam ulagam")
print(out)
# output: {'ta': 'வணக்கம் உலகம்'}

Note:

  • Only single top most prediction is returned for each word in sentence.

Example 4 : Using Multiple language Transliteration

e = XlitEngine(["ta", "ml"], beam_width=6)
# leave empty or use "all" to load all available languages
# e = XlitEngine("all)

out = e.translit_word("amma", topk=3)
print(out)
# output: {'ta': ['அம்மா', 'அம்ம', 'அம்மை'], 'ml': ['അമ്മ', 'എമ്മ', 'അമ']}

out = e.translit_sentence("hello world")
print(out)
# output: {'ta': 'ஹலோ வார்ல்ட்', 'ml': 'ഹലോ വേൾഡ്'}

## Specify language name to get only specific language result
out = e.translit_word("amma", target_lang = "ml", topk=5)
print(out)
# output: ['അമ്മ', 'എമ്മ', 'അമ', 'എഎമ്മ', 'അഎമ്മ']

Example 5 : Transliteration for all available languages

e = XlitEngine(beam_width=10)
out = e.translit_sentence("Hello World")
print(out)
# sample output: {'bn': 'হেল ওয়ার্ল্ড', 'gu': 'હેલો વર્લ્ડ', 'hi': 'हेलो वर्ल्ड', 'kn': 'ಹೆಲ್ಲೊ ವರ್ಲ್ಡ್', 'ml': 'ഹലോ വേൾഡ്', 'pa': 'ਹੇਲੋ ਵਰਲਡ', 'si': 'හිලෝ වර්ල්ඩ්', 'ta': 'ஹலோ வார்ல்ட்', 'te': 'హల్లో వరల్డ్', 'ur': 'ہیلو وارڈ'}

Web API Server

Running a flask server using a 3-line script:

from ai4bharat.transliteration import xlit_server
app, engine = xlit_server.get_app()
app.run(host='0.0.0.0', port=8000)

Then on browser (or) curl, use link as http://{IP-address}:{port}/tl/{lang-id}/{word_in_eng_script}

Example: http://localhost:8000/tl/ta/amma http://localhost:8000/languages


Debugging errors

If you face any of the following errors:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject ValueError: Please build (or rebuild) Cython components with python setup.py build_ext --inplace.

Run: pip install --upgrade numpy


Release Notes

This package contains applications built around the Transliteration engine. The contents of this package can also be downloaded from our GitHub repo.

All the NN models of Indic-Xlit are released under MIT License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ai4bharat-transliteration-1.0.0.2.tar.gz (24.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ai4bharat_transliteration-1.0.0.2-py3-none-any.whl (26.3 kB view details)

Uploaded Python 3

File details

Details for the file ai4bharat-transliteration-1.0.0.2.tar.gz.

File metadata

File hashes

Hashes for ai4bharat-transliteration-1.0.0.2.tar.gz
Algorithm Hash digest
SHA256 e38bc32d8b06d99b87f1ad5ecb8c0b28c9530dcb42fa92c54cf8e37edd5774d9
MD5 6b210cdbd30c657edeb3496d75b2ba58
BLAKE2b-256 b3c0732aa76c2a3e462b68ecd0b78af5cd28f510b5156d0fcc3e23e3fde1ae82

See more details on using hashes here.

File details

Details for the file ai4bharat_transliteration-1.0.0.2-py3-none-any.whl.

File metadata

File hashes

Hashes for ai4bharat_transliteration-1.0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 166884e0fffc67fd33f7ae12ae11e3e7a70b25bcdff467357404d9c05fa9cc2d
MD5 819842df832e79b15a25da3e14eeb583
BLAKE2b-256 4840a9ff72af8b256f48e1a17bbd68b41ee5922c2fdafd2bf85544922bfb2be2

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page