Indic-Xlit: Transliteration library for Indic Languages. Conversion of text from English to 21 languages of South Asia.
Project description
AI4Bharat Transliteration Application
An AI-based transliteration engine for 21 major languages of the Indian subcontinent.
This package provides support for:
- Python Library for transliteration from Roman to Native script
- HTTP API server that can be hosted for interaction with web applications
About
This library is based on our research work called Indic-Xlit to build tools that can translit text to Indic languages from colloquially-typed content (in English alphabet), precisely called as Roman-to-Native back-transliteration. Note that currently we do not support Indic to English conversion (Native-to-Roman transliteration).
- Example Input:
namaste bhai
- Example Output:
नमस्ते भाई
An online demo is available here: https://xlit.ai4bharat.org
Languages Supported
ISO 639 code | Language |
---|---|
as | Assamese - অসমীয়া |
bn | Bengali - বাংলা |
brx | Boro - बड़ो |
gu | Gujarati - ગુજરાતી |
hi | Hindi - हिंदी |
kn | Kannada - ಕನ್ನಡ |
ks | Kashmiri - كٲشُر |
gom | Konkani Goan - कोंकणी |
mai | Maithili - मैथिली |
ml | Malayalam - മലയാളം |
mni | Manipuri - ꯃꯤꯇꯩꯂꯣꯟ |
mr | Marathi - मराठी |
ne | Nepali - नेपाली |
or | Oriya - ଓଡ଼ିଆ |
pa | Panjabi - ਪੰਜਾਬੀ |
sa | Sanskrit - संस्कृतम् |
sd | Sindhi - سنڌي |
si | Sinhala - සිංහල |
ta | Tamil - தமிழ் |
te | Telugu - తెలుగు |
ur | Urdu - اُردُو |
Usage
Python Library
Import the wrapper for transliteration engine by:
from ai4bharat.transliteration import XlitEngine
Example 1 : Using word Transliteration
e = XlitEngine("hi", beam_width=10, rescore=True)
out = e.translit_word("computer", topk=5)
print(out)
# output:{'hi': ['कंप्यूटर', 'कम्प्यूटर', 'कॉम्प्यूटर', 'कम्प्युटर', 'कंप्युटर']}
Note:
beam_width
increases beam search size, resulting in improved accuracy but increases time/compute. (Default:4
)topk
returns only specified number of top results. (Default:4
)rescore
returns the reranked suggestions after using a dictionary. (Default:True
)
Example 2 : word Transliteration without rescoring
e = XlitEngine("hi", beam_width=10, rescore=False)
out = e.translit_word("computer", topk=5)
print(out)
# output:{'hi': ['कम्प्यूटर', 'कंप्यूटर', 'कॉम्प्यूटर', 'कम्प्युटर', 'कंप्युटर']}
Example 3 : Using Sentence Transliteration
e = XlitEngine("ta", beam_width=10)
out = e.translit_sentence("vanakkam ulagam")
print(out)
# output: {'ta': 'வணக்கம் உலகம்'}
Note:
- Only single top most prediction is returned for each word in sentence.
Example 4 : Using Multiple language Transliteration
e = XlitEngine(["ta", "ml"], beam_width=6)
# leave empty or use "all" to load all available languages
# e = XlitEngine("all)
out = e.translit_word("amma", topk=3)
print(out)
# output: {'ta': ['அம்மா', 'அம்ம', 'அம்மை'], 'ml': ['അമ്മ', 'എമ്മ', 'അമ']}
out = e.translit_sentence("hello world")
print(out)
# output: {'ta': 'ஹலோ வார்ல்ட்', 'ml': 'ഹലോ വേൾഡ്'}
## Specify language name to get only specific language result
out = e.translit_word("amma", target_lang = "ml", topk=5)
print(out)
# output: ['അമ്മ', 'എമ്മ', 'അമ', 'എഎമ്മ', 'അഎമ്മ']
Example 5 : Transliteration for all available languages
e = XlitEngine(beam_width=10)
out = e.translit_sentence("Hello World")
print(out)
# sample output: {'bn': 'হেল ওয়ার্ল্ড', 'gu': 'હેલો વર્લ્ડ', 'hi': 'हेलो वर्ल्ड', 'kn': 'ಹೆಲ್ಲೊ ವರ್ಲ್ಡ್', 'ml': 'ഹലോ വേൾഡ്', 'pa': 'ਹੇਲੋ ਵਰਲਡ', 'si': 'හිලෝ වර්ල්ඩ්', 'ta': 'ஹலோ வார்ல்ட்', 'te': 'హల్లో వరల్డ్', 'ur': 'ہیلو وارڈ'}
Web API Server
Running a flask server using a 3-line script:
from ai4bharat.transliteration import xlit_server
app, engine = xlit_server.get_app()
app.run(host='0.0.0.0', port=8000)
Then on browser (or) curl, use link as http://{IP-address}:{port}/tl/{lang-id}/{word_in_eng_script}
Example: http://localhost:8000/tl/ta/amma http://localhost:8000/languages
Debugging errors
If you face any of the following errors:
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject ValueError: Please build (or rebuild) Cython components with
python setup.py build_ext --inplace
.
Run: pip install --upgrade numpy
Release Notes
This package contains applications built around the Transliteration engine. The contents of this package can also be downloaded from our GitHub repo.
All the NN models of Indic-Xlit are released under MIT License.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for ai4bharat-transliteration-1.1.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1e47ab1ed2b8921d7762e28ce4f98e579407435ab774135d2883e133863b5d09 |
|
MD5 | 892597be3596698c28763862e798aa0b |
|
BLAKE2b-256 | cfa3d1c024db5699cd198163466024572d2133d5d76ccacac9bd279dcdaa0309 |
Hashes for ai4bharat_transliteration-1.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9fa34f152563499567d28d66fe7dcb6203e4180a91fe8f45b80d2ca92e3bc09e |
|
MD5 | b12d6e03095546f5622ac7436450f281 |
|
BLAKE2b-256 | 9c9ae12fbf4c3aa37b7db591aa7b1693f4ea80a28742170c465d05ffc1408a03 |