Deep Transliteration for Indic Languages
AI4Bharat Transliteration Application
Deep Indic Xlit Engine
A deep transliteration engine for major languages of the Indian sub-continent.
This package provides support for:
- Python Library for transliteration from Roman to Native text (using NN-based models)
- HTTP API for interaction with web applications
Languages Supported
ISO 639 code | Language |
---|---|
bn | Bengali |
gom | Goan Konkani |
gu | Gujarati |
hi | Hindi |
kn | Kannada |
mai | Maithili |
ml | Malayalam |
mr | Marathi |
pa | Punjabi (Eastern) |
sd | Sindhi (Western) |
si | Sinhala |
ta | Tamil |
te | Telugu |
ur | Urdu |
Usage
Python Library
Import the transliteration engine by:
from ai4bharat.transliteration import XlitEngine
Example 1 : Using word Transliteration
e = XlitEngine("hi")
out = e.translit_word("computer", topk=5, beam_width=10)
print(out)
# output: {'hi': ['कम्प्यूटर', 'कंप्यूटर', 'कम्पूटर', 'कम्पुटर', 'कम्प्युटर']}
Note:
- beam_width sets the beam search size. Larger values improve accuracy but increase time and compute.
- topk limits the output to the specified number of top results.
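To illustrate how the two parameters interact, here is a minimal sketch of beam search over a toy model. It is not the engine's actual decoder: `TOY_MODEL` and `beam_search` are hypothetical names made up for this example, and the probabilities are invented. It shows that a wider beam can recover a better overall sequence than a greedy choice, while topk only trims the final list.

```python
import math

# Hypothetical toy model (an assumption for illustration only):
# probability of the next symbol given the prefix produced so far.
TOY_MODEL = {
    "": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"x": 0.9, "y": 0.1},
}

def beam_search(model, length, beam_width, topk):
    """Return up to topk sequences of the given length, best first."""
    beams = [("", 0.0)]  # (partial output, cumulative log-probability)
    for _ in range(length):
        expanded = []
        for prefix, score in beams:
            for symbol, prob in model[prefix].items():
                expanded.append((prefix + symbol, score + math.log(prob)))
        # Keep only the beam_width best partial hypotheses.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_width]
    # topk trims the final ranked list; it does not widen the search.
    return [text for text, _ in beams[:topk]]

narrow = beam_search(TOY_MODEL, length=2, beam_width=1, topk=1)
wide = beam_search(TOY_MODEL, length=2, beam_width=2, topk=2)
print(narrow, wide)
```

With a beam of 1 the search commits to "a" early and never sees the higher-scoring "bx"; with a beam of 2 it does, at the cost of scoring more hypotheses per step.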
Example 2 : Using Sentence Transliteration
e = XlitEngine("ta")
out = e.translit_sentence("vanakkam ulagam !", beam_width=10)
print(out)
# output: {'ta': 'வணக்கம் உலகம் !'}
Note:
- Only the single top prediction is returned for each word in the sentence.
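The note above can be pictured as a word-by-word loop that keeps the first candidate for each token. The sketch below is only an illustration under assumptions: `translit_sentence_sketch` and `STUB` are hypothetical, the stub dictionary stands in for a real model, and passing unknown tokens such as punctuation through unchanged is a guess about behaviour, not something the package documents.

```python
def translit_sentence_sketch(sentence, translit_word):
    """Transliterate word by word, keeping only the top candidate per word.

    translit_word: any callable mapping a Roman-script word to a ranked
    list of candidates (best first).
    """
    out = []
    for token in sentence.split():
        candidates = translit_word(token)
        # Assumption: tokens with no candidates (e.g. "!") pass through as-is.
        out.append(candidates[0] if candidates else token)
    return " ".join(out)

# Stub standing in for a real model; values copied from Example 2.
STUB = {"vanakkam": ["வணக்கம்"], "ulagam": ["உலகம்"]}

result = translit_sentence_sketch("vanakkam ulagam !", lambda w: STUB.get(w, []))
print(result)
```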
Example 3 : Using Multiple language Transliteration
e = XlitEngine(["ta", "ml"])
# leave empty or use "all" to load all available languages
# e = XlitEngine("all")
out = e.translit_word("amma", topk=5, beam_width=10)
print(out)
# {'ta': ['அம்மா', 'அம்ம', 'அம்மை', 'ஆம்மா', 'ம்மா'], 'ml': ['അമ്മ', 'എമ്മ', 'അമ', 'എഎമ്മ', 'അഎമ്മ']}
out = e.translit_sentence("hello world", beam_width=10)
print(out)
# output: {'ta': 'ஹலோ வார்ல்ட்', 'ml': 'ഹലോ വേൾഡ്'}
# Specify a language code to get the result for that language only
out = e.translit_word("amma", lang_code = "ml", topk=5, beam_width=10)
print(out)
# output: ['അമ്മ', 'എമ്മ', 'അമ', 'എഎമ്മ', 'അഎമ്മ']
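When several languages are loaded, the word-level result is a dict keyed by language code, as in the outputs above. A small helper such as the following can reduce it to one candidate per language. `top_candidates` is a hypothetical name, and the sketch assumes each list is ordered best first, which the examples suggest but the text does not state outright.

```python
def top_candidates(results):
    """Map each language code to its first candidate, skipping empty lists."""
    return {lang: cands[0] for lang, cands in results.items() if cands}

# Result shape copied from Example 3.
results = {
    'ta': ['அம்மா', 'அம்ம', 'அம்மை', 'ஆம்மா', 'ம்மா'],
    'ml': ['അമ്മ', 'എമ്മ', 'അമ', 'എഎമ്മ', 'അഎമ്മ'],
}

best = top_candidates(results)
print(best)
```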
Web API Server
- Make the required modifications to the SSL paths in api_expose.py. By default, the server is set to localhost and both HTTP and HTTPS are enabled.
- Run the API expose code:
$ sudo env PATH=$PATH python3 api_expose.py
(Export GOOGLE_APPLICATION_CREDENTIALS if needed; by default, functions related to Google Cloud are disabled.)
- In a browser or with curl, use a link of the form http://{IP-address}:{port}/tl/{lang-id}/{word in eng script}
If debug mode is enabled, the port will be 8000; otherwise it will be 80.
Example:
http://localhost:80/tl/ta/amma
http://localhost:80/languages
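The same endpoint can be called from Python using only the standard library. This is a minimal sketch: `build_tl_url` and `fetch_transliteration` are hypothetical helper names, and because the response format is not described here, the body is returned as raw text decoded as UTF-8 (an assumption).

```python
from urllib.parse import quote
from urllib.request import urlopen

def build_tl_url(host, port, lang_id, word):
    """Build http://{host}:{port}/tl/{lang-id}/{word}, percent-encoding the path parts."""
    return f"http://{host}:{port}/tl/{quote(lang_id, safe='')}/{quote(word, safe='')}"

def fetch_transliteration(host, port, lang_id, word, timeout=10):
    """Call the running server and return the raw response body as text.

    Assumption: the body is UTF-8 encoded; its structure is not parsed here.
    """
    with urlopen(build_tl_url(host, port, lang_id, word), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

url = build_tl_url("localhost", 80, "ta", "amma")
print(url)
```

With the server running locally, `fetch_transliteration("localhost", 80, "ta", "amma")` would request the first example link above.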
Release Notes
This package contains applications built around the transliteration engine. The contents of this package can also be downloaded from the latest GitHub release, which is sufficient for inference usage.