Skip to main content

Deep Transliteration for Indic Languages

Project description

AI4Bharat Transliteration Application

Deep Indic Xlit Engine

A deep transliteration engine for major languages of the Indian sub-continent.

This package provides support for:

  1. Python Library for transliteration from Roman to Native text (using NN-based models)
  2. HTTP API exposing for interation with web applications

Languages Supported

ISO 639 code Language
bn Bengali
gom Konkani Goan
gu Gujarati
hi Hindi
kn Kannada
mai Maithili
ml Malayalam
mr Marathi
pa Punjabi (Eastern)
sd Sindhi (Western)
si Sinhala
ta Tamil
te Telugu
ur Urdu

Usage

Python Library

Import the transliteration engine by:

from ai4bharat.transliteration import XlitEngine

Example 1 : Using word Transliteration

e = XlitEngine("hi")
out = e.translit_word("aam", topk=5, beam_width=10)
print(out)
# output:{'hi': ['कम्प्यूटर', 'कंप्यूटर', 'कम्पूटर', 'कम्पुटर', 'कम्प्युटर']}

Note:

  • beam_width increases beam search size, resulting in improved accuracy but increases time/compute.
  • topk returns only specified number of top results.

Example 2 : Using Sentence Transliteration

e = XlitEngine("ta")
out = e.translit_sentence("vanakkam ulagam !", beam_width=10)
print(out)
# output: {'ta': 'வணக்கம் உலகம் !'}

Note:

  • Only single top most prediction is returned for each word in sentence.

Example 3 : Using Multiple language Transliteration

e = XlitEngine(["ta", "ml"])
# leave empty or use "all" to load all available languages
# e = XlitEngine("all)

out = e.translit_word("amma", topk=5, beam_width=10)
print(out)
# {'ta': ['அம்மா', 'அம்ம', 'அம்மை', 'ஆம்மா', 'ம்மா'], 'ml': ['അമ്മ', 'എമ്മ', 'അമ', 'എഎമ്മ', 'അഎമ്മ']}

out = e.translit_sentence("hello world", beam_width=10)
print(out)
# output: {'ta': 'ஹலோ வார்ல்ட்', 'ml': 'ഹലോ വേൾഡ്'}

## Specify language name to get only specific language result
out = e.translit_word("amma", lang_code = "ml", topk=5, beam_width=10)
print(out)
# output: ['അമ്മ', 'എമ്മ', 'അമ', 'എഎമ്മ', 'അഎമ്മ']

Web API Server

  1. Make required modification in SSL paths in api_expose.py. By default set to local host and both http & https are enabled.

  2. Run the API expose code:
    $ sudo env PATH=$PATH python3 api_expose.py
    (Export GOOGLE_APPLICATION_CREDENTIALS if needed, by default functions realted to Google cloud is disabled.)

  3. In browser (or) curl, use link as http://{IP-address}:{port}/tl/{lang-id}/{word in eng script}
    If debug mode enabled port will be 8000, else port will be 80.

Example:
http://localhost:80/tl/ta/amma
http://localhost:80/languages


Release Notes

This package contains applications built around the Transliteration engine. The contents of this package can also be downloaded from latest GitHub release is sufficient for inference usage.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ai4bharat-transliteration-0.5.0.tar.gz (15.0 kB view hashes)

Uploaded Source

Built Distribution

ai4bharat_transliteration-0.5.0-py3-none-any.whl (14.0 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page