Deep Transliteration Library for Indic Languages

AI4Bharat Transliteration Application

Indic Deep-Xlit Engine

A deep transliteration engine for major languages of the Indian sub-continent.

This package provides support for:

  1. Python Library for transliteration from Roman to Native text (using NN-based models)
  2. HTTP API for interaction with web applications

Languages Supported

ISO 639 code    Language
bn              Bengali
gom             Konkani Goan
gu              Gujarati
hi              Hindi
kn              Kannada
mai             Maithili
ml              Malayalam
mr              Marathi
pa              Punjabi (Eastern)
sd              Sindhi (Western)
si              Sinhala
ta              Tamil
te              Telugu
ur              Urdu

Usage

Python Library

After installing the package, import the transliteration engine:

from ai4bharat.transliteration import XlitEngine

Example 1: Word transliteration

e = XlitEngine("hi")
out = e.translit_word("computer", topk=5, beam_width=10)
print(out)
# output:{'hi': ['कम्प्यूटर', 'कंप्यूटर', 'कम्पूटर', 'कम्पुटर', 'कम्प्युटर']}

Note:

  • beam_width sets the beam search size. Larger values tend to improve accuracy but increase time and compute.
  • topk limits the output to the specified number of top results.
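
To make the interplay of the two parameters concrete, here is a toy beam search over a made-up probability table. It is only a sketch of the general technique, not the library's decoder: TOY_MODEL, beam_search, and all the probabilities are hypothetical and exist purely for illustration.

```python
import math

# Hypothetical next-symbol probabilities, keyed by the prefix decoded so far.
# The table only covers two decoding steps.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def beam_search(model, steps, beam_width, topk):
    """Keep the beam_width best prefixes at each step, then return the topk best."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(steps):
        expanded = []
        for prefix, score in beams:
            for symbol, prob in model[prefix].items():
                expanded.append((prefix + (symbol,), score + math.log(prob)))
        # Sort by score, best first, and prune to the beam width.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_width]
    return ["".join(prefix) for prefix, _ in beams[:topk]]


print(beam_search(TOY_MODEL, steps=2, beam_width=1, topk=1))  # ['ax']
print(beam_search(TOY_MODEL, steps=2, beam_width=2, topk=2))  # ['bx', 'ax']
```

With a beam of 1 the search commits to "a" early and never sees the higher-scoring "bx"; a beam of 2 recovers it. In this toy, topk only trims the final beam, so it cannot return more results than the beam holds.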

Example 2: Sentence transliteration

e = XlitEngine("ta")
out = e.translit_sentence("vanakkam ulagam !", beam_width=10)
print(out)
# output: {'ta': 'வணக்கம் உலகம் !'}

Note:

  • Only the single top prediction is returned for each word in the sentence.
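
If several candidates per word are needed for a sentence, one possible workaround is to split the sentence and call translit_word on each token. The helper below is a hypothetical sketch, not part of the package. It assumes that translit_word returns a plain list when lang_code is given (as in Example 3) and that the engine was loaded with that language. How the library treats punctuation passed as a word is not stated here, so non-alphabetic tokens are simply passed through unchanged.

```python
def sentence_candidates(engine, sentence, lang_code, topk=3, beam_width=10):
    """Return (token, candidates) pairs for each whitespace-separated token.

    `engine` is expected to be an XlitEngine-like object exposing
    translit_word(word, lang_code=..., topk=..., beam_width=...).
    """
    results = []
    for token in sentence.split():
        if token.isalpha():
            candidates = engine.translit_word(
                token, lang_code=lang_code, topk=topk, beam_width=beam_width
            )
        else:
            # Assumption: leave punctuation and digits untouched.
            candidates = [token]
        results.append((token, candidates))
    return results
```

Usage would look like sentence_candidates(e, "vanakkam ulagam !", "ta", topk=3), where e is an engine created as in the examples above.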

Example 3: Multiple-language transliteration

e = XlitEngine(["ta", "ml"])
# Pass no argument or use "all" to load all available languages
# e = XlitEngine("all")

out = e.translit_word("amma", topk=5, beam_width=10)
print(out)
# {'ta': ['அம்மா', 'அம்ம', 'அம்மை', 'ஆம்மா', 'ம்மா'], 'ml': ['അമ്മ', 'എമ്മ', 'അമ', 'എഎമ്മ', 'അഎമ്മ']}

out = e.translit_sentence("hello world", beam_width=10)
print(out)
# output: {'ta': 'ஹலோ வார்ல்ட்', 'ml': 'ഹലോ വേൾഡ്'}

# Specify a language code to get the result for only that language
out = e.translit_word("amma", lang_code = "ml", topk=5, beam_width=10)
print(out)
# output: ['അമ്മ', 'എമ്മ', 'അമ', 'എഎമ്മ', 'അഎമ്മ']
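
When no lang_code is given, the result is a dictionary keyed by language code, as the outputs above show. A small helper can reduce it to one candidate per language. This is a sketch only: top_pick_per_language is a hypothetical name, and it assumes each list is ordered best-first, which the examples suggest but the text does not state.

```python
def top_pick_per_language(word_output):
    """Map each language code to the first candidate in its list."""
    return {
        lang: candidates[0]
        for lang, candidates in word_output.items()
        if candidates  # skip languages with an empty candidate list
    }


# Sample data shortened from the Example 3 output.
sample = {"ta": ["அம்மா", "அம்ம"], "ml": ["അമ്മ", "എമ്മ"]}
print(top_pick_per_language(sample))
```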

Example 4: Transliteration for all available languages

e = XlitEngine()
out = e.translit_sentence("Hello World", beam_width=10)
print(out)
# {'bn': 'হেল ওয়ার্ল্ড', 'gu': 'હેલો વર્લ્ડ', 'hi': 'हेलो वर्ल्ड', 'kn': 'ಹೆಲ್ಲೊ ವರ್ಲ್ಡ್', 'gom': 'हॅलो वर्ल्ड', 'mai': 'हेल्लो वर्ल्ड', 'ml': 'ഹലോ വേൾഡ്', 'mr': 'हेलो वर्ल्ड', 'pa': 'ਹੇਲੋ ਵਰਲਡ', 'sd': 'هيلو ورلد', 'si': 'හිලෝ වර්ල්ඩ්', 'ta': 'ஹலோ வார்ல்ட்', 'te': 'హల్లో వరల్డ్', 'ur': 'ہیلو وارڈ'}

Web API Server

Running a Flask server in 3 lines:

from ai4bharat.transliteration import xlit_server
app, engine = xlit_server.get_app()
app.run(debug=True, host='0.0.0.0', port=8000)

For a fuller setup, use the extended sample script api_expose.py:

  1. Update the SSL paths in api_expose.py as required. By default the script targets localhost, with both HTTP and HTTPS enabled.

  2. Run the script: $ sudo env PATH=$PATH python3 api_expose.py (Export GOOGLE_APPLICATION_CREDENTIALS if needed; functions related to Google Cloud are disabled by default.)

  3. In a browser or with curl, open http://{IP-address}:{port}/tl/{lang-id}/{word in eng script}. The port is 8000 if debug mode is enabled, and 80 otherwise.

Example:
http://localhost:80/tl/ta/amma
http://localhost:80/languages
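
The same endpoint can be called from Python using only the standard library. The sketch below builds the URL from the pattern given above and returns the raw response body. The helper names are hypothetical, and because the response format is not described here, the body is returned undecoded beyond an assumed UTF-8 text encoding.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_translit_url(host, port, lang_id, word):
    """Build http://{host}:{port}/tl/{lang-id}/{word}, percent-encoding the path parts."""
    return f"http://{host}:{port}/tl/{quote(lang_id, safe='')}/{quote(word, safe='')}"


def fetch_raw(url, timeout=10):
    """Return the response body as text (assumes the server replies in UTF-8)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


print(build_translit_url("localhost", 80, "ta", "amma"))
# http://localhost:80/tl/ta/amma
```

With a server running, fetch_raw(build_translit_url("localhost", 8000, "ta", "amma")) would return whatever the endpoint sends back for that word.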


Release Notes

This package contains applications built around the transliteration engine. Its contents can also be downloaded from the latest GitHub release and are sufficient for inference.

All the NN models (along with metadata) of Xlit - Transliteration are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

