Indic-Xlit: Transliteration library for Indic Languages. Conversion of text from English to 21 languages of South Asia.

AI4Bharat Indic-Transliteration

An AI-based transliteration engine for 21 major languages of the Indian subcontinent.

This package provides:

A Python library for transliteration between Roman and native scripts
An HTTP API server that can be hosted for interaction with web applications

About

This library is based on our research work, Indic-Xlit, which builds tools to transliterate text between Indic languages and colloquially typed content (in the English alphabet). We support both Roman-to-native back-transliteration (English script to Indic language conversion) and native-to-Roman transliteration (Indic to English alphabet conversion).

An online demo is available here: https://xlit.ai4bharat.org

Languages Supported

ISO 639 code	Language
as	Assamese - অসমীয়া
bn	Bangla - বাংলা
brx	Boro - बड़ो
gu	Gujarati - ગુજરાતી
hi	Hindi - हिंदी
kn	Kannada - ಕನ್ನಡ
ks	Kashmiri - كٲشُر
gom	Konkani Goan - कोंकणी
mai	Maithili - मैथिली
ml	Malayalam - മലയാളം
mni	Manipuri - ꯃꯤꯇꯩꯂꯣꯟ
mr	Marathi - मराठी
ne	Nepali - नेपाली
or	Oriya - ଓଡ଼ିଆ
pa	Panjabi - ਪੰਜਾਬੀ
sa	Sanskrit - संस्कृतम्
sd	Sindhi - سنڌي
si	Sinhala - සිංහල
ta	Tamil - தமிழ்
te	Telugu - తెలుగు
ur	Urdu - اُردُو

Usage

Python Library

Import the wrapper for the transliteration engine:

from ai4bharat.transliteration import XlitEngine

Example 1: Word transliteration

e = XlitEngine("hi", beam_width=10, rescore=True)
out = e.translit_word("namasthe", topk=5)
print(out)
# output: {'hi': ['नमस्ते', 'नमस्थे', 'नामस्थे', 'नमास्थे', 'नमस्थें']}

Arguments:

beam_width: size of the beam search. Larger values can improve accuracy at the cost of more time and compute. (Default: 4)
topk: maximum number of top results to return. (Default: 4)
rescore: if True, returns the suggestions reranked using a dictionary. (Default: True)
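
To make beam_width and topk concrete, here is a toy beam search over made-up per-step probabilities. It only illustrates the general technique and is not the library's implementation; STEPS, its probabilities, and beam_search are hypothetical names introduced for this sketch.

```python
import math

# Toy per-step options with invented probabilities (illustrative only).
STEPS = [
    {"न": 0.7, "ना": 0.3},
    {"म": 0.6, "मा": 0.4},
    {"स्ते": 0.55, "स्थे": 0.45},
]


def beam_search(steps, beam_width=4, topk=4):
    """Keep the best `beam_width` partial outputs per step; return the best `topk`."""
    beams = [("", 0.0)]  # (text so far, log-probability)
    for options in steps:
        # Extend every surviving hypothesis with every option for this step.
        expanded = [
            (text + piece, score + math.log(prob))
            for text, score in beams
            for piece, prob in options.items()
        ]
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_width]  # prune to the beam width
    return [text for text, _ in beams[:topk]]


print(beam_search(STEPS, beam_width=4, topk=2))
```

With beam_width=1 the search is greedy and returns a single candidate however large topk is, which is why a wider beam is needed to get several distinct suggestions.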

Romanization:

By default, XlitEngine loads the English-to-Indic model (default: src_script_type="roman").
To load the Indic-to-English model, use src_script_type="indic".

For example (this also applies to all the other examples below):

e = XlitEngine(src_script_type="indic", beam_width=10, rescore=False)
out = e.translit_word("नमस्ते", lang_code="hi", topk=5)
print(out)
# output: ['namaste', 'namastey', 'namasthe', 'namastay', 'namste']
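
When input may arrive in either script, the value for src_script_type can be guessed from the text itself. The helper below is a minimal sketch using only the standard library; guess_src_script_type is a hypothetical name, and it simply treats any non-Latin letter as "indic", which is an assumption that will misclassify other non-Latin scripts.

```python
import unicodedata


def guess_src_script_type(text):
    """Return "roman" if every letter in `text` is Latin, otherwise "indic"."""
    for ch in text:
        # Unicode character names start with the script, e.g. "LATIN SMALL LETTER A".
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return "indic"
    return "roman"


print(guess_src_script_type("namaste"))
print(guess_src_script_type("नमस्ते"))
```

The result could then be passed as XlitEngine(src_script_type=...), keeping in mind that each engine instance loads a model for one direction only.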

Example 2: Word transliteration without rescoring

e = XlitEngine("hi", beam_width=10, rescore=False)
out = e.translit_word("namasthe", topk=5)
print(out)
# output: {'hi': ['नमस्थे', 'नामस्थे', 'नमास्थे', 'नमस्थें', 'नमस्ते']}
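
The difference between Examples 1 and 2 can be pictured as a dictionary-based rerank of the model's candidates. The sketch below is a toy illustration of that idea, not the library's actual rescoring logic; rescore_with_dictionary and the one-word dictionary are hypothetical.

```python
def rescore_with_dictionary(candidates, dictionary):
    """Move candidates found in `dictionary` to the front, keeping model order otherwise."""
    known = [c for c in candidates if c in dictionary]
    unknown = [c for c in candidates if c not in dictionary]
    return known + unknown


# Candidate order taken from Example 2 (rescore=False).
model_order = ["नमस्थे", "नामस्थे", "नमास्थे", "नमस्थें", "नमस्ते"]
print(rescore_with_dictionary(model_order, {"नमस्ते"}))
# ['नमस्ते', 'नमस्थे', 'नामस्थे', 'नमास्थे', 'नमस्थें']
```

Here the dictionary word moves from last to first place, which matches the ordering shown in Example 1.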

Example 3: Sentence transliteration

e = XlitEngine("ta", beam_width=10)
out = e.translit_sentence("vanakkam ulagam")
print(out)
# output: {'ta': 'வணக்கம் உலகம்'}

Note:

Only the single top prediction is returned for each word in the sentence.
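
If several candidates per word are needed, one option is to split the sentence and call translit_word on each word. The sketch below assumes that translit_word returns a plain list when lang_code is given (as in the lang_code call in Example 4); translit_sentence_topk is a hypothetical helper, and StubEngine is a stand-in so the code runs without downloading any models.

```python
def translit_sentence_topk(engine, sentence, lang_code, topk=3):
    """Return one (word, candidates) pair per whitespace-separated word."""
    return [
        (word, engine.translit_word(word, lang_code=lang_code, topk=topk))
        for word in sentence.split()
    ]


class StubEngine:
    """Hypothetical stand-in for XlitEngine; returns fake candidates."""

    def translit_word(self, word, lang_code, topk=4):
        return [f"{word}#{i}" for i in range(1, topk + 1)]


print(translit_sentence_topk(StubEngine(), "vanakkam ulagam", "ta", topk=2))
# [('vanakkam', ['vanakkam#1', 'vanakkam#2']), ('ulagam', ['ulagam#1', 'ulagam#2'])]
```

With a real engine, pass the XlitEngine instance in place of StubEngine().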

Example 4: Multiple-language transliteration

e = XlitEngine(["ta", "ml"], beam_width=6)
# Leave the argument empty or use "all" to load all available languages
# e = XlitEngine("all")

out = e.translit_word("amma", topk=3)
print(out)
# output: {'ml': ['അമ്മ', 'എമ്മ', 'അമ'], 'ta': ['அம்மா', 'அம்ம', 'அம்மை']}

out = e.translit_sentence("vandhe maatharam")
print(out)
# output: {'ml': 'വന്ധേ മാതരം', 'ta': 'வந்தே மாதரம்'}

# Specify lang_code to get the result for only that language
out = e.translit_word("amma", lang_code="ml", topk=5)
print(out)
# output: ['അമ്മ', 'എമ്മ', 'അമ', 'എഎമ്മ', 'അഎമ്മ']

Example 5: Transliteration for all available languages

e = XlitEngine(beam_width=10)
out = e.translit_sentence("namaskaar bharat")
print(out)
# sample output: {'bn': 'নমস্কার ভারত', 'gu': 'નમસ્કાર ભારત', 'hi': 'नमस्कार भारत', 'kn': 'ನಮಸ್ಕಾರ್ ಭಾರತ್', 'ml': 'നമസ്കാർ ഭാരത്', 'pa': 'ਨਮਸਕਾਰ ਭਾਰਤ', 'si': 'නමස්කාර් භාරත්', 'ta': 'நமஸ்கார் பாரத்', 'te': 'నమస్కార్ భారత్', 'ur': 'نمسکار بھارت'}

Web API Server

Run a Flask server using a 3-line script:

from ai4bharat.transliteration import xlit_server
app, engine = xlit_server.get_app()
app.run(host='0.0.0.0', port=8000)

Then, in a browser or with curl, open a URL of the form http://{IP-address}:{port}/tl/{lang-id}/{word_in_eng_script}

Examples:
http://localhost:8000/tl/ta/amma
http://localhost:8000/languages
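
The same request can be made from Python with the standard library. This is a minimal sketch: build_translit_url and fetch_translit are hypothetical helpers, the host and port defaults mirror the server script above, and the response is returned as raw text because its format is not described here.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_translit_url(lang_id, word, host="localhost", port=8000):
    """Build the /tl/{lang-id}/{word} URL, percent-encoding both path segments."""
    return f"http://{host}:{port}/tl/{quote(lang_id, safe='')}/{quote(word, safe='')}"


def fetch_translit(lang_id, word, host="localhost", port=8000, timeout=10):
    """Request a transliteration and return the raw response body as text."""
    url = build_translit_url(lang_id, word, host, port)
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


print(build_translit_url("ta", "amma"))
# http://localhost:8000/tl/ta/amma
```

Calling fetch_translit("ta", "amma") requires the server to be running.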

Debugging errors

If you encounter either of the following errors:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
ValueError: Please build (or rebuild) Cython components with python setup.py build_ext --inplace.

Run: pip install --upgrade numpy

Release Notes

This package contains applications built around the Transliteration engine. The contents of this package can also be downloaded from our GitHub repo.

All the neural network models of Indic-Xlit are released under the MIT License.

Release history

Version	Release date
1.1.3 (this version)	Sep 14, 2022
1.1.2	Aug 19, 2022
1.1.1.2	Aug 19, 2022
1.1.1	Aug 15, 2022
1.1.0.1	Jul 26, 2022
1.1	Jul 26, 2022
1.0.0.2	Jul 3, 2022
1.0.0.1	Jun 22, 2022
1.0.0	Jun 17, 2022
0.5.0.3	Nov 10, 2020
0.5.0.2 (yanked)	Nov 10, 2020
0.5.0.1	Nov 10, 2020
0.5.0	Nov 10, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ai4bharat-transliteration-1.1.3.tar.gz (29.2 kB)

Uploaded Sep 14, 2022 Source

Built Distribution

ai4bharat_transliteration-1.1.3-py3-none-any.whl (32.3 kB)

Uploaded Sep 14, 2022 Python 3

File details

Details for the file ai4bharat-transliteration-1.1.3.tar.gz.

File metadata

Download URL: ai4bharat-transliteration-1.1.3.tar.gz
Upload date: Sep 14, 2022
Size: 29.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.7.12

File hashes

Hashes for ai4bharat-transliteration-1.1.3.tar.gz
Algorithm	Hash digest
SHA256	`c4d72c75f1347e279a9be5292c863328a133d152a9c55d593ecfa892f6f4aea6`
MD5	`bc2021704e04fd4aeb6521c44120816c`
BLAKE2b-256	`4e1dae98752c60cdf9afb0317a7408a420434c6a790ec54f933ba62f6168f6d6`

File details

Details for the file ai4bharat_transliteration-1.1.3-py3-none-any.whl.

File metadata

Download URL: ai4bharat_transliteration-1.1.3-py3-none-any.whl
Upload date: Sep 14, 2022
Size: 32.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.7.12

File hashes

Hashes for ai4bharat_transliteration-1.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`12bdbb613b12561878dffcf01636904ccc79d1940b34b046ee08dc1c9ec95ac4`
MD5	`5fad844c68b3d122fe54a0c3725acbbc`
BLAKE2b-256	`04349649b4fbc53fe7b038b70666b031f586d81f7b085e610cb9c3873cb5bd8e`
