Script Conversion for Indo-Pakistani languages
Indic-PersoArabic-Script-Converter
Indo-Pakistani Transliteration
A Python library to convert from Indian scripts to Pakistani scripts and vice versa.
Currently supported methods
- Rule-based conversion
  - Faster, but does not support short vowels
  - Not fully accurate, especially for Arabic-to-Indic
- Sangam API
  - Uses an online endpoint for the conversion
  - Produces much better results, but is much slower
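To illustrate why a purely rule-based mapping loses short vowels, here is a minimal sketch of a character-by-character converter. The table `TOY_MAP` and the function `toy_convert` are hypothetical and cover only the four code points of a single word; they are not the library's actual mapping, although the result mirrors the rule-based Punjabi example in this README.

```python
# Toy Gurmukhi -> Shahmukhi table (hypothetical, NOT the library's real mapping).
TOY_MAP = {
    "\u0a38": "\u0633",        # GURMUKHI LETTER SA -> ARABIC LETTER SEEN
    "\u0a3f": "",              # GURMUKHI VOWEL SIGN I (a short vowel) -> dropped
    "\u0a70": "\u06ba",        # GURMUKHI TIPPI -> ARABIC LETTER NOON GHUNNA
    "\u0a18": "\u06af\u06be",  # GURMUKHI LETTER GHA -> GAF + HEH DOACHASHMEE
}


def toy_convert(text: str) -> str:
    """Map each code point independently; unknown characters pass through."""
    return "".join(TOY_MAP.get(ch, ch) for ch in text)


singh = "\u0a38\u0a3f\u0a70\u0a18"  # the word "Singh" in Gurmukhi
print(toy_convert(singh))           # the vowel sign leaves no trace in the output
```

Because the short vowel is dropped on the way out, the reverse direction has nothing to restore it from, which is one reason Arabic-to-Indic output is less reliable.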
Usage
Installation
Pre-requisites:
- Use Python 3.7+
pip install git+https://github.com/GokulNC/indic_nlp_library
pip install indo-arabic-transliteration
Using rule-based conversion
from indo_arabic_transliteration.mapper import script_convert
script_convert(text: str, from_script: str, to_script: str)
Using Sangam API
from indo_arabic_transliteration.sangam_api import online_transliterate
online_transliterate(text: str, from_script: str, to_script: str)
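Since both functions take the same `(text, from_script, to_script)` arguments, one possible pattern is to prefer the online API and fall back to the rule-based converter when the call fails. The helper `convert_with_fallback` below is a hypothetical sketch, not part of the library, and it assumes that network or service failures surface as ordinary Python exceptions.

```python
from typing import Callable

# Any function with the (text, from_script, to_script) -> str shape shown above.
Converter = Callable[[str, str, str], str]


def convert_with_fallback(
    text: str,
    from_script: str,
    to_script: str,
    primary: Converter,
    fallback: Converter,
) -> str:
    """Try `primary`; if it raises, return the result of `fallback` instead."""
    try:
        return primary(text, from_script, to_script)
    except Exception:
        # Assumption: an unreachable endpoint shows up as an exception.
        return fallback(text, from_script, to_script)


# Intended use (requires the package and network access):
# from indo_arabic_transliteration.mapper import script_convert
# from indo_arabic_transliteration.sangam_api import online_transliterate
# convert_with_fallback("हैदराबाद", "hi-IN", "ur-PK",
#                       online_transliterate, script_convert)
```

Passing the converters in as arguments keeps the helper independent of the package, so it can be exercised with stand-in functions.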
Languages
We use the standard BCP 47 language tags to refer to the language-script combinations.
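The tags listed in the tables of this section come in pairs, one per side of the border. As a rough sketch, a caller could validate a tag and look up its counterpart before calling a converter. The names `SCRIPT_PAIRS` and `counterpart` are hypothetical helpers and are not exported by the library.

```python
# Indic-side tag -> Perso-Arabic-side tag, as listed in this README's tables.
SCRIPT_PAIRS = {
    "hi-IN": "ur-PK",  # Hindi (Devanagari) <-> Urdu (Perso-Arabic)
    "pa-IN": "pa-PK",  # East Punjabi (Gurmukhi) <-> West Punjabi (Shahmukhi)
    "sd-IN": "sd-PK",  # Indian Sindhi (Devanagari) <-> Pakistani Sindhi (Perso-Arabic)
}
_REVERSE_PAIRS = {arabic: indic for indic, arabic in SCRIPT_PAIRS.items()}


def counterpart(tag: str) -> str:
    """Return the tag on the other side of the pair, or raise ValueError."""
    if tag in SCRIPT_PAIRS:
        return SCRIPT_PAIRS[tag]
    if tag in _REVERSE_PAIRS:
        return _REVERSE_PAIRS[tag]
    raise ValueError(f"unsupported tag: {tag!r}")


print(counterpart("pa-IN"))  # pa-PK
print(counterpart("ur-PK"))  # hi-IN
```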
Hindi-Urdu (Hindustani)
Language | Script | Code |
---|---|---|
Hindi | Devanagari | hi-IN |
Urdu | Perso-Arabic | ur-PK |
Example:
# Rule-based
script_convert("हैदराबाद", 'hi-IN', 'ur-PK') # حیدرآباد
script_convert("حيدرآباد", 'ur-PK', 'hi-IN') # हीदराबाद
# Online-API
online_transliterate("حيدرآباد", 'ur-PK', 'hi-IN') # हैदराबाद
online_transliterate("हैदराबाद", 'hi-IN', 'ur-PK') # حیدرآباد
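The rule-based results above suggest that a conversion does not always survive a round trip. A small check like the following can flag such cases. `round_trips` is a hypothetical helper, and `fake_convert` is a stand-in whose outputs are modeled on the rule-based example; neither is part of the library.

```python
from typing import Callable

Converter = Callable[[str, str, str], str]


def round_trips(text: str, src: str, dst: str, convert: Converter) -> bool:
    """Convert src -> dst -> src and report whether the original text survives."""
    there = convert(text, src, dst)
    back = convert(there, dst, src)
    return back == text


# Stand-in converter with hard-coded, illustrative results (an assumption).
hindi, urdu, lossy = "हैदराबाद", "حیدرآباد", "हीदराबाद"
_FAKE_RESULTS = {
    (hindi, "hi-IN", "ur-PK"): urdu,
    (urdu, "ur-PK", "hi-IN"): lossy,
}


def fake_convert(text: str, from_script: str, to_script: str) -> str:
    return _FAKE_RESULTS[(text, from_script, to_script)]


print(round_trips(hindi, "hi-IN", "ur-PK", fake_convert))  # False
```

With the real package installed, `script_convert` or `online_transliterate` can be passed in place of `fake_convert`.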
Notes & Resources:
- Both nations share a common national language (Hindustani), which is written in different scripts and officially registered as two different languages.
- Official Tools
- Devanagari to PersoArabic mapping
Punjabi
Language | Script | Code |
---|---|---|
East Punjabi | Gurmukhi | pa-IN |
West Punjabi | Shahmukhi | pa-PK |
Example:
# Rule-based
script_convert("ਸਿੰਘ", 'pa-IN', 'pa-PK') # سںگھ
script_convert("سںگھ", 'pa-PK', 'pa-IN') # ਸਂਘ
# Online-API
online_transliterate("سنگھ", 'pa-PK', 'pa-IN') # ਸਿੰਘ
online_transliterate("ਸਿੰਘ", 'pa-IN', 'pa-PK') # سِنگھ
Notes & Resources:
- You can also use these JavaScript libraries:
- Gurmukhi to Shahmukhi mapping
Sindhi
Language | Script | Code |
---|---|---|
Indian Sindhi | Devanagari | sd-IN |
Pakistani Sindhi | Perso-Arabic | sd-PK |
Example:
# Rule-based
script_convert("हैदराबाद", 'sd-IN', 'sd-PK') # حیدرآباد
script_convert("حيدرآباد", 'sd-PK', 'sd-IN') # हीदराबाद
# Online-API
online_transliterate("حيدرآباد", 'sd-PK', 'sd-IN') # हैदराबाद
online_transliterate("हैदराबाद", 'sd-IN', 'sd-PK') # حیدرآباد
Notes & Resources:
- Before Devanagari standardization, Sindhi was written in Landa scripts like Khojki, Khudawadi, Multani, Gurmukhi, etc. depending upon the region.
- To convert from Devanagari to the above legacy scripts, use Aksharamukha's Python library.
- You can also use this JavaScript library or online converter.
- Sindhi-PersoArabic to Devanagari mapping
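For the legacy-script route mentioned in the notes, a call through Aksharamukha might look roughly like the sketch below. The wrapper `devanagari_to_legacy` is hypothetical. The script names in `LEGACY_SINDHI_SCRIPTS` are assumed to match the names Aksharamukha uses, so check them against its documented script list before relying on them.

```python
# Assumed Aksharamukha script names for the legacy scripts named above.
LEGACY_SINDHI_SCRIPTS = ("Khojki", "Khudawadi", "Multani", "Gurmukhi")


def devanagari_to_legacy(text: str, target: str) -> str:
    """Convert Devanagari text to one of the legacy scripts via Aksharamukha."""
    if target not in LEGACY_SINDHI_SCRIPTS:
        raise ValueError(
            f"unsupported target {target!r}; choose from {LEGACY_SINDHI_SCRIPTS}"
        )
    # Imported lazily so this sketch loads even when the package is missing
    # (install with: pip install aksharamukha).
    from aksharamukha import transliterate

    return transliterate.process("Devanagari", target, text)


# Example call (requires aksharamukha to be installed):
# devanagari_to_legacy("हैदराबाद", "Khudawadi")
```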
Other Methods
MachineLearning-based Transliteration
- Uses the LibIndicTrans library for its models
- Install it with:
pip install git+https://github.com/libindic/indic-trans
- Currently supports only the Hindi-Urdu pair
API:
from indo_arabic_transliteration.ml_based import ml_transliterate
# Same interface as script_convert()
Indic-to-Arabic with Diacritics
- Indic scripts are mostly phonetic. Use this method to retain diacritics in the Perso-Arabic output.
- Currently supports only Hindustani (Hindi to Urdu) and Punjabi (Gurmukhi to Shahmukhi)
- Uses the Aksharamukha library
API:
from indo_arabic_transliteration.lossless_converter import convert_with_diacritics
# Same interface as script_convert()
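Because every entry point in this README is described as sharing the `script_convert()` interface, a caller could choose a method by name at runtime. The dispatcher below is a sketch only: the keys in `METHODS` and the functions `get_converter` and `transliterate` are hypothetical, while the module and function names are the ones imported in this README. Modules are imported lazily, so optional dependencies are needed only for the method actually used.

```python
from importlib import import_module

# Hypothetical short names -> (module, function) as imported in this README.
METHODS = {
    "rule": ("indo_arabic_transliteration.mapper", "script_convert"),
    "online": ("indo_arabic_transliteration.sangam_api", "online_transliterate"),
    "ml": ("indo_arabic_transliteration.ml_based", "ml_transliterate"),
    "diacritics": (
        "indo_arabic_transliteration.lossless_converter",
        "convert_with_diacritics",
    ),
}


def get_converter(method: str):
    """Import and return the converter registered under `method`."""
    try:
        module_name, func_name = METHODS[method]
    except KeyError:
        raise ValueError(
            f"unknown method {method!r}; choose from {sorted(METHODS)}"
        ) from None
    return getattr(import_module(module_name), func_name)


def transliterate(text: str, from_script: str, to_script: str, method: str = "rule") -> str:
    """Convert `text` with the chosen method (requires the package installed)."""
    return get_converter(method)(text, from_script, to_script)


# Example (requires indo-arabic-transliteration to be installed):
# transliterate("हैदराबाद", "hi-IN", "ur-PK", method="diacritics")
```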
Support
- For help in using the library, please use the GitHub Issues section.
- For script conversion errors from the online API, please write directly to the Sangam team. We are not affiliated with them in any way, and this is not an official library.