Script Conversion for Indo-Pakistani languages
Indic-PersoArabic-Script-Converter
Indo-Pakistani Transliteration
A Python library to convert text between Indian scripts and Pakistani scripts, in either direction.
Currently supported methods:
- Rule-based direct one-to-one mapping (does not support short vowels)
  - Use this for simple raw conversion.
  - It will not be accurate, especially for Arabic-to-Indic conversion.
- SANGAM online API
  - Uses an online endpoint for the conversion.
  - Produces much better results, but is much slower.
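To illustrate why a one-to-one mapping cannot recover short vowels, here is a minimal sketch of the idea. The tiny character table and the function names `to_perso_arabic` and `to_devanagari` are hypothetical and cover only a handful of characters; this is not the library's actual mapping or implementation.

```python
# Toy character table (an assumption for illustration, not the library's data).
# The short-vowel sign maps to None because Perso-Arabic text usually omits it.
DEVANAGARI_TO_PERSO_ARABIC = {
    "क": "ک",
    "त": "ت",
    "न": "ن",
    "ब": "ب",
    "म": "م",
    "ा": "ا",   # long vowel sign "aa" -> alef
    "ि": None,  # short vowel sign "i" -> dropped
}

# Reverse table: a dropped character has nothing to map back from.
PERSO_ARABIC_TO_DEVANAGARI = {
    target: source
    for source, target in DEVANAGARI_TO_PERSO_ARABIC.items()
    if target is not None
}


def to_perso_arabic(text: str) -> str:
    """Replace each known Devanagari character; unknown ones pass through."""
    return text.translate(str.maketrans(DEVANAGARI_TO_PERSO_ARABIC))


def to_devanagari(text: str) -> str:
    """Replace each known Perso-Arabic character; unknown ones pass through."""
    return text.translate(str.maketrans(PERSO_ARABIC_TO_DEVANAGARI))


urdu = to_perso_arabic("किताब")
print(urdu)                  # the short vowel sign is gone
print(to_devanagari(urdu))   # the round trip cannot restore it
```

In this toy table the round trip of किताब comes back as कताब, which is the kind of loss the rule-based method is expected to show in the Arabic-to-Indic direction.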
Usage
Installation
Pre-requisites:
- Use Python 3.7+
pip install git+https://github.com/GokulNC/indic_nlp_library
pip install indo-arabic-transliteration
Using rule-based conversion
from indo_arabic_transliteration import script_convert
script_convert(text: str, from_script: str, to_script: str)
Using SANGAM API
from indo_arabic_transliteration import online_transliterate
online_transliterate(text: str, from_script: str, to_script: str)
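Since both functions take the same three arguments, one possible pattern is to try the online method first and fall back to the rule-based one when the call fails. The sketch below is only an illustration: `transliterate_with_fallback` is a hypothetical helper, and the broad `except Exception` is an assumption, because this README does not say which errors the online call raises.

```python
from typing import Callable

# Both library functions are documented as (text, from_script, to_script) -> str.
Converter = Callable[[str, str, str], str]


def transliterate_with_fallback(
    text: str,
    from_script: str,
    to_script: str,
    primary: Converter,
    fallback: Converter,
) -> str:
    """Return primary(...) if it succeeds, otherwise the result of fallback(...)."""
    try:
        return primary(text, from_script, to_script)
    except Exception:
        # Assumption: any exception (network error, bad response) means
        # the primary converter is unavailable for this call.
        return fallback(text, from_script, to_script)


# Possible usage with this library (needs the package and network access):
# from indo_arabic_transliteration import online_transliterate, script_convert
# transliterate_with_fallback("हैदराबाद", "hi-IN", "ur-PK",
#                             online_transliterate, script_convert)
```

Passing the converters in as arguments keeps the helper independent of the library, so it can be exercised with stand-in functions.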
Languages
We use the standard BCP 47 language tags to refer to the language-script combinations.
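A small guard can catch mistyped tags before a conversion is attempted. The sketch below lists only the tag pairs documented in the tables of this section; `SUPPORTED_PAIRS` and `check_script_pair` are hypothetical names, and treating cross-language pairs (for example `hi-IN` to `pa-PK`) as unsupported is an assumption, since the README does not state how the library handles them.

```python
# Tag pairs taken from the language tables in this README.
SUPPORTED_PAIRS = {
    frozenset({"hi-IN", "ur-PK"}),  # Hindustani
    frozenset({"pa-IN", "pa-PK"}),  # Punjabi
    frozenset({"sd-IN", "sd-PK"}),  # Sindhi
}


def check_script_pair(from_script: str, to_script: str) -> None:
    """Raise ValueError unless the two tags form a documented pair.

    The pair is unordered, so both conversion directions are accepted.
    """
    if frozenset({from_script, to_script}) not in SUPPORTED_PAIRS:
        raise ValueError(
            f"Undocumented script pair: {from_script!r} -> {to_script!r}"
        )


check_script_pair("hi-IN", "ur-PK")  # passes silently
check_script_pair("pa-PK", "pa-IN")  # passes silently
```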
Hindustani
Language | Script | Code |
---|---|---|
Hindi | Devanagari | hi-IN |
Urdu | Perso-Arabic | ur-PK |
Example:
# Rule-based
script_convert("हैदराबाद", 'hi-IN', 'ur-PK') # حیدرآباد
script_convert("حيدرآباد", 'ur-PK', 'hi-IN') # हीदराबाद
# Online-API
online_transliterate("حيدرآباد", 'ur-PK', 'hi-IN') # हैदराबाद
online_transliterate("हैदराबाद", 'hi-IN', 'ur-PK') # حیدرآباد
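Because the online method is described as much slower, repeated words can be cached so that each distinct input is sent only once. This is a minimal sketch using the standard library's `functools.lru_cache`; `make_cached` is a hypothetical helper, and it assumes the converter returns the same output for the same input.

```python
from functools import lru_cache
from typing import Callable

Converter = Callable[[str, str, str], str]


def make_cached(convert: Converter, maxsize: int = 4096) -> Converter:
    """Wrap a converter so repeated (text, from, to) calls reuse the first result."""

    @lru_cache(maxsize=maxsize)
    def cached(text: str, from_script: str, to_script: str) -> str:
        return convert(text, from_script, to_script)

    return cached


# Possible usage with this library (needs the package and network access):
# from indo_arabic_transliteration import online_transliterate
# cached_online = make_cached(online_transliterate)
# cached_online("हैदराबाद", "hi-IN", "ur-PK")  # first call goes to the endpoint
# cached_online("हैदराबाद", "hi-IN", "ur-PK")  # second call is served from memory
```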
Notes & Resources:
- Both nations share a common language (Hindustani), which is written in different scripts and officially registered as two different languages.
- Official Tools
- For offline Hindi-Urdu transliteration using Python, use LibIndic-Trans.
- Devanagari to PersoArabic mapping
- Note: The same rule-based function can also be used for the Saraiki language.
Punjabi
Language | Script | Code |
---|---|---|
East Punjabi | Gurmukhi | pa-IN |
West Punjabi | Shahmukhi | pa-PK |
Example:
# Rule-based
script_convert("ਸਿੰਘ", 'pa-IN', 'pa-PK') # سںگھ
script_convert("سںگھ", 'pa-PK', 'pa-IN') # ਸਂਘ
# Online-API
online_transliterate("سنگھ", 'pa-PK', 'pa-IN') # ਸਿੰਘ
online_transliterate("ਸਿੰਘ", 'pa-IN', 'pa-PK') # سِنگھ
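The last online result above carries a short-vowel mark (zer), while the Shahmukhi input of the previous online example is written without one. When comparing such outputs, it can help to strip the optional marks first. The sketch below is an illustration only: `strip_harakat` is a hypothetical helper, not part of this library, and it removes the Arabic marks in the range U+064B to U+0652.

```python
# Arabic harakat: tanwin, fatha (zabar), damma (pesh), kasra (zer), shadda, sukun.
HARAKAT = {chr(code_point) for code_point in range(0x064B, 0x0653)}


def strip_harakat(text: str) -> str:
    """Remove optional Arabic vowel and gemination marks from text."""
    return "".join(ch for ch in text if ch not in HARAKAT)


# Code points are spelled out so the mark is visible in the source.
with_zer = "\u0633\u0650\u0646\u06AF\u06BE"     # سِنگھ
without_zer = "\u0633\u0646\u06AF\u06BE"        # سنگھ

print(strip_harakat(with_zer) == without_zer)   # prints True
```

This helper only touches Arabic-script marks; Gurmukhi or Devanagari vowel signs are left untouched.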
Notes & Resources:
- You can also use these JavaScript libraries:
- Gurmukhi to Shahmukhi mapping
Sindhi
Language | Script | Code |
---|---|---|
Indian Sindhi | Devanagari | sd-IN |
Pakistani Sindhi | Perso-Arabic | sd-PK |
Example:
# Rule-based
script_convert("हैदराबाद", 'sd-IN', 'sd-PK') # حیدرآباد
script_convert("حيدرآباد", 'sd-PK', 'sd-IN') # हीदराबाद
# Online-API
online_transliterate("حيدرآباد", 'sd-PK', 'sd-IN') # हैदराबाद
online_transliterate("हैदराबाद", 'sd-IN', 'sd-PK') # حیدرآباد
Notes & Resources:
- Before Devanagari standardization, Sindhi was written in Landa scripts such as Khojki, Khudawadi, Multani, and Gurmukhi, depending on the region.
- To convert from Devanagari to the above legacy scripts, use AksharaMukha's Python library.
- You can also use this JavaScript library or online converter.
- PersoArabic to Devanagari mapping
Support
- For help in using the library, please use the GitHub Issues section.
- For script conversion errors from the online API, please write directly to the SANGAM team. We are not related to them in any way, and this is not an official library.