Script Conversion for Indo-Pakistani languages

Indic-PersoArabic-Script-Converter

Indo-Pakistani Transliteration

A Python library to convert text between Indian scripts and Pakistani (Perso-Arabic) scripts, in both directions.

Currently supported methods

  1. Rule-based conversion
  • Faster, but does not support short vowels
  • Not fully accurate, especially for Arabic-to-Indic conversion
  2. Sangam Project's online transliteration API
  • Uses an online endpoint for the conversion
  • Produces much better results, but is much slower
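
Given the trade-off above, one way to combine the two methods is to prefer the more accurate converter and fall back to the faster one when it fails. The sketch below is only an illustration: `convert_with_fallback` is a hypothetical helper, not part of this library, and it assumes that a failed online request surfaces as a Python exception.

```python
from typing import Callable

# A converter takes (text, from_script, to_script) and returns the converted text.
Converter = Callable[[str, str, str], str]


def convert_with_fallback(
    text: str,
    from_script: str,
    to_script: str,
    primary: Converter,
    fallback: Converter,
) -> str:
    """Try the primary converter; use the fallback if the primary raises."""
    try:
        return primary(text, from_script, to_script)
    except Exception:
        # Assumption: network or API failures are reported as exceptions.
        return fallback(text, from_script, to_script)
```

With the library installed, `online_transliterate` could be passed as `primary` and `script_convert` as `fallback`, since both take the same three arguments.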

Usage

Installation

Pre-requisites:

  • Use Python 3.7+
  • pip install git+https://github.com/GokulNC/indic_nlp_library
pip install indo-arabic-transliteration

Using rule-based conversion

from indo_arabic_transliteration.mapper import script_convert
# Signature: script_convert(text: str, from_script: str, to_script: str)

Using Sangam API

from indo_arabic_transliteration.sangam_api import online_transliterate
# Signature: online_transliterate(text: str, from_script: str, to_script: str)
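
Because the online method is described as much slower, repeated requests for the same text may be worth caching. The following is a minimal sketch using the standard library's `functools.lru_cache`. The `cached` wrapper is a hypothetical helper, and it assumes the converter returns a string and gives the same result for the same arguments.

```python
from functools import lru_cache
from typing import Callable

Converter = Callable[[str, str, str], str]


def cached(converter: Converter, maxsize: int = 1024) -> Converter:
    """Wrap a converter so repeated identical requests are served from memory."""

    @lru_cache(maxsize=maxsize)
    def wrapper(text: str, from_script: str, to_script: str) -> str:
        # Only reached on a cache miss; hits return the stored result.
        return converter(text, from_script, to_script)

    return wrapper
```

For example, `fast_online = cached(online_transliterate)` would give a drop-in replacement that calls the endpoint only once per distinct input.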

Languages

We use standard BCP 47 language tags (language plus region) to refer to the language-script combinations.

Hindi-Urdu (Hindustani)

Language    Script          Code
Hindi       Devanagari      hi-IN
Urdu        Perso-Arabic    ur-PK

Example:

# Rule-based
script_convert("हैदराबाद‎", 'hi-IN', 'ur-PK') # حیدرآباد
script_convert("حيدرآباد‎", 'ur-PK', 'hi-IN') # हीदराबाद‎

# Online-API
online_transliterate("حيدرآباد‎", 'ur-PK', 'hi-IN') # हैदराबाद‎
online_transliterate("हैदराबाद‎", 'hi-IN', 'ur-PK') # حیدرآباد‎

Punjabi

Language        Script       Code
East Punjabi    Gurmukhi     pa-IN
West Punjabi    Shahmukhi    pa-PK

Example:

# Rule-based
script_convert("ਸਿੰਘ", 'pa-IN', 'pa-PK') # سںگھ
script_convert("سںگھ", 'pa-PK', 'pa-IN') # ਸਂਘ

# Online-API
online_transliterate("سنگھ", 'pa-PK', 'pa-IN') # ਸਿੰਘ
online_transliterate("ਸਿੰਘ", 'pa-IN', 'pa-PK') # سِنگھ
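
In the online example above, the Shahmukhi output appears to differ from the undiacritized spelling only by a kasra, the Perso-Arabic mark for short "i". The sketch below shows how such marks can be removed with plain Python, which is roughly the information the rule-based method does not produce. `strip_harakat` is a hypothetical helper, not a library function, and the code points in the comments are an assumption about how the example strings are encoded.

```python
# Arabic-script marks U+064B..U+0652: tanwin forms, fatha, damma, kasra,
# shadda and sukun.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_harakat(text: str) -> str:
    """Remove short-vowel and related diacritics from Perso-Arabic text."""
    return "".join(ch for ch in text if ch not in HARAKAT)


# seen + kasra + noon + gaf + do-chashmi he (assumed encoding of the example)
with_kasra = "\u0633\u0650\u0646\u06AF\u06BE"
# The same word without the kasra
without_kasra = "\u0633\u0646\u06AF\u06BE"

print(strip_harakat(with_kasra) == without_kasra)  # True
```

Comparing stripped strings like this can help when checking rule-based output against online output, since differences in short vowels are then ignored.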

Sindhi

Language            Script          Code
Indian Sindhi       Devanagari      sd-IN
Pakistani Sindhi    Perso-Arabic    sd-PK

Example:

# Rule-based
script_convert("हैदराबाद‎", 'sd-IN', 'sd-PK') # حیدرآباد
script_convert("حيدرآباد‎", 'sd-PK', 'sd-IN') # हीदराबाद‎

# Online-API
online_transliterate("حيدرآباد‎", 'sd-PK', 'sd-IN') # हैदराबाद‎
online_transliterate("हैदराबाद‎", 'sd-IN', 'sd-PK') # حیدرآباد‎
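
The language tags from the tables above can be collected into a small lookup so that a request is checked before any conversion is attempted. This is a sketch only: `SCRIPT_PAIRS` and `is_supported_pair` are hypothetical names, and the assumption that only the pairs documented here are valid (in either direction) is not confirmed by the library.

```python
# Indic-script tag -> Perso-Arabic-script tag, as listed in the tables above.
SCRIPT_PAIRS = {
    "hi-IN": "ur-PK",  # Hindi (Devanagari) <-> Urdu (Perso-Arabic)
    "pa-IN": "pa-PK",  # Punjabi (Gurmukhi) <-> Punjabi (Shahmukhi)
    "sd-IN": "sd-PK",  # Sindhi (Devanagari) <-> Sindhi (Perso-Arabic)
}


def is_supported_pair(from_script: str, to_script: str) -> bool:
    """Return True if the two tags form a documented pair, in either direction."""
    return (
        SCRIPT_PAIRS.get(from_script) == to_script
        or SCRIPT_PAIRS.get(to_script) == from_script
    )


print(is_supported_pair("hi-IN", "ur-PK"))  # True
print(is_supported_pair("ur-PK", "hi-IN"))  # True
print(is_supported_pair("hi-IN", "pa-PK"))  # False
```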

Other Methods

Machine Learning-based Transliteration

  • Uses the LibIndicTrans library for models
    • Install it with pip install git+https://github.com/libindic/indic-trans
  • Currently supports only the Hindi-Urdu pair

API:

from indo_arabic_transliteration.ml_based import ml_transliterate
# Same interface as script_convert()

Indic-to-Arabic with Diacritics

  • Indic scripts are mostly phonetic, so short vowels can be carried over. Use this method to retain them as diacritics in the Perso-Arabic output
    • Currently supports only Hindustani (Hindi to Urdu) and Punjabi (Gurmukhi to Shahmukhi)
    • Uses the Aksharamukha library

API:

from indo_arabic_transliteration.lossless_converter import convert_with_diacritics
# Same interface as script_convert()
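
Since all four functions are described as sharing one interface, a caller could select the method by name and import it only when needed, so that optional dependencies are loaded lazily. The module paths and function names below are the ones given in this README. The short method keys and the `get_converter` helper are hypothetical, offered as a sketch, not as part of the library.

```python
from importlib import import_module

# Method key (hypothetical) -> (module path, function name) from this README.
CONVERTERS = {
    "rule": ("indo_arabic_transliteration.mapper", "script_convert"),
    "online": ("indo_arabic_transliteration.sangam_api", "online_transliterate"),
    "ml": ("indo_arabic_transliteration.ml_based", "ml_transliterate"),
    "diacritics": (
        "indo_arabic_transliteration.lossless_converter",
        "convert_with_diacritics",
    ),
}


def get_converter(method: str):
    """Import and return the converter function registered under `method`."""
    try:
        module_path, func_name = CONVERTERS[method]
    except KeyError:
        raise ValueError(
            f"Unknown method {method!r}; expected one of {sorted(CONVERTERS)}"
        ) from None
    # The import happens here, so unused methods never load their dependencies.
    return getattr(import_module(module_path), func_name)
```

With the library installed, `get_converter("rule")("हैदराबाद", "hi-IN", "ur-PK")` would then call `script_convert` with those arguments.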

Support

  • For help with using the library, please use the GitHub Issues section.
  • For script conversion errors from the online API, please write directly to the Sangam team. We are not affiliated with them in any way, and this is not an official library.
