
Script Conversion for Indo-Pakistani languages


Indic-PersoArabic-Script-Converter

Indo-Pakistani Transliteration

A Python library to convert from Indian scripts to Pakistani scripts and vice versa.

Currently supported methods

  1. Rule-based conversion
  • Faster, but does not support short vowels
  • Less accurate as a result, especially for Arabic-to-Indic
  2. Sangam Project's online transliteration API
  • Uses an online endpoint for the conversion
  • Produces much better results, but is much slower
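
To make the short-vowel limitation concrete, here is a toy sketch of a character-by-character mapping. The tiny tables are invented for illustration and are not the library's actual rules; they cover only the letters of one word ("kitaab", book). Everyday Perso-Arabic spelling leaves short vowels unwritten, so the forward table drops the short "i" sign and the reverse direction has nothing to restore it from.

```python
# Toy tables, hypothetical and for illustration only (NOT the library's mapping).
DEVANAGARI_TO_URDU = {
    "\u0915": "\u06a9",  # ka -> kaf
    "\u093f": "",        # short i sign -> dropped (unwritten in plain Urdu)
    "\u0924": "\u062a",  # ta -> te
    "\u093e": "\u0627",  # long aa sign -> alif
    "\u092c": "\u0628",  # ba -> be
}
# Reverse table: the dropped short vowel has no entry, so it cannot come back.
URDU_TO_DEVANAGARI = {v: k for k, v in DEVANAGARI_TO_URDU.items() if v}


def toy_convert(text: str, table: dict) -> str:
    """Map each character through the table, leaving unknown ones unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


word = "\u0915\u093f\u0924\u093e\u092c"          # "kitaab" in Devanagari
urdu = toy_convert(word, DEVANAGARI_TO_URDU)     # four letters, no short i
back = toy_convert(urdu, URDU_TO_DEVANAGARI)     # reads "kataab", not "kitaab"
print(word, urdu, back, back == word)
```

The round trip loses the short vowel, which is the same kind of loss the rule-based method shows for Arabic-to-Indic conversion.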

Usage

Installation

Prerequisites:

  • Python 3.7+
  • indic_nlp_library: pip install git+https://github.com/GokulNC/indic_nlp_library

Then install the package:

pip install indo-arabic-transliteration
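
A quick way to confirm that both installs succeeded is to check whether the modules can be found. This is a minimal sketch using only the standard library; the import name `indicnlp` for indic_nlp_library is an assumption, and `is_installed` is a hypothetical helper, not part of this package.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "indicnlp" is assumed to be the import name of indic_nlp_library.
for name in ("indicnlp", "indo_arabic_transliteration"):
    print(name, "found" if is_installed(name) else "missing")
```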

Using rule-based conversion

from indo_arabic_transliteration.mapper import script_convert
# Signature: script_convert(text: str, from_script: str, to_script: str)

Using Sangam API

from indo_arabic_transliteration.sangam_api import online_transliterate
# Signature: online_transliterate(text: str, from_script: str, to_script: str)
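
Because the online method is slower and depends on a remote endpoint, it can help to cache results and fall back to the rule-based converter when a request fails. The wrapper below is a sketch under assumptions: `with_fallback` is a hypothetical helper, both converters are assumed to return a string, and any exception from the primary converter is treated as "endpoint unavailable" because the library's error behaviour is not documented here.

```python
from functools import lru_cache
from typing import Callable

# Shape shared by both converters: (text, from_script, to_script) -> str (assumed).
Converter = Callable[[str, str, str], str]


def with_fallback(primary: Converter, fallback: Converter,
                  maxsize: int = 1024) -> Converter:
    """Wrap `primary` with a cache; use `fallback` if `primary` raises."""

    @lru_cache(maxsize=maxsize)
    def convert(text: str, from_script: str, to_script: str) -> str:
        try:
            return primary(text, from_script, to_script)
        except Exception:  # assumption: any error means the endpoint failed
            return fallback(text, from_script, to_script)

    return convert


# Intended wiring (requires the package and network access):
# from indo_arabic_transliteration.mapper import script_convert
# from indo_arabic_transliteration.sangam_api import online_transliterate
# transliterate = with_fallback(online_transliterate, script_convert)
```

Note that fallback results are cached too, so a string converted while the endpoint was down keeps its rule-based output until the cache is cleared with `transliterate.cache_clear()`.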

Languages

We use standard BCP 47 language tags (language plus region, such as hi-IN) to refer to the language-script combinations.

Hindi-Urdu (Hindustani)

Language   Script         Code
Hindi      Devanagari     hi-IN
Urdu       Perso-Arabic   ur-PK

Example:

# Rule-based
script_convert("हैदराबाद‎", 'hi-IN', 'ur-PK') # حیدرآباد
script_convert("حيدرآباد‎", 'ur-PK', 'hi-IN') # हीदराबाद‎

# Online-API
online_transliterate("حيدرآباد‎", 'ur-PK', 'hi-IN') # हैदराबाद‎
online_transliterate("हैदराबाद‎", 'hi-IN', 'ur-PK') # حیدرآباد‎
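
In the rule-based example above, converting the Urdu spelling back to Devanagari yields हीदराबाद rather than हैदराबाद. A small round-trip check makes such losses easy to spot. This is a sketch: `round_trips` is a hypothetical helper, and it accepts any function with the `script_convert` interface, assumed to return a string.

```python
from typing import Callable


def round_trips(text: str, src: str, dst: str,
                convert: Callable[[str, str, str], str]) -> bool:
    """Convert src -> dst -> src and report whether the original text survives."""
    there = convert(text, src, dst)
    back = convert(there, dst, src)
    return back == text


# Intended use (requires the package):
# from indo_arabic_transliteration.mapper import script_convert
# round_trips("हैदराबाद", "hi-IN", "ur-PK", script_convert)
```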

Notes & Resources:

Punjabi

Language       Script      Code
East Punjabi   Gurmukhi    pa-IN
West Punjabi   Shahmukhi   pa-PK

Example:

# Rule-based
script_convert("ਸਿੰਘ", 'pa-IN', 'pa-PK') # سںگھ
script_convert("سںگھ", 'pa-PK', 'pa-IN') # ਸਂਘ

# Online-API
online_transliterate("سنگھ", 'pa-PK', 'pa-IN') # ਸਿੰਘ
online_transliterate("ਸਿੰਘ", 'pa-IN', 'pa-PK') # سِنگھ

Notes & Resources:

Sindhi

Language           Script         Code
Indian Sindhi      Devanagari     sd-IN
Pakistani Sindhi   Perso-Arabic   sd-PK

Example:

# Rule-based
script_convert("हैदराबाद‎", 'sd-IN', 'sd-PK') # حیدرآباد
script_convert("حيدرآباد‎", 'sd-PK', 'sd-IN') # हीदराबाद‎

# Online-API
online_transliterate("حيدرآباد‎", 'sd-PK', 'sd-IN') # हैदराबाद‎
online_transliterate("हैदराबाद‎", 'sd-IN', 'sd-PK') # حیدرآباد‎
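
The tags from the three language tables can be collected into a lookup that rejects unsupported combinations before any conversion is attempted. This is a sketch that assumes only the documented pairs are valid; the library itself may accept other combinations, and `check_pair` is a hypothetical helper.

```python
# Documented pairs: Indic-side tag -> Perso-Arabic-side tag.
SCRIPT_PAIRS = {
    "hi-IN": "ur-PK",  # Hindi (Devanagari)    <-> Urdu (Perso-Arabic)
    "pa-IN": "pa-PK",  # Punjabi (Gurmukhi)    <-> Punjabi (Shahmukhi)
    "sd-IN": "sd-PK",  # Sindhi (Devanagari)   <-> Sindhi (Perso-Arabic)
}
# Allow lookups in both directions.
_PARTNER = {**SCRIPT_PAIRS, **{v: k for k, v in SCRIPT_PAIRS.items()}}


def check_pair(from_script: str, to_script: str) -> None:
    """Raise ValueError unless the two tags form one of the documented pairs."""
    if _PARTNER.get(from_script) != to_script:
        raise ValueError(
            f"unsupported pair {from_script!r} -> {to_script!r}; "
            f"documented pairs: {sorted(SCRIPT_PAIRS.items())}"
        )


check_pair("hi-IN", "ur-PK")  # passes silently
check_pair("pa-PK", "pa-IN")  # reverse direction is also accepted
```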

Notes & Resources:


Other Methods

MachineLearning-based Transliteration

  • Uses the LibIndicTrans library for its models
    • Install it with pip install git+https://github.com/libindic/indic-trans
  • Currently supports only Hindi-Urdu

API:

from indo_arabic_transliteration.ml_based import ml_transliterate
# Same interface as script_convert()

Indic-to-Arabic with Diacritics

  • Indic scripts are mostly phonetic. Use this method to retain diacritics in the Perso-Arabic output
    • Currently supports only Hindustani (Hindi to Urdu) and Punjabi (Gurmukhi to Shahmukhi)
    • Uses the Aksharamukha library

API:

from indo_arabic_transliteration.lossless_converter import convert_with_diacritics
# Same interface as script_convert()
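
Since all four functions share one interface, the method can be picked by name at run time. The sketch below imports the chosen module lazily, so optional dependencies such as LibIndicTrans are only needed when that method is requested. The module and function names come from the imports shown in this document; the short keys ("rule", "online", "ml", "diacritics") and `load_converter` are hypothetical.

```python
import importlib
from typing import Callable, Dict, Tuple

# Short name -> (module path, function name). The keys are invented labels.
METHODS: Dict[str, Tuple[str, str]] = {
    "rule": ("indo_arabic_transliteration.mapper", "script_convert"),
    "online": ("indo_arabic_transliteration.sangam_api", "online_transliterate"),
    "ml": ("indo_arabic_transliteration.ml_based", "ml_transliterate"),
    "diacritics": ("indo_arabic_transliteration.lossless_converter",
                   "convert_with_diacritics"),
}


def load_converter(method: str,
                   registry: Dict[str, Tuple[str, str]] = METHODS) -> Callable:
    """Import and return the converter registered under `method`."""
    try:
        module_name, func_name = registry[method]
    except KeyError:
        raise ValueError(
            f"unknown method {method!r}; choose from {sorted(registry)}"
        ) from None
    module = importlib.import_module(module_name)  # lazy import
    return getattr(module, func_name)


# Intended use (requires the package):
# convert = load_converter("rule")
# convert("ਸਿੰਘ", "pa-IN", "pa-PK")
```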

Support

  • For help in using the library, please use the GitHub Issues section.
  • For script conversion errors from the online API, please write directly to the Sangam team. We are not related to them in any way, and this is not an official library.
