Skip to main content

A package used to convert indic language to iast & iast to inidc langauge viceversa

Project description

BhashaFusion (PhoneticMap)

A package used to

  • convert indic language like hinid, kannada, malayalam,telugu,Odia, Bengali,Gujarathi to iast and
  • iast to Other inidc langauge(hinid, kannada, malayalam,telugu,Odia, Bengali,Gujarathi) viceversa.
  • Phonetic Word Search: Searching word which sound similar or Phoneticly Similar in different languages like hinid, kannada, malayalam,telugu,Odia,Bengali,Gujarathi...etc
  • Not Started: addon for bash regual expression function to include Phonetic Word Search for Indic Language
  • Inprogress: Mapping all Indic alphabets to International Phonetic Alphabet (IPA)
  • Not Started: Generated Phonetic Word given new indic word using Phonetic Mapped Alphabet
  • Not Started: Generating audio file for indic word
  • Not Started: Pull request of Postgress to include Phonetic Alphabet Search in IPA and IAST format

Installation

pip install BhashaFusion  

Usage

import sqlite3
import os
import sys

from BhashaFusion import PhoneticMap

#create a IAST object
phoneticmap = PhoneticMap()

# customization
# phoneticmap = PhoneticMap(db_path='iast-token.db', table_name_alpha='IndianAlphabet',table_name_barakadi='Barakhadi')

Converting All Indic language(hinid, telugu, kannada, Malayalam, Odia, Bengali&Assamese, Gujarati, tamil) to iast

InProgress Research and Analysis is going on in Tamil Script, Nastaliq Script, Sinhala Script.

phoneticmap.to_iast('''ଧୃତରାଷ୍ଟ୍ର ଉଵାଚ |\tধৃতরাষ্ট্র উবাচ |\tધૃતરાષ્ટ્ર ઉવાચ |\tத்றுதராஷ்ட்ர உவாச |''')
# >>> 
# dhr̥tarāṣṭra uvāca |	dhr̥tarāṣṭra ubāca |	dhr̥tarāṣṭra uvāca |	ta்ṟutarāṣa்ṭa்ra uvāca |

Convert iast to Indic Language

Currently this can convert IAST to kannada, hindi, telugu, malyalam

word = 'kaṁ  itāḥ kiṁ  yuyutsavaḥ kl̥̄ kl̥ pāṇḍavānīkaṁ itāḥ kiṁ āṁ  īṁ   yuyutsuṁ  kiṁ rānsakhīṁstathā'
print(PhoneticMap.iast2tokens( word) )
# >>> ['k', 'a', 'ṁ', '  ', 'i', 't', 'ā', 'ḥ', ' ', 'k', 'i', 'ṁ', '  ', 'y', 'u', 'y', 'u', 't', 's', 'a', 'v', 'aḥ', ' ', 'k', 'l̥̄', ' ', 'k', 'l̥', ' ', 'p', 'ā', 'ṇ', 'ḍ', 'a', 'v', 'ā', 'n', 'ī', 'k', 'a', 'ṁ', ' ', 'i', 't', 'ā', 'ḥ', ' ', 'k', 'i', 'ṁ', ' ', 'ā', 'ṁ', '  ', 'ī', 'ṁ', '   ', 'y', 'u', 'y', 'u', 't', 's', 'u', 'ṁ', '  ', 'k', 'i', 'ṁ', ' ', 'r', 'ā', 'n', 's', 'a', 'kh', 'ī', 'ṁ', 's', 't', 'a', 'th', 'ā']
indic_lang = 'Telugu' # 'Kannada' # 'Telugu', 'Odia', 'Gujarati', 'Bengali-Assamsese', 
# indic_lang='Devanagari'
# indic_lang='Kannada'
# indic_lang='Telugu'
# indic_lang='Odia'
# indic_lang='Bengali–Assamese'
# indic_lang='Tamil' # In development state

print(phoneticmap.iast2indic(word,indic_lang))



PhoneticMap.dict_tokens2indic(dict_tokene_list,halant)
# >>> కం  ఇతాః కిం  యుయుత్సవః  పాణ్డవానీకం ఇతాః కిం ఆం  ఈం  కిం యుయుత్సుం రాన్సఖీంస్తథా

Phonetic Hash for Phonetic Search

Presently there are two type of Phonetic Hash

  • BASIC HASHING :

    • all vowel are removed and
    • consonats are truncated
      eg: kh is mapped to k,
      श, ष, स are mapped to स (s)
      ङ,ञ,ण,न are mapped न (n)
      ..etc
  • NORMAL HASHING

    • vowel are truncated and
    • consonats are truncated
zero_vowels={ '':['a', "ā", "â","i", "ī","u", "ū",chr(805),chr(803),
                    "l̥", "l̥̄","e", "ē", "ê","o", "ō", "ô",
                    "ṁ", "m̐", "ṃ", "ṃ","n̆", "n̆", "n̆","ḥ" , "ḫ", "ẖ", "ḥ"],
                'r': ["r̥", "r̥̄"]
            } # replacing with r is not working for 'r̥' so we replace with chr(805) above
truncated_vowels = { '':[chr(805), chr(803), chr(772),chr(784),chr(774)],
                    'a':["ā", "â"], 
                    'i':["i", "ī"], 
                    'u':["u", "ū"], 
                    'r':["r̥", "r̥̄"],
                    'l':["l̥", "l̥̄"],
                    "e":["e", "ē", "ê"],
                            # "ai", 
                    "o": ["o", "ō", "ô"], 
                                            # "au",
                    'm' :["ṁ", "m̐", "ṃ", "ṃ"], 
                    'n': ["n̆", "n̆", "n̆"], 
                    'h' :["ḥ" , "ḫ", "ẖ", "ḥ"],
                    }
basic_truncated_consonat = {
                    'k' : ['ḵ', 'k', 'kh','k͟ha'],
                    'g' :['g','g̈','gh','ġ'],
                    'n' : ['ṅ','n̆','ñ','ṇ','ṉ'],
                    'c' : ['c', 'ĉ','ch','ĉh'],
                    'j' : ['j','ǰ', 'ĵ', 'jh'],
                    't': ['ṭ','ṭa','t','th','ṯ','t̤'],
                    'd' : ['ḍ', 'd̤','ḍ','ḍh','d','dh','ḏ'],
                    'p' : ['p', 'ph'],
                    'b' : [ 'b', 'b̤', 'bh'],
                    'm' : ['m̆' ],
                    'r' : ['ṟ', 'r̆'],
                    'l' :['ḻ', 'ḷ', 'l̤'],
                    'y' : ['y', 'ẏ', 'y̌'],
                    's': ['ś', 'ṣ', 's','s̱', 's̤','sh' ],
                    "z": ["z","ž","ž","ž",'ẕ','ẕ','ẓ','ż'],
                    'h' : ['h','h̤']
                    }
search_word = 'dhr̥tarāṣṭra uvāca'
search_word = search_word.strip().lower()



# to_iast
search_iast = phoneticmap.to_iast(search_word) # similar to idempotent matrx no loss of info if ':' not present
# >>> dhr̥tarāṣṭra uvāca
print(search_iast)

print("# Original Text:", search_word)
print('BASIC HASHING: ',PhoneticMap.basic_hash(search_iast))
print('NORMAL HASHING',PhoneticMap.normal_hash(search_iast))

# >>> Original Test: dhr̥tarāṣṭra uvāca
# >>> BASIC HASHING: drtrstr vc
# >>> NORMAL HASHING: drtarastra uvaca

Contribute

InProgress Research and Analysis is going on in Tamil Script, Nastaliq Script(Urdu, Arabic), Sinhala(Sri Lanka) Script.
Not Strated Marathi can easily added into this package

Issue

Please open an issue
here in case any bug was encountered. Mail id : dankarthik25@gmail.com

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

BhashaFusion-0.0.1.tar.gz (16.4 kB view details)

Uploaded Source

Built Distribution

BhashaFusion-0.0.1-py3-none-any.whl (16.2 kB view details)

Uploaded Python 3

File details

Details for the file BhashaFusion-0.0.1.tar.gz.

File metadata

  • Download URL: BhashaFusion-0.0.1.tar.gz
  • Upload date:
  • Size: 16.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.13

File hashes

Hashes for BhashaFusion-0.0.1.tar.gz
Algorithm Hash digest
SHA256 86db4fceabf80f797692dcd72248d4dbe8f9878dbb779b53d40a04c03a21bbe1
MD5 5ca41cf7803bebdf6e96e7323204fd6e
BLAKE2b-256 8a477243a04976cf871fb032d7fd0862d68492af51f4b0a0e19cdca1dd10acc0

See more details on using hashes here.

File details

Details for the file BhashaFusion-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: BhashaFusion-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 16.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.13

File hashes

Hashes for BhashaFusion-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5edb502236181123862e15cd5a1c5bbfe14a982d2a0854dc5679eaf417015c0f
MD5 21af0eed15c6c4308117db1327fff1f4
BLAKE2b-256 817f991b1017612d2ad2c323941492895d4554ac7946469059585dc864099c07

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page