A Python tool for spelling Thai

khanaa

Khanaa is a tool to make spelling Thai more convenient.

Installation

For Python >=3.7

pip install khanaa

Usage

Spelling

from khanaa import Kham

basic_example = {
    'onset': 'ก', # required; may contain more than one consonant
    'vowel': 'อา', # required; a ย or ว coda is written as part of the vowel, e.g. เอียว
    'silent_before': '', # silent character(s) before the coda
    'coda': '', # do not put ย or ว here (they belong in the vowel)
    'silent_after': '', # silent character(s) after the coda
    'tone': -1  # -1 unspecified, 0 สามัญ (mid), 1 เอก (low), 2 โท (falling), 3 ตรี (high), 4 จัตวา (rising)
    }
kaa = Kham(**basic_example)
kaa.form
# => 'กา'

# ย, ว coda
lai = Kham(onset='ล', vowel='อาย')
lai.form
# => 'ลาย'

# onset cluster
steak = Kham(onset='สต', vowel='เอะ', coda='ก', tone=3)
steak.form
# => 'สเต๊ก'

# silent character
shin = Kham(onset='ฌ', vowel='อิ', coda='น', silent_after='สก')
shin.form
# => 'ฌินสก์'

# the form can be customised (e.g. mark the onset cluster with phinthu)
sia = Kham(onset='ซย', vowel='อา', onset_style='phinthu')
sia.all_tone()
# => ['ซฺยา', 'สฺย่า', 'ซฺย่า', 'ซฺย้า', 'สฺยา']

# use the short form of the vowel
pai = Kham(onset='ป', vowel='อาย', vowel_length='short')
pai.form
# => 'ไป'

SpellWord is deprecated but has not been removed.

Getting information

from khanaa import Kham

kwang = Kham(onset='กว', vowel='อา', coda='ง', tone=2)
kwang.form
# => 'กว้าง'

# get main onset to derive the tone from
kwang.onset_main
# => 'ก'

# get onset class
kwang.onset_class
# => 'mid'

# get vowel length
kwang.vowel_length
# => 'long'

# get coda class
kwang.coda_class
# => 'alive'

# is it a checked syllable (คำตาย, "dead syllable")?
kwang.is_checked
# => False

# get the realized tone
# (differs from the input when the input is -1 or the requested tone is not possible)
kwang.tone_realized
# => 2

# does it use a leading ห (ห นำ)?
kwang.use_leading_h
# => False

# has the onset been swapped for its paired consonant to convey the tone?
kwang.use_pair_onset
# => False

# get naive, rule-based IPA
kwang.ipa()
# => 'k w aː ŋ ˥˩'

# get everything
kwang.data
""" =>
{'all_tone': ['กวาง', 'กว่าง', 'กว้าง', 'กว๊าง', 'กว๋าง'],
 'coda': 'ง',
 'coda_class': 'alive',
 'form': 'กว้าง',
 'homophone': ['กว้าง'],
 'ipa': 'k w aː ŋ ˥˩',
 'is_checked': False,
 'is_donee_end': False,
 'is_donor_end': True,
 'is_donor_start': True,
 'is_possible_tone': True,
 'onset': 'กว',
 'onset_class': 'mid',
 'onset_index': -2,
 'onset_main': 'ก',
 'silent_after': '',
 'silent_before': '',
 'tone': 2,
 'tone_mark': '้',
 'tone_realized': 2,
 'use_leading_h': False,
 'use_pair_onset': False,
 'vowel': 'อา',
 'vowel_length': 'long'}
"""

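The properties above (onset class, vowel length, checked or not) are the usual inputs to the Thai tone rules. As a rough illustration, the sketch below applies the standard textbook rules for a syllable written without a tone mark. It is not khanaa's implementation: the function name `inherent_tone` is hypothetical, and the class labels 'high' and 'low' are assumed by analogy with the 'mid' value shown above.

```python
def inherent_tone(onset_class: str, is_checked: bool, vowel_length: str = "long") -> int:
    """Return the tone (0-4) of a syllable written WITHOUT a tone mark.

    Tone numbers follow the README: 0 mid, 1 low, 2 falling, 3 high, 4 rising.
    The 'high' and 'low' class labels are assumptions for this sketch.
    """
    if onset_class not in ("mid", "high", "low"):
        raise ValueError(f"unknown onset class: {onset_class!r}")
    if vowel_length not in ("short", "long"):
        raise ValueError(f"unknown vowel length: {vowel_length!r}")

    if not is_checked:
        # Live syllable: a high-class onset gives a rising tone, otherwise mid tone.
        return 4 if onset_class == "high" else 0
    if onset_class in ("mid", "high"):
        # Dead syllable with a mid- or high-class onset: low tone.
        return 1
    # Dead syllable with a low-class onset: the tone depends on vowel length.
    return 3 if vowel_length == "short" else 2


print(inherent_tone("mid", False))   # กา: mid-class onset, live syllable
print(inherent_tone("high", False))  # เขียน: high-class onset, live syllable
```

This prints 0 and 4. Presumably tone_realized reports a value of this kind when the input tone is -1, but that is an inference from the comments above rather than something the README states.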
Ambiguity

Because Thai orthography can be ambiguous, the following methods detect whether the boundary of a spelled word is ambiguous, so that you can, for example, insert a hyphen between the ambiguous syllables to make the pronunciation clear.

from khanaa import Kham

kwang = Kham(onset='กว', vowel='อา', coda='ง', tone=2)
kwang.form
# => 'กว้าง'

# Check whether the onset of the following word can be read
# as this word's coda.
# e.g. ตา returns True because ตา + กลม can be read as either
# ตา-กลม or ตาก-ลม
kwang.is_donee_end()
# => False

# Check whether the coda of this word can be read as the
# following word's onset.
# e.g. ตาก returns True because ตาก + ลม can be read as either
# ตา-กลม or ตาก-ลม
kwang.is_donor_end()
# => True

# Check whether the onset of this word can be read as the coda
# of the preceding word.
# e.g. กลม returns True because ตา + กลม can be read as either
# ตา-กลม or ตาก-ลม
kwang.is_donor_start()
# => True

In other words, when a word for which is_donee_end() returns True is followed by a word for which is_donor_start() returns True, the boundary is ambiguous (in theory); ตา followed by กลม is one example.

When a word for which is_donor_end() returns True is followed by any word, the boundary may be ambiguous.
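The two rules above can be turned into a small joining routine that inserts a hyphen only at ambiguous boundaries. The following is a minimal sketch, not part of khanaa: `join_syllables` and `StubKham` are hypothetical names, and the stub's flags are hard-coded for illustration so that the example runs without the library. Only the ตา and กลม values stated above are taken from the README; the remaining flags default to False as an assumption.

```python
from dataclasses import dataclass


@dataclass
class StubKham:
    """Hypothetical stand-in exposing only the members used below."""
    form: str
    donee_end: bool = False
    donor_end: bool = False
    donor_start: bool = False

    def is_donee_end(self) -> bool:
        return self.donee_end

    def is_donor_end(self) -> bool:
        return self.donor_end

    def is_donor_start(self) -> bool:
        return self.donor_start


def join_syllables(words) -> str:
    """Concatenate forms, hyphenating each boundary the two rules flag."""
    if not words:
        return ""
    parts = [words[0].form]
    for prev, nxt in zip(words, words[1:]):
        # Rule 1: a donor end is potentially ambiguous before any word.
        # Rule 2: a donee end followed by a donor start is ambiguous.
        ambiguous = prev.is_donor_end() or (
            prev.is_donee_end() and nxt.is_donor_start()
        )
        parts.append("-" if ambiguous else "")
        parts.append(nxt.form)
    return "".join(parts)


ta = StubKham("ตา", donee_end=True)       # flag taken from the example above
klom = StubKham("กลม", donor_start=True)  # flag taken from the example above
print(join_syllables([ta, klom]))  # ตา-กลม
```

Because the function only calls is_donee_end(), is_donor_end(), is_donor_start() and reads form, objects with that same interface should be usable in place of the stub.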

Homophone

from khanaa import Kham
khuu = Kham(onset='ค', vowel='อู', tone=2)
khuu.form
# => 'คู่'

khuu.homophone()
# => ['ขู้', 'ฃู้', 'คู่', 'ฅู่', 'ฆู่']
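The result above pairs high-class and low-class consonants that share the same sound and picks a different tone mark for each class. The sketch below reproduces that one list by hand to make the pattern visible. It is an illustration only, not how khanaa computes homophones: the names are hypothetical, and the rule encoded here (falling tone on a live syllable takes mai tho after a high-class onset and mai ek after a low-class onset) covers just this case.

```python
SARA_UU = "\u0e39"  # vowel sign ู, written below the consonant
MAI_EK = "\u0e48"   # tone mark ่
MAI_THO = "\u0e49"  # tone mark ้

# Consonants pronounced /kʰ/ with their class (assumed labels 'high' / 'low').
KH_ONSETS = [("ข", "high"), ("ฃ", "high"), ("ค", "low"), ("ฅ", "low"), ("ฆ", "low")]

# Tone mark that yields the falling tone (tone 2) on a live syllable.
FALLING_MARK = {"high": MAI_THO, "low": MAI_EK}


def falling_tone_spellings(vowel_sign: str = SARA_UU) -> list:
    """Spell the same falling-tone syllable with every /kʰ/ onset."""
    # Order within a syllable: consonant, vowel sign, then tone mark.
    return [onset + vowel_sign + FALLING_MARK[cls] for onset, cls in KH_ONSETS]


print(falling_tone_spellings())
# ['ขู้', 'ฃู้', 'คู่', 'ฅู่', 'ฆู่']
```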

Others

Find all available consonants, vowels and true clusters in khanaa:

from khanaa import find_letter_list

find_letter_list()

An experimental, basic method to turn text into Kham:

from khanaa import Kham, spelling_decompose

sd = spelling_decompose("เขียน")
# the result can be None if the input cannot be parsed
sd
""" =>
{'data': {'coda': 'น',
          'onset': 'ข',
          'silent_after': '',
          'silent_before': '',
          'tone': 4,
          'vowel': 'เอีย'},
 'detail': {'leading_h': False,
            'onset_index': -1,
            'onset_main': 'ข',
            'tone_mark': '',
            'vowel_form': 'เ-ี+ย'},
 'pref': {}}
"""

khian = Kham(**sd['data'])
khian.form
# => 'เขียน'

khian.ipa()
# => 'kʰ iaː n ˩˩˦'
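Since the decomposition can return None, code that unpacks sd['data'] directly would fail on unparseable input. The sketch below shows one way to guard against that. The helper `kham_kwargs` is a hypothetical name, and `fake_decompose` is a stand-in that mimics the result shape shown above so the example runs without khanaa installed.

```python
from typing import Callable, Optional


def kham_kwargs(text: str, decompose: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Return the 'data' part of a decomposition, or None if parsing failed."""
    result = decompose(text)
    if result is None:
        return None
    # Copy so that callers can modify the keyword arguments freely.
    return dict(result["data"])


def fake_decompose(text: str) -> Optional[dict]:
    """Hypothetical stand-in that only knows the README's example word."""
    if text == "เขียน":
        return {
            "data": {"onset": "ข", "vowel": "เอีย", "coda": "น",
                     "silent_before": "", "silent_after": "", "tone": 4},
            "detail": {},
            "pref": {},
        }
    return None


print(kham_kwargs("เขียน", fake_decompose)["onset"])  # ข
print(kham_kwargs("???", fake_decompose))             # None
```

With khanaa installed, one would presumably pass spelling_decompose as the decompose argument and, when the result is not None, hand the dictionary to Kham(**...) as in the example above.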

License

MIT
