Subcharacter-level regular expression functionality for Korean

kre

Subcharacter-level regular expressions with Korean text.

kre is a wrapper around re from the Python Standard Library that lets users apply the full functionality of re to Korean text at the subcharacter (individual letter, or jamo) level.

Installation

kre releases are available on PyPI.

pip install kre

Documentation

Most functionality is documented in the re documentation.

Documentation on the unique features of kre is available in the wiki, where you will also find discussion of inherent differences between re (character-level regular expressions) and kre (subcharacter-level regular expressions) and how kre addresses them. It is strongly recommended that users familiarize themselves with these differences.

Example Features

In the simple case of search functions, matches found at the subcharacter level are mapped back to their positions in the original string.

> re.search(r"ㅡ", "한글") # no match
> kre.search(r"ㅡ", "한글")
<kre.KRE_Match object; span=(1, 2), match='글'>
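To illustrate the idea behind this mapping, here is a minimal sketch using only the standard library. It is not kre's implementation: the helper names `linearize` and `search_span` are hypothetical, compound letters such as ㄳ or ㅘ are kept as single units (kre may treat them differently), and empty matches are simply ignored. Each precomposed syllable is split into its letters with the standard Unicode Hangul arithmetic, and every letter remembers the index of the character it came from.

```python
import re

# Compatibility jamo in Unicode syllable order: 19 leads, 21 vowels, 27 finals.
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def linearize(text):
    """Split syllables into letters; return (linear string, letter -> char index map)."""
    letters, index_map = [], []
    for i, ch in enumerate(text):
        offset = ord(ch) - 0xAC00
        if 0 <= offset < 11172:  # precomposed Hangul syllable block
            lead, rest = divmod(offset, 21 * 28)
            vowel, tail = divmod(rest, 28)
            parts = CHO[lead] + JUNG[vowel] + JONG[tail]
        else:  # anything else passes through unchanged
            parts = ch
        letters.append(parts)
        index_map.extend([i] * len(parts))
    return "".join(letters), index_map


def search_span(pattern, text):
    """Search at the letter level and report the span in the original string."""
    linear, index_map = linearize(text)
    m = re.search(pattern, linear)
    if m is None or m.start() == m.end():
        return None
    return index_map[m.start()], index_map[m.end() - 1] + 1


print(linearize("한글")[0])        # ㅎㅏㄴㄱㅡㄹ
print(search_span(r"ㅡ", "한글"))  # (1, 2)
```

The span agrees with the kre.search example above: the letter ㅡ sits inside the second character, so the reported span covers 글.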

In the case of subcharacter-level substitutions, kre can recombine any newly created sequences into standard Korean characters, provided the input used standard (syllable) characters.

> kre.sub(r"ㅏ", r"ㅗ", "핳ㅏ하ㅎㅏ하핳")
'홓ㅗ호ㅎㅗ호홓'
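The recombination step can be sketched with the same Unicode arithmetic run in reverse. The following is a simplified illustration, not kre's code: the helper names are hypothetical, and for brevity the substitution is applied one character at a time, so patterns spanning several syllables (which kre supports) are out of scope here. Only letters that came from a syllable character are recomposed; standalone letters stay standalone.

```python
import re

# Compatibility jamo in Unicode syllable order: 19 leads, 21 vowels, 27 finals.
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def decompose(ch):
    """Return the letters of a precomposed syllable, or ch itself otherwise."""
    offset = ord(ch) - 0xAC00
    if not 0 <= offset < 11172:
        return ch
    lead, rest = divmod(offset, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return CHO[lead] + JUNG[vowel] + JONG[tail]


def compose(letters):
    """Return one syllable for lead + vowel (+ final), else the letters unchanged."""
    if len(letters) in (2, 3) and letters[0] in CHO and letters[1] in JUNG:
        tail = letters[2:]  # "" when there is no final consonant
        if tail in JONG:
            code = (CHO.index(letters[0]) * 21 + JUNG.index(letters[1])) * 28
            return chr(0xAC00 + code + JONG.index(tail))
    return letters


def sub_per_syllable(pattern, repl, text):
    """Substitute inside each character, recomposing only former syllables."""
    out = []
    for ch in text:
        letters = decompose(ch)
        changed = re.sub(pattern, repl, letters)
        # letters != ch means ch was a syllable character and may be recomposed.
        out.append(compose(changed) if letters != ch else changed)
    return "".join(out)


print(sub_per_syllable(r"ㅏ", r"ㅗ", "핳ㅏ하ㅎㅏ하핳"))  # 홓ㅗ호ㅎㅗ호홓
```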

If you prefer, kre can also attempt to merge non-standard input (such as standalone letters) into syllables together with the substitutions.

> kre.sub(r"ㅏ", r"ㅗ", "핳ㅏ하ㅎㅏ하핳", syllabify="extended")
'호호호호호홓'
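One way to picture this kind of merging is a greedy pass over the already linearized, already substituted letters. The sketch below is an assumption about how such a pass could work, not a description of what syllabify="extended" does internally; the function name `syllabify` is reused here purely for illustration. A syllable is formed whenever a lead consonant is directly followed by a vowel, and a following consonant is taken as the final only if it is not needed as the lead of the next syllable.

```python
# Compatibility jamo in Unicode syllable order: 19 leads, 21 vowels, 27 finals.
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def syllabify(letters):
    """Greedily merge a flat run of letters into precomposed syllables."""
    out, i, n = [], 0, len(letters)
    while i < n:
        # A syllable needs a lead consonant immediately followed by a vowel.
        if i + 1 < n and letters[i] in CHO and letters[i + 1] in JUNG:
            lead = CHO.index(letters[i])
            vowel = JUNG.index(letters[i + 1])
            tail = 0
            i += 2
            if i < n and letters[i] in JONG:
                # Leave the consonant alone if it starts the next syllable.
                starts_next = (
                    letters[i] in CHO and i + 1 < n and letters[i + 1] in JUNG
                )
                if not starts_next:
                    tail = JONG.index(letters[i])
                    i += 1
            out.append(chr(0xAC00 + (lead * 21 + vowel) * 28 + tail))
        else:
            # Stray letters and non-Korean characters pass through unchanged.
            out.append(letters[i])
            i += 1
    return "".join(out)


# The example input above, linearized and with every ㅏ replaced by ㅗ:
print(syllabify("ㅎㅗㅎㅗㅎㅗㅎㅗㅎㅗㅎㅗㅎ"))  # 호호호호호홓
```

Because the pass ignores where the original syllable boundaries were, standalone letters in the input end up absorbed into neighbouring syllables, which is why the result differs from the default output.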

Linearizing a Korean string normally discards information about syllable boundaries. kre lets regular expression patterns refer to those boundaries through customizable syllable delimiters (';' by default).

> kre.search(r"ㅇ", "생일 축하해~")
<kre.KRE_Match object; span=(0, 1), match='생'>
> kre.search(r";ㅇ", "생일 축하해~", boundaries=True)
<kre.KRE_Match object; span=(1, 2), match='일'>
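The delimiter idea can be sketched as follows. This is an illustration under stated assumptions, not kre's implementation: the helper names are hypothetical, and the exact placement of delimiters (here, one on each side of every syllable, shared between adjacent syllables) is a guess that kre's actual layout may not match. Delimiters are excluded from the index map so that they never widen the reported span. Text that already contains the delimiter character would be ambiguous in this sketch.

```python
import re

# Compatibility jamo in Unicode syllable order: 19 leads, 21 vowels, 27 finals.
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def linearize_with_boundaries(text, delimiter=";"):
    """Split syllables into letters, marking syllable edges with a delimiter."""
    pieces, index_map = [], []
    prev_was_syllable = False
    for i, ch in enumerate(text):
        offset = ord(ch) - 0xAC00
        if 0 <= offset < 11172:  # precomposed Hangul syllable block
            lead, rest = divmod(offset, 21 * 28)
            vowel, tail = divmod(rest, 28)
            letters = CHO[lead] + JUNG[vowel] + JONG[tail]
            if not prev_was_syllable:  # adjacent syllables share one delimiter
                pieces.append(delimiter)
                index_map.append(None)
            pieces.append(letters)
            index_map.extend([i] * len(letters))
            pieces.append(delimiter)
            index_map.append(None)  # delimiters map to no original character
            prev_was_syllable = True
        else:
            pieces.append(ch)
            index_map.append(i)
            prev_was_syllable = False
    return "".join(pieces), index_map


def search_span(pattern, text, delimiter=";"):
    """Search the delimited letters; report the span in the original string."""
    linear, index_map = linearize_with_boundaries(text, delimiter)
    m = re.search(pattern, linear)
    if m is None:
        return None
    hits = [k for k in index_map[m.start():m.end()] if k is not None]
    return (hits[0], hits[-1] + 1) if hits else None


print(linearize_with_boundaries("생일")[0])      # ;ㅅㅐㅇ;ㅇㅣㄹ;
print(search_span(r"ㅇ", "생일 축하해~"))        # (0, 1)
print(search_span(r";ㅇ", "생일 축하해~"))       # (1, 2)
```

The first search hits the final ㅇ of 생, while the delimiter in the second pattern forces the match onto a syllable-initial ㅇ, which first occurs in 일.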

As a more interesting, more complicated, and perhaps useless example of what kre can do, the following swaps the final consonants (받침) within each successive pair of syllables that have them.

> sun_and_moon = "옛날 옛적 깊은 산 속에 가난하지만 사이좋은 오누이와 그 홀어머니 가족이 살고 있었다."
> kre.sub(r"([ㅏ-ㅣ])([ㄱ-ㅎ]{1,2};)(.*?)([ㅏ-ㅣ])([ㄱ-ㅎ]{1,2};)", r"\1\5\3\4\2", sun_and_moon, boundaries=True)
'옐낫 옉젓 긴읖 삭 손에 가난하지만 사이존읗 오누이와 그 혹어머니 가졸이 샀고 일었다.'
