Subcharacter-level regular expression functionality for Korean
kre
Subcharacter-level regular expressions for Korean text.
kre is a wrapper for re from the Python Standard Library that lets users apply the full functionality of re at the subcharacter level to Korean text.
Installation
kre releases are available on PyPI.
pip install kre
Documentation
Because kre wraps re, most of its functionality is covered by the re documentation.
Documentation on the features unique to kre is available in the wiki, which also discusses the inherent differences between re (character-level regular expressions) and kre (subcharacter-level regular expressions) and how kre addresses them. Users are strongly encouraged to familiarize themselves with these differences.
Example Features
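Search functions are the simplest case: matches found in the decomposed text are mapped back to the positions of the original characters. Because re treats 글 as a single character, it cannot find the vowel ㅡ inside it, whereas kre reports the span of the syllable that contains the match.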
> re.search(r"ㅡ", "한글") # no match
> kre.search(r"ㅡ", "한글")
<kre.KRE_Match object; span=(1, 2), match='글'>
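The position mapping can be illustrated with the standard library alone. The sketch below is not kre's implementation, and subchar_search is a hypothetical name. It assumes patterns are written in Hangul Compatibility Jamo (the characters a Korean keyboard produces, such as ㅡ), decomposes each precomposed syllable with the standard Unicode arithmetic, records which input character each jamo came from, and uses that record to turn a match on the flattened string back into a character span.

```python
import re

# Hangul Compatibility Jamo, in the order Unicode uses for syllable composition.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"                     # 19 initial consonants
VOWELS = "".join(chr(c) for c in range(0x314F, 0x3164))       # 21 vowels, ㅏ through ㅣ
FINALS = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # 27 finals (0 means none)


def decompose(ch):
    """Split a precomposed Hangul syllable (U+AC00..U+D7A3) into jamo; pass other characters through."""
    code = ord(ch) - 0xAC00
    if not 0 <= code < 19 * 21 * 28:
        return ch
    l, rest = divmod(code, 21 * 28)
    v, t = divmod(rest, 28)
    return INITIALS[l] + VOWELS[v] + (FINALS[t - 1] if t else "")


def subchar_search(pattern, text):
    """Hypothetical helper: search the jamo-level text, report the span in original characters."""
    flat, owner = [], []  # owner[i] is the index in text of the character that produced flat[i]
    for idx, ch in enumerate(text):
        parts = decompose(ch)
        flat.extend(parts)
        owner.extend([idx] * len(parts))
    m = re.search(pattern, "".join(flat))
    if m is None:
        return None
    if m.start() == m.end():  # empty match: empty span at the owning character
        pos = owner[m.start()] if m.start() < len(owner) else len(text)
        return (pos, pos), ""
    start, end = owner[m.start()], owner[m.end() - 1] + 1
    return (start, end), text[start:end]


print(re.search(r"ㅡ", "한글"))       # None: to re, 글 is one indivisible code point
print(subchar_search(r"ㅡ", "한글"))  # ((1, 2), '글')
```

Unlike kre, this sketch returns a plain tuple rather than a match object; the span agrees with the kre output shown above.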
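For subcharacter-level substitutions, kre can recombine newly created jamo sequences into standard Korean syllable characters, provided the input used standard (syllable) characters. Standalone jamo in the input, such as the lone ㅏ and ㅎ below, are substituted but not merged into neighboring syllables.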
> kre.sub(r"ㅏ", r"ㅗ", "핳ㅏ하ㅎㅏ하핳")
'홓ㅗ호ㅎㅗ호홓'
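One way to get this behaviour is to remember which input character each group of jamo came from and to recompose each group on its own after substitution. The sketch below assumes a private-use separator (U+E000) between groups; subchar_sub and compose are hypothetical names, and this is not necessarily how kre does it. Note that the separator is visible to the pattern, so a pattern such as "." could consume it.

```python
import re

INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"                     # 19 initial consonants
VOWELS = "".join(chr(c) for c in range(0x314F, 0x3164))       # 21 vowels, ㅏ through ㅣ
FINALS = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # 27 finals (0 means none)
SEP = "\uE000"  # hypothetical private-use separator between source characters


def decompose(ch):
    """Split a precomposed Hangul syllable into jamo; pass other characters through."""
    code = ord(ch) - 0xAC00
    if not 0 <= code < 19 * 21 * 28:
        return ch
    l, rest = divmod(code, 21 * 28)
    v, t = divmod(rest, 28)
    return INITIALS[l] + VOWELS[v] + (FINALS[t - 1] if t else "")


def compose(group):
    """Rebuild one syllable from initial + vowel (+ final); return anything else unchanged."""
    if (len(group) in (2, 3) and group[0] in INITIALS and group[1] in VOWELS
            and (len(group) == 2 or group[2] in FINALS)):
        l, v = INITIALS.index(group[0]), VOWELS.index(group[1])
        t = FINALS.index(group[2]) + 1 if len(group) == 3 else 0
        return chr(0xAC00 + (l * 21 + v) * 28 + t)
    return group


def subchar_sub(pattern, repl, text):
    """Substitute on jamo, then recompose each source character's jamo group separately."""
    flat = SEP.join(decompose(ch) for ch in text)
    return "".join(compose(group) for group in re.sub(pattern, repl, flat).split(SEP))


print(subchar_sub(r"ㅏ", r"ㅗ", "핳ㅏ하ㅎㅏ하핳"))  # 홓ㅗ호ㅎㅗ호홓
```

Because the lone ㅏ and ㅎ each form their own group, compose leaves them as single jamo, which reproduces the kre output above.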
If you prefer, kre can also attempt to merge non-standard input (standalone jamo) into syllables after substitution, by passing syllabify="extended".
> kre.sub(r"ㅏ", r"ㅗ", "핳ㅏ하ㅎㅏ하핳", syllabify="extended")
'호호호호호홓'
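Merging a free stream of jamo requires deciding where each syllable ends. The hypothetical sketch below (not necessarily kre's rule) uses a common greedy approach: a consonant between two vowels starts the next syllable, and a consonant is taken as a final only when no vowel follows it. It does not merge compound vowels (such as ㅗ + ㅏ) or final clusters.

```python
import re

INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"                     # 19 initial consonants
VOWELS = "".join(chr(c) for c in range(0x314F, 0x3164))       # 21 vowels, ㅏ through ㅣ
FINALS = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # 27 finals (0 means none)


def decompose(ch):
    """Split a precomposed Hangul syllable into jamo; pass other characters through."""
    code = ord(ch) - 0xAC00
    if not 0 <= code < 19 * 21 * 28:
        return ch
    l, rest = divmod(code, 21 * 28)
    v, t = divmod(rest, 28)
    return INITIALS[l] + VOWELS[v] + (FINALS[t - 1] if t else "")


def syllabify_greedy(jamo):
    """Hypothetical helper: greedily merge a jamo stream into syllables."""
    out, i, n = [], 0, len(jamo)
    while i < n:
        if i + 1 < n and jamo[i] in INITIALS and jamo[i + 1] in VOWELS:
            l, v, t = INITIALS.index(jamo[i]), VOWELS.index(jamo[i + 1]), 0
            i += 2
            # A consonant is a final only if it does not begin the next syllable.
            if i < n and jamo[i] in FINALS and not (i + 1 < n and jamo[i + 1] in VOWELS):
                t = FINALS.index(jamo[i]) + 1
                i += 1
            out.append(chr(0xAC00 + (l * 21 + v) * 28 + t))
        else:
            out.append(jamo[i])  # not the start of a syllable: keep as-is
            i += 1
    return "".join(out)


flat = "".join(decompose(ch) for ch in "핳ㅏ하ㅎㅏ하핳")
print(syllabify_greedy(re.sub(r"ㅏ", r"ㅗ", flat)))  # 호호호호호홓
```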
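Linearizing a Korean string normally loses the information about where syllable boundaries fall, but kre lets you use syllable boundaries in regular expression patterns through (customizable) syllable delimiters (';' by default), enabled with boundaries=True. In the example below, ㅇ on its own first matches the final consonant of 생, whereas ;ㅇ only matches an ㅇ immediately after a syllable boundary, i.e. the initial ㅇ of 일.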
> kre.search(r"ㅇ", "생일 축하해~")
<kre.KRE_Match object; span=(0, 1), match='생'>
> kre.search(r";ㅇ", "생일 축하해~", boundaries=True)
<kre.KRE_Match object; span=(1, 2), match='일'>
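The delimiter idea can be sketched by writing a ';' into the linearized string after every input character and ignoring delimiter positions when mapping a match back. Placing a delimiter after every character (not only after syllables) is an assumption of this sketch; linearize and boundary_search are hypothetical names, not kre's API.

```python
import re

INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"                     # 19 initial consonants
VOWELS = "".join(chr(c) for c in range(0x314F, 0x3164))       # 21 vowels, ㅏ through ㅣ
FINALS = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # 27 finals (0 means none)


def decompose(ch):
    """Split a precomposed Hangul syllable into jamo; pass other characters through."""
    code = ord(ch) - 0xAC00
    if not 0 <= code < 19 * 21 * 28:
        return ch
    l, rest = divmod(code, 21 * 28)
    v, t = divmod(rest, 28)
    return INITIALS[l] + VOWELS[v] + (FINALS[t - 1] if t else "")


def linearize(text, delim=";"):
    """Flatten text to jamo, writing delim after every input character."""
    flat, owner = [], []  # owner[i]: source index of flat[i], or None for a delimiter
    for idx, ch in enumerate(text):
        parts = decompose(ch)
        flat.extend(parts)
        owner.extend([idx] * len(parts))
        flat.append(delim)
        owner.append(None)
    return "".join(flat), owner


def boundary_search(pattern, text, delim=";"):
    """Search with boundaries; map the match to original characters, skipping delimiters."""
    flat, owner = linearize(text, delim)
    m = re.search(pattern, flat)
    if m is None:
        return None
    hit = [owner[i] for i in range(m.start(), m.end()) if owner[i] is not None]
    if not hit:  # empty or delimiter-only matches are not mapped in this simplified sketch
        return None
    start, end = min(hit), max(hit) + 1
    return (start, end), text[start:end]


print(linearize("생일")[0])                       # ㅅㅐㅇ;ㅇㅣㄹ;
print(boundary_search(r"ㅇ", "생일 축하해~"))    # ((0, 1), '생')
print(boundary_search(r";ㅇ", "생일 축하해~"))   # ((1, 2), '일')
```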
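As a more interesting, more complicated, and perhaps useless example of what kre can do, the following swaps the final consonant(s) (받침) between successive pairs of syllables that have them: the first such syllable trades its 받침 with the second, the third with the fourth, and so on.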
> sun_and_moon = "옛날 옛적 깊은 산 속에 가난하지만 사이좋은 오누이와 그 홀어머니 가족이 살고 있었다."
> kre.sub(r"([ㅏ-ㅣ])([ㄱ-ㅎ]{1,2};)(.*?)([ㅏ-ㅣ])([ㄱ-ㅎ]{1,2};)", r"\1\5\3\4\2", sun_and_moon, boundaries=True)
'옐낫 옉젓 긴읖 삭 손에 가난하지만 사이존읗 오누이와 그 혹어머니 가졸이 샀고 일었다.'
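The same pattern can be run with plain re on a delimited jamo string, which shows why it works: [ㄱ-ㅎ]{1,2}; can only match consonants that close a syllable, so each group 2 and group 5 is a 받침. This is a hypothetical sketch (boundary_sub is not kre's API) that assumes a ';' after every input character, assumes the text itself contains no ';', and keeps cluster finals such as ㄺ as a single jamo.

```python
import re

INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"                     # 19 initial consonants
VOWELS = "".join(chr(c) for c in range(0x314F, 0x3164))       # 21 vowels, ㅏ through ㅣ
FINALS = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # 27 finals (0 means none)


def decompose(ch):
    """Split a precomposed Hangul syllable into jamo; pass other characters through."""
    code = ord(ch) - 0xAC00
    if not 0 <= code < 19 * 21 * 28:
        return ch
    l, rest = divmod(code, 21 * 28)
    v, t = divmod(rest, 28)
    return INITIALS[l] + VOWELS[v] + (FINALS[t - 1] if t else "")


def compose(group):
    """Rebuild one syllable from initial + vowel (+ final); return anything else unchanged."""
    if (len(group) in (2, 3) and group[0] in INITIALS and group[1] in VOWELS
            and (len(group) == 2 or group[2] in FINALS)):
        l, v = INITIALS.index(group[0]), VOWELS.index(group[1])
        t = FINALS.index(group[2]) + 1 if len(group) == 3 else 0
        return chr(0xAC00 + (l * 21 + v) * 28 + t)
    return group


def boundary_sub(pattern, repl, text, delim=";"):
    """Substitute on jamo with delim after every character, then recompose each delimited group."""
    flat = "".join(decompose(ch) + delim for ch in text)
    out = re.sub(pattern, repl, flat)
    return "".join(compose(group) for group in out.split(delim))


SWAP = r"([ㅏ-ㅣ])([ㄱ-ㅎ]{1,2};)(.*?)([ㅏ-ㅣ])([ㄱ-ㅎ]{1,2};)"
print(boundary_sub(SWAP, r"\1\5\3\4\2", "옛날 옛적"))  # 옐낫 옉젓
```

Because each swap moves a final together with its delimiter, every delimited group still holds exactly one syllable's jamo, so recomposition stays unambiguous.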
Download files
File details
Details for the file kre-0.9.9.tar.gz.
File metadata
- Download URL: kre-0.9.9.tar.gz
- Upload date:
- Size: 66.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | f17d50328695bda6d0b56ef0fc6ac1cb58485a9b3bc7ea52e0bfe784d161bf67
MD5 | 03609ce7e35a1a39871230f9ce7bc7b2
BLAKE2b-256 | a4611cf6ec2805e67b28fea9e17530d641e0adb3fd9f2ca117e0a6b484e22f15
File details
Details for the file kre-0.9.9-py3-none-any.whl.
File metadata
- Download URL: kre-0.9.9-py3-none-any.whl
- Upload date:
- Size: 31.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4974c72497e16220cc677ea22cb8ba2a245b3a2636b923a7af142eccf4912d3e
MD5 | c31d20243562af8d430a4fa72b40fcee
BLAKE2b-256 | ad658dc0c7010a67e03aa4145e5dc738465751d032644b16c0beb4851b89b885