Utilities for decomposing, composing, and keyboard-mapping Korean Hangul text.
JamoLib
Fast Hangul text utilities for Python.
JamoLib helps you decompose Hangul syllables into jamo, compose jamo back into syllables, and convert two-set Korean keyboard input into readable Korean text.
Why JamoLib
- Linear-time text scanning for large Hangul strings.
- Small API surface that is easy to drop into search, normalization, keyboard-input, and NLP preprocessing pipelines.
- Preserves mixed text such as English, numbers, punctuation, and whitespace.
- Handles common batchim boundary cases like `값이`, `닭이`, and `읽어`.
- Supports common compound-medial combinations such as `ㄱㅗㅏ -> 과`.
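The batchim boundary and compound-medial rules listed above can be illustrated with a small greedy scanner. This is a minimal sketch, not JamoLib's actual implementation: the name `compose_text`, the lookup tables, and the rule "a consonant is a final only when no vowel follows it" are assumptions made for illustration, using only the standard Unicode Hangul syllable arithmetic.

```python
# Jamo tables in Unicode syllable order, written as Hangul Compatibility Jamo.
CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JUNGSEONG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

COMPOUND_MEDIALS = {"ㅗㅏ": "ㅘ", "ㅗㅐ": "ㅙ", "ㅗㅣ": "ㅚ", "ㅜㅓ": "ㅝ",
                    "ㅜㅔ": "ㅞ", "ㅜㅣ": "ㅟ", "ㅡㅣ": "ㅢ"}
COMPOUND_FINALS = {"ㄱㅅ": "ㄳ", "ㄴㅈ": "ㄵ", "ㄴㅎ": "ㄶ", "ㄹㄱ": "ㄺ",
                   "ㄹㅁ": "ㄻ", "ㄹㅂ": "ㄼ", "ㄹㅅ": "ㄽ", "ㄹㅌ": "ㄾ",
                   "ㄹㅍ": "ㄿ", "ㄹㅎ": "ㅀ", "ㅂㅅ": "ㅄ"}


def compose_text(jamo: str) -> str:
    """Hypothetical composer: one left-to-right pass with one-character lookahead."""
    n = len(jamo)

    def is_vowel(k: int) -> bool:
        return k < n and jamo[k] in JUNGSEONG

    out, i = [], 0
    while i < n:
        # A syllable starts with an initial consonant directly followed by a vowel.
        if jamo[i] not in CHOSEONG or not is_vowel(i + 1):
            out.append(jamo[i])  # everything else passes through unchanged
            i += 1
            continue
        lead, vowel = jamo[i], jamo[i + 1]
        i += 2
        # Merge two simple vowels into one compound medial, e.g. ㅗ + ㅏ -> ㅘ.
        if i < n and vowel + jamo[i] in COMPOUND_MEDIALS:
            vowel = COMPOUND_MEDIALS[vowel + jamo[i]]
            i += 1
        tail = ""
        # A consonant is a final only when no vowel follows it; otherwise it
        # belongs to the next syllable as its initial.
        if i < n and jamo[i] in JONGSEONG and not is_vowel(i + 1):
            tail = jamo[i]
            i += 1
            # The same lookahead decides whether two consonants form a compound final.
            if i < n and tail + jamo[i] in COMPOUND_FINALS and not is_vowel(i + 1):
                tail = COMPOUND_FINALS[tail + jamo[i]]
                i += 1
        code = 0xAC00 + (CHOSEONG.index(lead) * 21 + JUNGSEONG.index(vowel)) * 28
        out.append(chr(code + JONGSEONG.index(tail)))
    return "".join(out)


print(compose_text("ㄷㅏㄹㄱㅇㅣ"))  # 닭이
print(compose_text("ㄷㅏㄹㄱㅏ"))    # 달가
```

The two calls differ only in what follows `ㄹㄱ`: before the initial `ㅇ` the pair stays together as the final `ㄺ`, while before a vowel the `ㄱ` moves to the next syllable.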
Installation
pip install jamolib
Quick Start
import jamolib
text = "한글과 English 123"
decomposed = jamolib.decomposeHangulText(text)
print(decomposed)
# ㅎㅏㄴㄱㅡㄹㄱㅘ English 123
print(jamolib.composeHangulText("ㄱㅏㅂㅅㅇㅣ"))
# 값이
print(jamolib.translateEngToKor("dkssudgktpdy"))
# 안녕하세요
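
For readers curious how the decomposition in the Quick Start can work in a single pass, here is a minimal standard-library sketch based on Unicode Hangul syllable arithmetic. It is not JamoLib's source: `decompose_text` and the table names are hypothetical, and keeping compound finals such as `ㅄ` as one character is an assumption that may differ from the library's behavior.

```python
# Jamo tables in Unicode syllable order, written as Hangul Compatibility Jamo.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"      # 19 initials
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 medials
# 28 finals; index 0 means "no final consonant".
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

SYLLABLE_BASE = 0xAC00   # first precomposed syllable, 가
SYLLABLE_COUNT = 11172   # 19 * 21 * 28


def decompose_text(text: str) -> str:
    """Hypothetical single-pass decomposer; non-Hangul characters pass through."""
    out = []
    for ch in text:
        offset = ord(ch) - SYLLABLE_BASE
        if 0 <= offset < SYLLABLE_COUNT:
            lead, rest = divmod(offset, 21 * 28)
            vowel, tail = divmod(rest, 28)
            out.append(CHOSEONG[lead] + JUNGSEONG[vowel] + JONGSEONG[tail])
        else:
            out.append(ch)
    return "".join(out)


print(decompose_text("한글과 English 123"))
# ㅎㅏㄴㄱㅡㄹㄱㅘ English 123
```

Each character is visited once and appended to a list, so the work grows linearly with the input length.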
API At A Glance
| Function | Input | Output | Use case |
|---|---|---|---|
| `decomposeHangul` | Single Hangul syllable | Compatibility jamo string | Token-level preprocessing |
| `decomposeHangulText` | Mixed text | Text with Hangul syllables decomposed | Search normalization, phonetic indexing |
| `composeHangul` | `초성 + 중성 [+ 종성]` | Single Hangul syllable | Rebuilding syllables |
| `composeHangulText` | Jamo text | Re-composed Hangul text | UI input handling, postprocessing |
| `translateEngToKor` | Two-set English keyboard input | Hangul text | Keyboard typo correction |
| `getCharset` | None | Supported compatibility jamo list | Validation and custom pipelines |
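
The `초성 + 중성 [+ 종성]` (initial + medial [+ final]) input shape maps directly onto the Unicode formula for precomposed syllables. The sketch below shows that arithmetic under stated assumptions: `compose_syllable` and its tables are hypothetical names, and it does not claim to mirror how `composeHangul` validates or orders its arguments.

```python
# Jamo tables in Unicode syllable order, written as Hangul Compatibility Jamo.
CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JUNGSEONG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
# Index 0 is the empty string, meaning "no final consonant".
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def compose_syllable(lead: str, vowel: str, tail: str = "") -> str:
    """Hypothetical helper: build one syllable from initial, medial, optional final.

    Raises ValueError (from list.index) if a jamo is not valid in its position.
    """
    lead_index = CHOSEONG.index(lead)
    vowel_index = JUNGSEONG.index(vowel)
    tail_index = JONGSEONG.index(tail)
    # Syllables are laid out from U+AC00 as 19 initials x 21 medials x 28 finals.
    return chr(0xAC00 + (lead_index * 21 + vowel_index) * 28 + tail_index)


print(compose_syllable("ㅎ", "ㅏ", "ㄴ"))  # 한
print(compose_syllable("ㄱ", "ㅘ"))        # 과
```

Lists are used instead of plain strings for the tables so that an empty or multi-character argument in the initial or medial position fails loudly rather than matching by accident.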
Examples
- `examples/quickstart.py`: decomposition, composition, and keyboard conversion in one script
- `examples/mixed_text.py`: preserving non-Hangul text while processing Hangul
- `examples/batchim_boundaries.py`: tricky batchim and syllable-boundary cases
Run an example from the repository root:
python examples/quickstart.py
Notes
- `decomposeHangul` expects a single Hangul syllable.
- `composeHangul` expects compatibility jamo in the order `초성 + 중성 [+ 종성]` (initial + medial [+ final]).
- `composeHangulText` also combines common compound medials like `ㅗㅏ`, `ㅜㅓ`, and `ㅡㅣ`.
- `translateEngToKor` uses the standard two-set Korean keyboard mapping.
- Text outside the Hangul range in a mixed string is passed through unchanged.
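
The first stage of keyboard conversion, mapping Latin keys to jamo on the standard two-set (dubeolsik) layout, can be sketched as a plain lookup table. The names `TWO_SET_KEYS` and `keys_to_jamo` are hypothetical, and the fallback for capitals without a shifted jamo is an assumption; JamoLib's own handling may differ. The resulting jamo string would still need to be composed into syllables.

```python
# Standard two-set layout: 26 unshifted keys plus the 7 shifted keys that
# produce a different jamo (tense consonants and ㅒ/ㅖ).
TWO_SET_KEYS = dict(zip(
    "qwertyuiopasdfghjklzxcvbnmQWERTOP",
    "ㅂㅈㄷㄱㅅㅛㅕㅑㅐㅔㅁㄴㅇㄹㅎㅗㅓㅏㅣㅋㅌㅊㅍㅠㅜㅡㅃㅉㄸㄲㅆㅒㅖ",
))


def keys_to_jamo(text: str) -> str:
    """Hypothetical helper: map each key to its jamo, leaving other characters alone."""
    out = []
    for ch in text:
        if ch in TWO_SET_KEYS:
            out.append(TWO_SET_KEYS[ch])
        else:
            # Assumption: a capital without its own shifted jamo types the
            # same jamo as the lowercase key; non-letters pass through.
            out.append(TWO_SET_KEYS.get(ch.lower(), ch))
    return "".join(out)


print(keys_to_jamo("dkssudgktpdy"))
# ㅇㅏㄴㄴㅕㅇㅎㅏㅅㅔㅇㅛ
```

Composing that output with the boundary rules described earlier in this README yields `안녕하세요`, matching the Quick Start.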
Performance
The current implementation uses a single-pass scanner instead of repeated global string replacement. Local measurements on Python 3.12 in this repository produced the following averages:
| Operation | Input shape | Average time |
|---|---|---|
| `decomposeHangulText` | Repeated Hangul sentence x500 | 0.0041s |
| `composeHangulText` | Recompose decomposed sentence x500 | 0.0205s |
| `translateEngToKor` | Keyboard string x2000 | 0.0123s |
These numbers are environment-dependent, but they reflect the optimized code currently in the repository.
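
To make the contrast between a single-pass scan and repeated global replacement concrete, the sketch below times both approaches with `timeit` on a stand-in decomposer. It is not `scripts/benchmark.py`: the functions, the sample sentence, and the run count are all hypothetical, and the timings it prints will not match the table above.

```python
import timeit

CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def is_syllable(ch: str) -> bool:
    return 0 <= ord(ch) - 0xAC00 < 11172


def split_syllable(ch: str) -> str:
    lead, rest = divmod(ord(ch) - 0xAC00, 588)
    vowel, tail = divmod(rest, 28)
    return CHOSEONG[lead] + JUNGSEONG[vowel] + JONGSEONG[tail]


def decompose_single_pass(text: str) -> str:
    # Visit each character once and build the result in one join.
    return "".join(split_syllable(ch) if is_syllable(ch) else ch for ch in text)


def decompose_by_replace(text: str) -> str:
    # One full-string replacement per distinct syllable found in the text.
    for syllable in {ch for ch in text if is_syllable(ch)}:
        text = text.replace(syllable, split_syllable(syllable))
    return text


sentence = "한글 자모 분해 테스트 " * 500  # hypothetical sample input
runs = 20
for func in (decompose_single_pass, decompose_by_replace):
    average = timeit.timeit(lambda: func(sentence), number=runs) / runs
    print(f"{func.__name__}: {average:.4f}s per call")
```

Both functions return the same string; only the number of passes over the text differs, and the relative cost depends on the input and the machine.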
Development
python -m pip install -e .[test]
pytest
python scripts/benchmark.py
python -m build