
Utilities for decomposing, composing, and keyboard-mapping Korean Hangul text.


JamoLib



Fast Hangul text utilities for Python. JamoLib helps you decompose Hangul syllables into jamo, compose jamo back into syllables, and convert two-set Korean keyboard input into readable Korean text.

Why JamoLib

  • Linear-time text scanning for large Hangul strings.
  • Small API surface that is easy to drop into search, normalization, keyboard-input, and NLP preprocessing pipelines.
  • Preserves mixed text such as English, numbers, punctuation, and whitespace.
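  • Handles common batchim (final-consonant) boundary cases like 값이, 닭이, and 읽어, where a trailing consonant must stay with its own syllable instead of starting the next one.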
  • Supports common compound-medial combinations such as ㄱㅗㅏ -> 과.
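
The batchim and compound-medial cases above can be illustrated with a small greedy composer. This is only a sketch under stated assumptions, not JamoLib's implementation: `sketch_compose_text` and the two pair tables are hypothetical names, and the rule used here (a consonant becomes a final only when no vowel follows it) is one common way to resolve the boundary.

```python
# Compatibility jamo in Unicode syllable-index order.
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

# Hypothetical helper tables: two simple jamo that merge into one.
MEDIAL_PAIRS = {"ㅗㅏ": "ㅘ", "ㅗㅐ": "ㅙ", "ㅗㅣ": "ㅚ", "ㅜㅓ": "ㅝ",
                "ㅜㅔ": "ㅞ", "ㅜㅣ": "ㅟ", "ㅡㅣ": "ㅢ"}
FINAL_PAIRS = {"ㄱㅅ": "ㄳ", "ㄴㅈ": "ㄵ", "ㄴㅎ": "ㄶ", "ㄹㄱ": "ㄺ",
               "ㄹㅁ": "ㄻ", "ㄹㅂ": "ㄼ", "ㄹㅅ": "ㄽ", "ㄹㅌ": "ㄾ",
               "ㄹㅍ": "ㄿ", "ㄹㅎ": "ㅀ", "ㅂㅅ": "ㅄ"}


def sketch_compose_text(text):
    """Compose compatibility jamo into syllables; pass other characters through."""
    out, i, n = [], 0, len(text)

    def vowel_at(k):
        return k < n and text[k] in JUNG

    while i < n:
        # A syllable starts only at an initial consonant followed by a vowel.
        if text[i] in CHO and vowel_at(i + 1):
            cho, jung, jong = text[i], text[i + 1], ""
            i += 2
            # Merge a compound medial such as ㅗ + ㅏ -> ㅘ.
            if i < n and jung + text[i] in MEDIAL_PAIRS:
                jung = MEDIAL_PAIRS[jung + text[i]]
                i += 1
            # Take a final only if the consonant does not start the next syllable.
            if i < n and text[i] in JONG and not vowel_at(i + 1):
                jong = text[i]
                i += 1
                # Extend to a compound final (ㅂ + ㅅ -> ㅄ) under the same rule.
                if i < n and jong + text[i] in FINAL_PAIRS and not vowel_at(i + 1):
                    jong = FINAL_PAIRS[jong + text[i]]
                    i += 1
            index = (CHO.index(cho) * 21 + JUNG.index(jung)) * 28 + JONG.index(jong)
            out.append(chr(0xAC00 + index))
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(sketch_compose_text("ㄱㅏㅂㅅㅇㅣ"))
print(sketch_compose_text("ㄷㅏㄹㄱㅏ"))
```

The one-character lookahead is what separates 닭이 (ㄹㄱ stays together as a final) from 달가 (ㄱ moves to the next syllable because a vowel follows it).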


Installation

pip install jamolib

Quick Start

import jamolib

text = "한글과 English 123"
decomposed = jamolib.decomposeHangulText(text)

print(decomposed)
# ㅎㅏㄴㄱㅡㄹㄱㅘ English 123

print(jamolib.composeHangulText("ㄱㅏㅂㅅㅇㅣ"))
# 값이

print(jamolib.translateEngToKor("dkssudgktpdy"))
# 안녕하세요

API At A Glance

| Function | Input | Output | Use case |
| --- | --- | --- | --- |
| decomposeHangul | Single Hangul syllable | Compatibility jamo string | Token-level preprocessing |
| decomposeHangulText | Mixed text | Text with Hangul syllables decomposed | Search normalization, phonetic indexing |
| composeHangul | Initial + medial [+ final] jamo (초성 + 중성 [+ 종성]) | Single Hangul syllable | Rebuilding syllables |
| composeHangulText | Jamo text | Re-composed Hangul text | UI input handling, postprocessing |
| translateEngToKor | Latin-letter keystrokes typed on a two-set Korean layout | Hangul text | Keyboard typo correction |
| getCharset | None | Supported compatibility jamo list | Validation and custom pipelines |
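
To make the decomposition rows more concrete, here is a minimal sketch of the standard Unicode arithmetic for precomposed Hangul syllables (U+AC00 to U+D7A3). It is an illustration rather than JamoLib's source: `sketch_decompose` and `sketch_decompose_text` are hypothetical names, and this sketch keeps compound finals such as ㅄ as a single jamo, which may or may not match what the library emits.

```python
# Compatibility jamo in Unicode syllable-index order.
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"                      # 19 initials
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"                  # 21 medials
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals

FIRST, LAST = 0xAC00, 0xD7A3  # range of precomposed Hangul syllables


def sketch_decompose(syllable):
    """Split one precomposed syllable into compatibility jamo."""
    if len(syllable) != 1 or not FIRST <= ord(syllable) <= LAST:
        raise ValueError("expected a single Hangul syllable")
    # syllable index = (initial * 21 + medial) * 28 + final
    cho, rest = divmod(ord(syllable) - FIRST, 21 * 28)
    jung, jong = divmod(rest, 28)
    return CHO[cho] + JUNG[jung] + JONG[jong]


def sketch_decompose_text(text):
    """Decompose Hangul syllables in one pass; leave other characters as-is."""
    return "".join(
        sketch_decompose(ch) if FIRST <= ord(ch) <= LAST else ch
        for ch in text
    )


print(sketch_decompose_text("한글과 English 123"))
```

Because each character is visited once and the jamo are found by integer division, the work grows linearly with the input length.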

Examples

Run an example from the repository root:

python examples/quickstart.py

Notes

  • decomposeHangul expects a single Hangul syllable.
  • composeHangul expects compatibility jamo in the order initial (초성) + medial (중성) [+ optional final (종성)].
  • composeHangulText also combines common compound medials like ㅗㅏ, ㅜㅓ, and ㅡㅣ.
  • translateEngToKor uses the standard two-set Korean keyboard mapping.
  • In mixed strings, non-Hangul characters (English, numbers, punctuation, whitespace) pass through unchanged.
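
As an illustration of the two-set mapping mentioned in these notes, the sketch below converts Latin keystrokes to compatibility jamo using the standard two-set (Dubeolsik) layout. It covers only the key-to-jamo step and is not JamoLib's code: `keys_to_jamo` and `KEY_TO_JAMO` are hypothetical names, and composing the resulting jamo into syllables would be a separate step.

```python
# Standard two-set layout, listed row by row on a QWERTY keyboard.
KEYS = "qwertyuiop" "asdfghjkl" "zxcvbnm"
JAMO = "ㅂㅈㄷㄱㅅㅛㅕㅑㅐㅔ" "ㅁㄴㅇㄹㅎㅗㅓㅏㅣ" "ㅋㅌㅊㅍㅠㅜㅡ"

KEY_TO_JAMO = dict(zip(KEYS, JAMO))
# Shifted keys that produce a different jamo (tense consonants, ㅒ, ㅖ).
KEY_TO_JAMO.update(zip("QWERTOP", "ㅃㅉㄸㄲㅆㅒㅖ"))


def keys_to_jamo(text):
    """Map each keystroke to its jamo; leave unmapped characters unchanged."""
    return "".join(
        # Other uppercase letters fall back to the unshifted key.
        KEY_TO_JAMO.get(ch) or KEY_TO_JAMO.get(ch.lower(), ch)
        for ch in text
    )


print(keys_to_jamo("dkssudgktpdy"))
```

For the Quick Start input `dkssudgktpdy`, this yields the jamo sequence ㅇㅏㄴㄴㅕㅇㅎㅏㅅㅔㅇㅛ, which composes to 안녕하세요.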

Performance

The current implementation uses a single-pass scanner instead of repeated global string replacement. Local measurements on Python 3.12 in this repository produced the following averages:

| Operation | Input shape | Average time (s) |
| --- | --- | --- |
| decomposeHangulText | Repeated Hangul sentence x500 | 0.0041 |
| composeHangulText | Recompose decomposed sentence x500 | 0.0205 |
| translateEngToKor | Keyboard string x2000 | 0.0123 |

These numbers depend on hardware and Python build, so treat them as a rough guide; they were measured against the optimized code currently in the repository.
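
To reproduce this kind of measurement on your own machine, a small `timeit` harness is enough. The sketch below is not the repository's benchmark script: `average_seconds` is a hypothetical helper, and the workload is a stdlib stand-in (`unicodedata.normalize` with NFD, which yields conjoining jamo rather than compatibility jamo) so that it runs without JamoLib installed. Swap in a JamoLib function such as `jamolib.decomposeHangulText` to time the library itself.

```python
import timeit
import unicodedata


def average_seconds(func, arg, number=50):
    """Mean wall-clock seconds per call of func(arg) over `number` calls."""
    total = timeit.timeit(lambda: func(arg), number=number)
    return total / number


def nfd(text):
    # Stand-in workload: Unicode canonical decomposition of Hangul syllables.
    return unicodedata.normalize("NFD", text)


# Input shape loosely modeled on the table: one sentence repeated 500 times.
sample = "안녕하세요 반갑습니다 " * 500
print(f"{average_seconds(nfd, sample):.6f}s per call")
```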

Development

python -m pip install -e ".[test]"
pytest
python scripts/benchmark.py
python -m build
