Utilities for decomposing, composing, and keyboard-mapping Korean Hangul text.

Project description

JamoLib

Fast Hangul text utilities for Python. JamoLib helps you decompose Hangul syllables into jamo, compose jamo back into syllables, and convert English-letter keystrokes that were meant for the two-set (Dubeolsik) Korean keyboard layout, such as dkssudgktpdy, into readable Korean text.

Why JamoLib

  • Linear-time text scanning for large Hangul strings.
  • Small API surface that is easy to drop into search, normalization, keyboard-input, and NLP preprocessing pipelines.
  • Preserves mixed text such as English, numbers, punctuation, and whitespace.
  • Handles common batchim boundary cases like 값이, 닭이, and 읽어.
  • Supports common compound-medial combinations such as ㄱㅗㅏ -> 과.
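One common way to handle batchim boundaries like these is a lookahead rule: a consonant after a vowel becomes a final only if no vowel follows it, and two consonants merge into a compound final only if no vowel follows the second. The following is a minimal greedy sketch of that rule plus compound-medial merging, written for illustration only; it is not JamoLib's code, and the names `compose_text`, `syllable`, `MEDIAL_PAIRS`, and `FINAL_PAIRS` are hypothetical. It relies only on the standard Unicode formula for precomposed syllables, U+AC00 + (initial × 21 + medial) × 28 + final.

```python
# Hypothetical sketch of greedy jamo composition; not JamoLib's implementation.
CHO = tuple("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")  # 19 initials, Unicode order
JUNG = tuple("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")  # 21 medials
JONG = ("", *"ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals, "" = none

# Two-vowel sequences that merge into one compound medial.
MEDIAL_PAIRS = {
    ("ㅗ", "ㅏ"): "ㅘ", ("ㅗ", "ㅐ"): "ㅙ", ("ㅗ", "ㅣ"): "ㅚ",
    ("ㅜ", "ㅓ"): "ㅝ", ("ㅜ", "ㅔ"): "ㅞ", ("ㅜ", "ㅣ"): "ㅟ",
    ("ㅡ", "ㅣ"): "ㅢ",
}
# Two-consonant sequences that merge into one compound final.
FINAL_PAIRS = {
    ("ㄱ", "ㅅ"): "ㄳ", ("ㄴ", "ㅈ"): "ㄵ", ("ㄴ", "ㅎ"): "ㄶ",
    ("ㄹ", "ㄱ"): "ㄺ", ("ㄹ", "ㅁ"): "ㄻ", ("ㄹ", "ㅂ"): "ㄼ",
    ("ㄹ", "ㅅ"): "ㄽ", ("ㄹ", "ㅌ"): "ㄾ", ("ㄹ", "ㅍ"): "ㄿ",
    ("ㄹ", "ㅎ"): "ㅀ", ("ㅂ", "ㅅ"): "ㅄ",
}


def syllable(cho: str, jung: str, jong: str = "") -> str:
    """Unicode formula: U+AC00 + (initial * 21 + medial) * 28 + final."""
    return chr(0xAC00 + (CHO.index(cho) * 21 + JUNG.index(jung)) * 28 + JONG.index(jong))


def compose_text(text: str) -> str:
    """Greedily compose compatibility jamo left to right; other characters pass through."""
    n = len(text)

    def at(k: int) -> str:
        return text[k] if k < n else ""  # "" means "past the end"

    out: list[str] = []
    i = 0
    while i < n:
        ch = text[i]
        if ch not in CHO or at(i + 1) not in JUNG:
            out.append(ch)  # not an initial followed by a medial: keep unchanged
            i += 1
            continue
        jung, i = text[i + 1], i + 2
        if (jung, at(i)) in MEDIAL_PAIRS:  # e.g. ㅗ + ㅏ -> ㅘ
            jung, i = MEDIAL_PAIRS[(jung, at(i))], i + 1
        jong, c1 = "", at(i)
        # A consonant becomes batchim only if no vowel follows it; otherwise
        # it is the initial of the next syllable (ㄱㅏㅂㅅㅣ -> 갑시).
        if c1 and c1 in JONG and at(i + 1) not in JUNG:
            pair = FINAL_PAIRS.get((c1, at(i + 1)))
            if pair and at(i + 2) not in JUNG:  # ㄱㅏㅂㅅㅇㅣ -> 값이
                jong, i = pair, i + 2
            else:
                jong, i = c1, i + 1
        out.append(syllable(ch, jung, jong))
    return "".join(out)
```

With this rule, ㄱㅏㅂㅅㅇㅣ becomes 값이 while ㄱㅏㅂㅅㅣ becomes 갑시, because in the second string the ㅅ is claimed by the following vowel. A production composer may use a state machine instead of fixed lookahead.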

Installation

```bash
pip install jamolib
```

Quick Start

```python
import jamolib

text = "한글과 English 123"
decomposed = jamolib.decomposeHangulText(text)

print(decomposed)
# ㅎㅏㄴㄱㅡㄹㄱㅘ English 123

print(jamolib.composeHangulText("ㄱㅏㅂㅅㅇㅣ"))
# 값이

print(jamolib.translateEngToKor("dkssudgktpdy"))
# 안녕하세요
```

API At A Glance

| Function | Input | Output | Use case |
| --- | --- | --- | --- |
| decomposeHangul | Single Hangul syllable | Compatibility jamo string | Token-level preprocessing |
| decomposeHangulText | Mixed text | Text with Hangul syllables decomposed | Search normalization, phonetic indexing |
| composeHangul | Initial (초성) + medial (중성) [+ final (종성)] jamo | Single Hangul syllable | Rebuilding syllables |
| composeHangulText | Jamo text | Recomposed Hangul text | UI input handling, postprocessing |
| translateEngToKor | English keystrokes typed for the two-set Korean layout | Hangul text | Keyboard typo correction |
| getCharset | None | List of supported compatibility jamo | Validation and custom pipelines |
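Decomposition of mixed text can be done in a single pass by inverting the Unicode syllable formula with `divmod` and copying every other character unchanged. The following is a minimal sketch with a hypothetical `decompose_text`, not the library's implementation. It keeps compound finals such as ㅄ as one jamo; this page does not say whether decomposeHangulText does the same, so treat that detail as an assumption.

```python
# Hypothetical single-pass decomposer; not JamoLib's implementation.
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initials
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 medials
JONG = ("", *"ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals

SBASE = 0xAC00  # code point of 가, the first precomposed syllable
SCOUNT = len(CHO) * len(JUNG) * len(JONG)  # 19 * 21 * 28 = 11172 syllables


def decompose_text(text: str) -> str:
    """Decompose each precomposed syllable in one pass; leave other characters alone."""
    out = []
    for ch in text:
        offset = ord(ch) - SBASE
        if 0 <= offset < SCOUNT:
            initial_medial, final = divmod(offset, len(JONG))  # 28 finals per pair
            initial, medial = divmod(initial_medial, len(JUNG))  # 21 medials per initial
            out.append(CHO[initial] + JUNG[medial] + JONG[final])
        else:
            out.append(ch)  # Latin letters, digits, punctuation, whitespace, jamo
    return "".join(out)


print(decompose_text("한글과 English 123"))  # ㅎㅏㄴㄱㅡㄹㄱㅘ English 123
```

Because each character is visited once, the cost grows linearly with the input length.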

Examples

Run an example from the repository root:

```bash
python examples/quickstart.py
```

Notes

  • decomposeHangul expects a single Hangul syllable.
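  • composeHangul expects compatibility jamo in the order initial consonant (초성) + medial vowel (중성) [+ final consonant (종성)].
  • composeHangulText also combines common compound medials such as ㅗㅏ, ㅜㅓ, and ㅡㅣ (into ㅘ, ㅝ, and ㅢ).
  • translateEngToKor uses the standard two-set (Dubeolsik) Korean keyboard mapping.
  • In mixed strings, everything outside Hangul processing (Latin letters, digits, punctuation, whitespace) is preserved as-is.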
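Keyboard conversion can be thought of as two stages: map each Latin key to the jamo printed on it in the two-set layout, then compose the jamo. A minimal sketch of the first stage follows; `keys_to_jamo` and the table names are hypothetical, and treating Shift on keys without a shifted jamo as the base jamo is an assumption about how translateEngToKor behaves.

```python
# Hypothetical key-to-jamo stage; not JamoLib's implementation.
# Two-set (Dubeolsik) layout, lowercase keys in QWERTY row order.
KEYS = "qwertyuiopasdfghjklzxcvbnm"
JAMO = "ㅂㅈㄷㄱㅅㅛㅕㅑㅐㅔㅁㄴㅇㄹㅎㅗㅓㅏㅣㅋㅌㅊㅍㅠㅜㅡ"
# Shift produces tense consonants and ㅒ/ㅖ on seven keys.
SHIFTED = {"Q": "ㅃ", "W": "ㅉ", "E": "ㄸ", "R": "ㄲ", "T": "ㅆ", "O": "ㅒ", "P": "ㅖ"}

MAPPING = dict(zip(KEYS, JAMO))
MAPPING.update(zip(KEYS.upper(), JAMO))  # assumption: other Shift keys type the base jamo
MAPPING.update(SHIFTED)  # the seven real Shift variants override those defaults
TABLE = str.maketrans(MAPPING)


def keys_to_jamo(keystrokes: str) -> str:
    """Map each Latin key to its jamo in one pass; unmapped characters pass through."""
    return keystrokes.translate(TABLE)


print(keys_to_jamo("dkssudgktpdy"))  # ㅇㅏㄴㄴㅕㅇㅎㅏㅅㅔㅇㅛ
```

Passing that jamo string through a composer such as composeHangulText is expected to produce 안녕하세요, matching the Quick Start output.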

Performance

The current implementation uses a single-pass scanner instead of repeated global string replacement. Local measurements on Python 3.12 in this repository produced the following averages:

| Operation | Input shape | Average time (s) |
| --- | --- | --- |
| decomposeHangulText | Hangul sentence repeated 500 times | 0.0041 |
| composeHangulText | Decomposed form of the same 500× sentence, recomposed | 0.0205 |
| translateEngToKor | Keyboard string repeated 2,000 times | 0.0123 |

These numbers are environment-dependent, but they reflect the optimized code currently in the repository.
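scripts/benchmark.py is not shown on this page. The following is a minimal sketch of how per-call averages like these can be measured with the standard-library `timeit` module; `average_seconds` is hypothetical, and the sample input only approximates the "repeated sentence" shapes in the table above.

```python
import timeit
from statistics import mean
from typing import Callable


def average_seconds(fn: Callable[[str], object], arg: str, repeat: int = 5) -> float:
    """Time `repeat` single calls of fn(arg) and return their mean in seconds."""
    trials = timeit.repeat(lambda: fn(arg), number=1, repeat=repeat)
    return mean(trials)


if __name__ == "__main__":
    try:
        from jamolib import decomposeHangulText as target
    except ImportError:
        target = str.upper  # stand-in so the sketch runs without jamolib installed
    # Assumed input shape: one mixed sentence repeated 500 times.
    sample = "한글과 English 123 " * 500
    print(f"average: {average_seconds(target, sample):.4f}s")
```

With `number=1` each trial times a single call; for very fast functions, raising `number` and dividing the result reduces timer noise.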

Development

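```bash
python -m pip install -e ".[test]"   # editable install with test extras (quotes needed in zsh)
pytest                                # run the test suite
python scripts/benchmark.py           # run the benchmarks
python -m build                       # build sdist and wheel into dist/ (requires the build package)
```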


Download files

Source Distribution

jamolib-0.2.1.tar.gz (137.7 kB)

Built Distribution

jamolib-0.2.1-py3-none-any.whl (5.7 kB)

File details

Details for the file jamolib-0.2.1.tar.gz.

File metadata

  • Download URL: jamolib-0.2.1.tar.gz
  • Size: 137.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for jamolib-0.2.1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 4ab90c8e9b69fb8baf8f5cf01ebd6a1a3cdd84f5271691c10d6c7f358019934e |
| MD5 | dd99dc4754956eefb6172714b13ab087 |
| BLAKE2b-256 | e5c0921d47da364e75faedc102987044df54a7859c8b11814623ae8433d204bb |
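To check a downloaded sdist against the SHA256 digest listed above, hash the file in chunks with `hashlib` and compare. A minimal sketch; the script layout and the names `iter_chunks` and `sha256_hex` are hypothetical.

```python
import hashlib
import sys
from pathlib import Path
from typing import Iterable, Iterator

# SHA256 published above for jamolib-0.2.1.tar.gz.
EXPECTED_SDIST_SHA256 = "4ab90c8e9b69fb8baf8f5cf01ebd6a1a3cdd84f5271691c10d6c7f358019934e"


def iter_chunks(path: Path, size: int = 1 << 16) -> Iterator[bytes]:
    """Yield a file in 64 KiB chunks so large downloads are never fully loaded."""
    with path.open("rb") as fh:
        yield from iter(lambda: fh.read(size), b"")


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Feed chunks into one SHA-256 object and return the lowercase hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python verify.py path/to/jamolib-0.2.1.tar.gz
    actual = sha256_hex(iter_chunks(Path(sys.argv[1])))
    print("OK" if actual == EXPECTED_SDIST_SHA256 else f"MISMATCH: {actual}")
```

pip can enforce the same check at install time: put `jamolib==0.2.1 --hash=sha256:<sdist digest> --hash=sha256:<wheel digest>` in a requirements file and install it with `--require-hashes`.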

Provenance

The following attestation bundles were made for jamolib-0.2.1.tar.gz:

Publisher: python-publish.yml on smturtle2/jamolib

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file jamolib-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: jamolib-0.2.1-py3-none-any.whl
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for jamolib-0.2.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | cdc247c408e36e0eef58792823c871eb38b646498ff820e3dbdf1c1cdeb1e266 |
| MD5 | 70d68aeac687d383e54263e9cb6e0a40 |
| BLAKE2b-256 | 7d2c9cb33672620f59275b2a409d5ac5532b28d5b74d3570d3aefa6ae01e91dc |

Provenance

The following attestation bundles were made for jamolib-0.2.1-py3-none-any.whl:

Publisher: python-publish.yml on smturtle2/jamolib

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
