
Universal phonetic Hangul transcription: convert text from 14 supported languages into readable Hangul.


🎼 Hunmin

Foreign languages in Hangul: convert any supported language into a phonetic Hangul score that even a child can read.

from hunmin import transcribe

transcribe("student", "en")        # 스투던트
transcribe("中国",     "zh")        # 중구어
transcribe("東京",     "ja")        # 토우쿄우
transcribe("familia", "es", level=3)   # ㆄ아밀리아  (Old Hangul ㆄ = /f/)
transcribe("Москва",  "ru", level=4)   # ㅁㅗㅅㅋㅸㅏ  (UHPS jamo)

14 languages. Purely rule-based. Zero dependencies (the CJK backends are optional).


✨ Why Hunmin?

  • Hangul was originally a pronunciation score. King Sejong designed it that way in 1443.
  • Read it aloud and it becomes that language. Even when a child reads it, a speaker of the source language can understand the pronunciation.
  • Old Hangul revived. Sounds that Korean has lost, ㆄ (/f/), ㅸ (/v/), and ㅿ (/z/), are put back to use.
  • One API, 14 languages. Same call, deterministic output.
  • No black box. 100% rule-based by default. The ML model is an option for research.

📦 Installation

pip install hunmin              # 11 languages (Latin/Cyrillic scripts)
pip install "hunmin[cjk]"       # + Japanese / Chinese / Korean
pip install "hunmin[all]"       # everything + web demo
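
Because the CJK backends are optional, it can be useful to check which of them are importable before requesting `ja`, `zh`, or `ko`. The helper below is only a sketch and not part of hunmin: `optional_backends` is a hypothetical name, the import names come from the tools credited in this README, and the pairing of each language with one backend is an assumption.

```python
from importlib.util import find_spec

# Import names of the optional CJK backends credited in this README.
# Assumption: one backend per language; hunmin's real wiring may differ.
CJK_BACKENDS = {"ja": "pykakasi", "zh": "pypinyin", "ko": "hanja"}


def optional_backends() -> dict[str, bool]:
    """Map each CJK language code to whether its backend can be imported."""
    return {
        lang: find_spec(module) is not None
        for lang, module in CJK_BACKENDS.items()
    }


if __name__ == "__main__":
    for lang, available in optional_backends().items():
        status = "available" if available else "missing (try the cjk extra)"
        print(f"{lang}: {status}")
```

`find_spec` only looks the module up without importing it, so the check stays cheap even when the backends are large.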

🚀 Quick start

Python

from hunmin import transcribe

# Default: child-friendly Hangul
transcribe("student", "en")           # 스투던트
transcribe("Paris",   "fr")           # 파리
transcribe("中国",     "zh")           # 중구어
transcribe("こんにちは",   "ja")           # 콘니치하

# Level 3: precise Old Hangul (marks sounds that Korean lacks)
transcribe("vine",    "en", level=3)  # ㅸ아인  (ㅸ = /v/)
transcribe("zoo",     "en", level=3)  # ㅿ우    (ㅿ = /z/)
transcribe("father",  "en", level=3)  # ㆄ아덜  (ㆄ = /f/)

# Level 4: UHPS jamo sequence (for ML / research)
transcribe("student", "en", level=4)  # ㅅㅌㅜㄷㅓㄴㅌ
transcribe("中国",     "zh", level=4)  # ㅈㅜㅇㄱㅜㅓ

CLI

$ hunmin --text "student" --lang en
스튜던트

$ hunmin --text "中国" --lang zh --level 4
ㅈㅜㅇㄱㅜㅓ

$ hunmin --demo
lang  text                  L1 (아이용)     L3 (옛한글)     L4 (jamo)
=================================================================
en    student              스튜던트         스튜던트         ㅅㅌㅜㄷㅓㄴㅌ
en    father               파더            ㆄ아덜            ㆄㅏㄷㅓㄹ
es    familia              파밀리아         ㆄ아밀리아        ㆄㅏㅁㅣㄹㅣㅏ
ru    Москва               모스크바         모스크ㅸ아        ㅁㅗㅅㅋㅸㅏ
zh    中国                  중구어           중구어            ㅈㅜㅇㄱㅜㅓ
ja    東京                  토우쿄우         토우쿄우          ㅌㅗㅜㅋㅛㅜ
ko    大韓民國              대한민국         대한민국          ㄷㅐㅎㅏㄴㅁㅣㄴㄱㅜㄱ
...
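
For the `ko`, `zh`, and `ja` rows, the L4 column looks like the L1 syllable blocks split into individual jamo, with the silent initial ㅇ left out. The sketch below shows the standard Unicode arithmetic for that split. It is an illustration only, not hunmin's Level 4 algorithm: `to_jamo` and its `drop_silent_ieung` flag are hypothetical names, and the `ru` row (모스크바 vs. ㅁㅗㅅㅋㅸㅏ) shows that Level 4 is not simply a decomposition of Level 1.

```python
# Precomposed Hangul syllables occupy U+AC00..U+D7A3 and are laid out as
# index = (initial * 21 + medial) * 28 + final.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = [
    "", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
    "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
    "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ",
]


def to_jamo(text: str, drop_silent_ieung: bool = True) -> str:
    """Split Hangul syllable blocks into a flat compatibility-jamo string."""
    out = []
    for ch in text:
        index = ord(ch) - 0xAC00
        if not 0 <= index < 19 * 21 * 28:
            out.append(ch)  # not a precomposed syllable: pass through
            continue
        initial, rest = divmod(index, 21 * 28)
        medial, final = divmod(rest, 28)
        # Assumption: an initial ㅇ is silent and omitted, as in the demo rows.
        if not (drop_silent_ieung and INITIALS[initial] == "ㅇ"):
            out.append(INITIALS[initial])
        out.append(MEDIALS[medial])
        out.append(FINALS[final])
    return "".join(out)


print(to_jamo("대한민국"))  # ㄷㅐㅎㅏㄴㅁㅣㄴㄱㅜㄱ
print(to_jamo("토우쿄우"))  # ㅌㅗㅜㅋㅛㅜ
```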

🌍 Supported languages (14)

| Code | Language | Method | Accuracy |
| --- | --- | --- | --- |
| en | English | CMU dictionary + rules | 100% (in-dictionary words) |
| es | Spanish | Letter rules | 99.9% |
| it | Italian | Letter rules | 99.8% |
| de | German | Letter rules | 99.0% |
| ru | Russian (Cyrillic) | Letter rules | 100.0% |
| fr | French | Letter rules | 99.8% |
| pt | Portuguese | Letter rules | 99.8% |
| nl | Dutch | Letter rules | 99.6% |
| pl | Polish | Letter rules | 99.4% |
| tr | Turkish | Letter rules | 100.0% |
| id | Indonesian | Letter rules | 99.4% |
| ja | Japanese | pykakasi + rules | 100% |
| zh | Chinese (Mandarin) | pypinyin + rules | 100% |
| ko | Korean (Hangul + Hanja) | hanja + native | 100% |

🎚️ Levels 1–4

| Level | Use | Example: student |
| --- | --- | --- |
| 1 | Children / general Hangul | 스튜던트 |
| 2 | Natural pronunciation (linking; planned) | 스튜던트 |
| 3 | Precise (uses Old Hangul ㆄ ㅸ ㅿ) | (no applicable sounds) |
| 4 | UHPS jamo sequence (ML / audio research) | ㅅㅌㅜㄷㅓㄴㅌ |

Old Hangul example: father is 파더 at Level 1 but ㆄ아덜 at Level 3. The letter ㆄ marks the sound explicitly as /f/, distinguishing it from /p/.
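
The relationship between the precise and the everyday levels can be pictured as a fallback table: each Old Hangul letter has a modern stand-in that is used when precision is not requested. This is a minimal sketch, not hunmin's code. `MODERN_FALLBACK` and `simplify_jamo` are hypothetical names; ㆄ to ㅍ and ㅸ to ㅂ follow the father and Москва examples in this README, while ㅿ to ㅈ is an assumption.

```python
# Old Hangul letter -> modern stand-in.
# ㆄ -> ㅍ and ㅸ -> ㅂ match the README examples; ㅿ -> ㅈ is assumed.
MODERN_FALLBACK = {"ㆄ": "ㅍ", "ㅸ": "ㅂ", "ㅿ": "ㅈ"}


def simplify_jamo(jamo: str) -> str:
    """Replace Old Hangul jamo with modern ones; leave other characters as is."""
    return "".join(MODERN_FALLBACK.get(ch, ch) for ch in jamo)


print(simplify_jamo("ㆄㅏㄷㅓㄹ"))  # ㅍㅏㄷㅓㄹ
```

The mapping is lossy by design: once /f/ has been written as ㅍ it can no longer be told apart from /p/, which is the distinction Level 3 preserves.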


🧠 How it works

                ┌─────────────────────────┐
input + lang →  │      Hunmin router      │
                └──┬──────────────────┬───┘
                   ↓                  ↓
        Latin / Cyrillic       Logographic (CJK)
        (en, es, it, de,       (ja, zh, ko)
         ru, fr, pt, nl,
         pl, tr, id)
                   ↓                  ↓
        Per-language rule      Deterministic
        module (letters        dictionary
         → phonemes            (pykakasi /
         → Hangul / jamo)       pypinyin /
                                hanja)
                   ↓                  ↓
                   └───── output ─────┘
                (Hangul / jamo / separated)

Why hybrid? Logographic characters (Hanja, 漢字) do not spell out their pronunciation in the glyph itself, so a dictionary lookup is the right answer. Phonographic scripts (Latin, Cyrillic) follow letter-to-sound rules, so they can be converted algorithmically.
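
The routing step can be sketched as a dispatch on the language code. Everything here is illustrative: `route`, `rule_pipeline`, and `dictionary_pipeline` are hypothetical names with placeholder bodies, and only the grouping of language codes is taken from this README.

```python
from typing import Callable

# Language groups as described in this README.
RULE_LANGS = {"en", "es", "it", "de", "ru", "fr", "pt", "nl", "pl", "tr", "id"}
DICT_LANGS = {"ja", "zh", "ko"}


def rule_pipeline(text: str) -> str:
    # Placeholder for a per-language letters -> phonemes -> Hangul module.
    return f"<rules:{text}>"


def dictionary_pipeline(text: str) -> str:
    # Placeholder for a dictionary lookup (pykakasi / pypinyin / hanja).
    return f"<dictionary:{text}>"


def route(lang: str) -> Callable[[str], str]:
    """Pick the pipeline for a language code; reject unknown codes."""
    if lang in RULE_LANGS:
        return rule_pipeline
    if lang in DICT_LANGS:
        return dictionary_pipeline
    raise ValueError(f"unsupported language code: {lang!r}")


print(route("es")("familia"))  # <rules:familia>
print(route("zh")("中国"))      # <dictionary:中国>
```

Failing loudly on an unknown code keeps the output deterministic: a caller never receives a best-effort guess from the wrong pipeline.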


🔬 ML / research

If you need pattern learning, Level 4 (jamo mode) output can be fed straight into an ML pipeline.

import hunmin

hunmin.transcribe("hello", "en", level=4)  # ㅎㅔㄹㅗㅜ

A single small transformer (~1.4M parameters) trained on 326K (text, jamo) pairs reaches 97% exact-match / 99% character accuracy on the test set (see docs/RESEARCH.md).
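
The README does not define the two accuracy figures. One common reading, assumed here, is that exact-match accuracy counts predictions identical to the reference sequence, and character accuracy is one minus the total edit distance divided by the total reference length. The sketch below implements that reading; `edit_distance` and `score` are hypothetical helpers, and docs/RESEARCH.md may use a different definition.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def score(predictions: list[str], references: list[str]) -> tuple[float, float]:
    """Return (exact-match accuracy, character accuracy) under the assumed definitions."""
    pairs = list(zip(predictions, references))
    exact = sum(p == r for p, r in pairs) / len(pairs)
    errors = sum(edit_distance(p, r) for p, r in pairs)
    total = sum(len(r) for _, r in pairs)
    return exact, 1 - errors / total


preds = ["ㅎㅔㄹㅗㅜ", "ㅅㅌㅜㄷㅓㄴ"]
refs = ["ㅎㅔㄹㅗㅜ", "ㅅㅌㅜㄷㅓㄴㅌ"]
print(score(preds, refs))
```

Because Level 4 output is a flat jamo string, both metrics work on plain Python strings without any Hangul-specific handling.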


📜 UHPS: Universal Hangul Phoneme Set (45 jamo)

| | Consonants (24) | Vowels (21) |
| --- | --- | --- |
| Modern | ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ | ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ |
| Old Hangul | ㆄ /f/ · ㅸ /v/ · ㅿ /z/ · ㆁ /ŋ/ · ㆆ /ʔ/ | (none) |

The same IPA symbol maps to the same jamo in every language. Consistency takes priority over accuracy: a design choice made for ML stability.
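
Since the set is fixed at 45 jamo, a downstream pipeline can validate Level 4 strings before using them as model input. The sketch below rebuilds the inventory from the table in this section. The names `UHPS` and `is_uhps_sequence` are hypothetical, and hunmin may expose the set differently or not at all.

```python
# Inventory transcribed from the UHPS table: 19 modern consonants,
# 21 modern vowels, and 5 Old Hangul consonants.
MODERN_CONSONANTS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MODERN_VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
OLD_CONSONANTS = "ㆄㅸㅿㆁㆆ"

UHPS = frozenset(MODERN_CONSONANTS + MODERN_VOWELS + OLD_CONSONANTS)


def is_uhps_sequence(seq: str) -> bool:
    """True if seq is non-empty and every character is one of the 45 jamo."""
    return bool(seq) and all(ch in UHPS for ch in seq)


print(len(UHPS))                      # 45
print(is_uhps_sequence("ㅁㅗㅅㅋㅸㅏ"))  # True
print(is_uhps_sequence("모스크바"))     # False: syllable blocks, not jamo
```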


🏛️ Vision

"Even a child should learn it in days, and use it to write any sound." (訓民正音 解例本, Hunminjeongeum Haerye, 1446)

The goal is to revive Hangul as the "universal phonetic notation system" that King Sejong the Great intended.


📈 Current status

  • v1.0: 14 languages, hybrid pipeline, 98.4% test accuracy, UHPS frozen.

📝 License

MIT.


🙏 Built with

  • pykakasi: Japanese kana conversion
  • pypinyin: Chinese pinyin
  • hanja: Korean Hanja readings
  • CMU Pronouncing Dictionary: English G2P
  • hermitdave/FrequencyWords: corpus seed (OpenSubtitles)
