
Universal phonetic Hangul transcription: convert any language into readable Korean.


🎼 Hunmin

Convert any language into a phonetic Hangul score that even a child can read.

from hunmin import transcribe

transcribe("student", "en")            # 스투던트
transcribe("中国",    "zh")            # 중구어
transcribe("東京",    "ja")            # 토우쿄우
transcribe("familia", "es", level=3)   # ㆄ아밀리아  (Old Hangul ㆄ = /f/)
transcribe("Москва",  "ru", level=4)   # ㅁㅗㅅㅋㅸㅏ  (UHPS jamo)

14 languages. Purely rule-based. Zero dependencies (CJK support is optional).


✨ Why Hunmin?

  • Hangul began as a pronunciation score. King Sejong designed it that way in 1443.
  • Read it aloud and it becomes the language: pronunciation that foreign listeners understand, even when a child is reading.
  • Old Hangul revived. Sounds that Korean has lost, ㆄ (/f/), ㅸ (/v/) and ㅿ (/z/), are back in use.
  • One API, 14 languages. The same call, deterministic output.
  • No black box. 100% rule-based by default; ML models are an optional research feature.

📦 Installation

pip install hunmin              # 11 languages (Latin/Cyrillic scripts)
pip install "hunmin[cjk]"       # + Japanese / Chinese / Korean
pip install "hunmin[all]"       # everything + web demo
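Because Japanese, Chinese and Korean depend on the optional extra, it can help to check which backends are importable before calling transcribe with ja, zh or ko. The sketch below assumes, based on the "Tools used" list rather than on Hunmin's packaging metadata, that `hunmin[cjk]` provides the `pykakasi`, `pypinyin` and `hanja` modules; the helper name `missing_cjk_backends` is hypothetical and only uses the standard library's `importlib.util.find_spec`.

```python
import importlib.util

# Assumption: these are the third-party modules behind ja / zh / ko support,
# taken from the README's "Tools used" list, not from hunmin's metadata.
CJK_BACKENDS = {"ja": "pykakasi", "zh": "pypinyin", "ko": "hanja"}


def missing_cjk_backends() -> dict[str, str]:
    """Return {lang: module} for every CJK backend that cannot be imported."""
    return {
        lang: module
        for lang, module in CJK_BACKENDS.items()
        if importlib.util.find_spec(module) is None
    }


if __name__ == "__main__":
    missing = missing_cjk_backends()
    if missing:
        print("Missing CJK backends:", ", ".join(sorted(missing.values())))
        print('Try: pip install "hunmin[cjk]"')
    else:
        print("All CJK backends are importable.")
```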

🚀 Quick start

Python

from hunmin import transcribe

# Default: child-friendly Hangul
transcribe("student", "en")           # 스투던트
transcribe("Paris",   "fr")           # 파리
transcribe("中国",    "zh")           # 중구어
transcribe("こんにちは", "ja")        # 콘니치하

# Level 3: precise Old Hangul (writes sounds that Korean lacks)
transcribe("vine",    "en", level=3)  # ㅸ아인  (ㅸ = /v/)
transcribe("zoo",     "en", level=3)  # ㅿ우    (ㅿ = /z/)
transcribe("father",  "en", level=3)  # ㆄ아덜  (ㆄ = /f/)

# Level 4: UHPS jamo sequence (ML / research)
transcribe("student", "en", level=4)  # ㅅㅌㅜㄷㅓㄴㅌ
transcribe("中国",    "zh", level=4)  # ㅈㅜㅇㄱㅜㅓ

CLI

$ hunmin --text "student" --lang en
스튜던트

$ hunmin --text "ไธญๅ›ฝ" --lang zh --level 4
ㅈㅜㅇㄱㅜㅓ

$ hunmin --demo
lang  text                  L1 (아이용)     L3 (옛한글)     L4 (jamo)
=================================================================
en    student              스튜던트         스튜던트         ㅅㅌㅜㄷㅓㄴㅌ
en    father               파더            ㆄ아덜            ㆄㅏㄷㅓㄹ
es    familia              파밀리아         ㆄ아밀리아        ㆄㅏㅁㅣㄹㅣㅏ
ru    Москва               모스크바         모스크ㅸ아        ㅁㅗㅅㅋㅸㅏ
zh    中国                  중구어           중구어            ㅈㅜㅇㄱㅜㅓ
ja    東京                  토우쿄우         토우쿄우          ㅌㅗㅜㅋㅛㅜ
ko    大韓民國              대한민국         대한민국          ㄷㅐㅎㅏㄴㅁㅣㄴㄱㅜㄱ
...

🌍 Supported languages (14)

Code  Language                 Method                  Accuracy
en    English                  CMU dictionary + rules  100% (in-dictionary words)
es    Spanish                  Letter rules            99.9%
it    Italian                  Letter rules            99.8%
de    German                   Letter rules            99.0%
ru    Russian (Cyrillic)       Letter rules            100.0%
fr    French                   Letter rules            99.8%
pt    Portuguese               Letter rules            99.8%
nl    Dutch                    Letter rules            99.6%
pl    Polish                   Letter rules            99.4%
tr    Turkish                  Letter rules            100.0%
id    Indonesian               Letter rules            99.4%
ja    Japanese                 pykakasi + rules        100%
zh    Chinese (Mandarin)       pypinyin + rules        100%
ko    Korean (Hangul + Hanja)  hanja + native          100%
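For the alphabetic languages the table lists "letter rules". A common way to implement such rules is greedy longest-match over a grapheme table, so that digraphs such as Spanish ch win over single letters. The sketch below is a hypothetical toy, not Hunmin's Spanish module: the table `TOY_ES_RULES` and the function `letters_to_jamo` are illustrative assumptions covering only a handful of spellings, and the output is UHPS-style jamo as at Level 4.

```python
# Hypothetical toy grapheme -> jamo table for a few Spanish spellings.
# Not Hunmin's rule set; real rules also need context (c before e/i, etc.).
TOY_ES_RULES = {
    "ch": "ㅊ", "qu": "ㅋ", "rr": "ㄹ",               # digraphs
    "a": "ㅏ", "e": "ㅔ", "i": "ㅣ", "o": "ㅗ", "u": "ㅜ",
    "b": "ㅂ", "v": "ㅂ",                            # Spanish b and v are both /b/
    "c": "ㅋ", "d": "ㄷ", "f": "ㆄ", "g": "ㄱ", "l": "ㄹ", "m": "ㅁ",
    "n": "ㄴ", "p": "ㅍ", "r": "ㄹ", "s": "ㅅ", "t": "ㅌ",
}


def letters_to_jamo(word: str, rules: dict[str, str] = TOY_ES_RULES) -> str:
    """Greedy longest-match transliteration; unknown characters pass through."""
    word = word.lower()
    max_key = max(len(k) for k in rules)
    out, i = [], 0
    while i < len(word):
        # Try the longest possible grapheme first, then shorter ones.
        for size in range(min(max_key, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            out.append(word[i])  # no rule matched: keep the character as-is
            i += 1
    return "".join(out)


print(letters_to_jamo("familia"))  # ㆄㅏㅁㅣㄹㅣㅏ
print(letters_to_jamo("chico"))    # ㅊㅣㅋㅗ
```

Trying longer keys first is what lets "ch" map to a single ㅊ instead of ㅋ followed by an unmatched h.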

🎚️ Levels 1–4

Level  Purpose                                     Example: student
1      Children / general Hangul                   스튜던트
2      Natural pronunciation (liaison; planned)    스튜던트
3      Precise (uses Old Hangul ㆄ ㅸ ㅿ)            (no applicable sounds)
4      UHPS jamo sequence (ML / audio research)    ㅅㅌㅜㄷㅓㄴㅌ

Old Hangul example: father is 파더 at Level 1 but ㆄ아덜 at Level 3. The ㆄ marks /f/ explicitly, distinguishing it from /p/.
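Levels 1 and 3 render syllables rather than a flat jamo list. One way to picture that step is a composer that groups a Level 4 style jamo sequence into precomposed syllables with the standard Unicode formula 0xAC00 + (initial × 21 + vowel) × 28 + final, and leaves Old Hangul letters such as ㆄ and ㅸ standalone, since Unicode has no precomposed syllables for them. This is a hypothetical sketch, not Hunmin's code: the names `compose` and `syllable`, the coda set `CODA_OK`, the epenthetic ㅡ and the doubled intervocalic ㄹ are assumptions modelled on Korean loanword conventions. Under those assumptions it happens to reproduce the Level 3 strings in the demo output (ㆄ아덜, ㆄ아밀리아, 모스크ㅸ아) from their Level 4 jamo.

```python
# Hypothetical sketch, not Hunmin's implementation: compose a UHPS-style jamo
# sequence (Level 4) into Hangul syllables (Level 3 style).
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 onsets, Unicode order
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"    # 21 vowels, Unicode order
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 codas
OLD_CONSONANTS = "ㆄㅸㅿㆁㆆ"   # no precomposed syllables exist for these
CODA_OK = set("ㄱㄴㄹㅁㅂㅇ")  # assumption: consonants allowed to close a syllable


def syllable(initial: str, vowel: str, final: str = "") -> str:
    """Standard Unicode composition: 0xAC00 + (L * 21 + V) * 28 + T."""
    l, v, t = INITIALS.index(initial), VOWELS.index(vowel), FINALS.index(final)
    return chr(0xAC00 + (l * 21 + v) * 28 + t)


def compose(jamo: str) -> str:
    out, i, n = [], 0, len(jamo)
    while i < n:
        c = jamo[i]
        if c in OLD_CONSONANTS or (c not in INITIALS and c not in VOWELS):
            out.append(c)  # Old Hangul letter (or unknown symbol) stays standalone
            i += 1
            continue
        if c in VOWELS:  # bare vowel gets the silent onset ㅇ
            initial, vowel, real_vowel = "ㅇ", c, True
            i += 1
        elif i + 1 < n and jamo[i + 1] in VOWELS:  # consonant + vowel
            initial, vowel, real_vowel = c, jamo[i + 1], True
            i += 2
        else:  # consonant with no vowel: insert epenthetic ㅡ
            initial, vowel, real_vowel = c, "ㅡ", False
            i += 1
        final = ""
        if real_vowel and i < n and jamo[i] in CODA_OK:
            nxt = jamo[i + 1] if i + 1 < n else None
            if nxt is None or nxt not in VOWELS:
                final = jamo[i]  # consonant closes this syllable
                i += 1
            elif jamo[i] == "ㄹ":
                final = "ㄹ"  # intervocalic l written twice (밀리); ㄹ also starts the next syllable
        out.append(syllable(initial, vowel, final))
    return "".join(out)


print(compose("ㆄㅏㄷㅓㄹ"))    # ㆄ아덜
print(compose("ㅁㅗㅅㅋㅸㅏ"))  # 모스크ㅸ아
```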


🧠 How it works

               ┌────────────────────────┐
input + lang → │      Hunmin router     │
               └──┬──────────────────┬──┘
                  ↓                  ↓
        Latin / Cyrillic       Logographic (CJK)
        (en, es, it, de,       (ja, zh, ko)
         ru, fr, pt, nl,
         pl, tr, id)
                  ↓                  ↓
        Per-language rule      Deterministic
        modules                dictionaries
        (letter → phoneme      (pykakasi /
         → Hangul / jamo)       pypinyin /
                                hanja)
                  ↓                  ↓
                  └───── output ─────┘
                 (Hangul / jamo / split)

Why a hybrid? Logographic characters (Hanja, 漢字) carry no pronunciation information in the glyph itself, so dictionary lookup is the right tool. Phonographic scripts (Latin, Cyrillic) follow letter-to-sound rules, so they can be converted algorithmically.
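The routing decision in the diagram can be sketched as a single dispatch on the language code: CJK input is looked up character by character, while Latin and Cyrillic input is mapped letter by letter. Both tables below are tiny hypothetical stand-ins (for pypinyin and for a per-language rule module), chosen so the outputs match this README's Level 4 examples; the function name `route_to_jamo` is illustrative and not part of Hunmin's API.

```python
CJK_LANGS = {"ja", "zh", "ko"}

# Hypothetical stand-in for a deterministic dictionary (pypinyin in the real pipeline).
TOY_CJK_DICT = {
    ("zh", "中"): "ㅈㅜㅇ",
    ("zh", "国"): "ㄱㅜㅓ",
}

# Hypothetical stand-in for a per-language letter-rule module:
# м→ㅁ, о→ㅗ, с→ㅅ, к→ㅋ, в→ㅸ, а→ㅏ
TOY_LETTER_RULES = {
    "ru": dict(zip("москва", "ㅁㅗㅅㅋㅸㅏ")),
}


def route_to_jamo(text: str, lang: str) -> str:
    """Dispatch on script type: dictionary lookup for CJK, letter rules otherwise."""
    if lang in CJK_LANGS:
        # Logographs carry no pronunciation in the glyph, so each one is looked up.
        return "".join(TOY_CJK_DICT[(lang, ch)] for ch in text)
    # Phonographic scripts: each letter maps to jamo by rule.
    rules = TOY_LETTER_RULES[lang]
    return "".join(rules[ch] for ch in text.lower())


print(route_to_jamo("中国", "zh"))    # ㅈㅜㅇㄱㅜㅓ
print(route_to_jamo("Москва", "ru"))  # ㅁㅗㅅㅋㅸㅏ
```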


🔬 ML / research use

If you need to learn patterns, Level 4 (jamo mode) output feeds straight into an ML pipeline.

import hunmin

hunmin.transcribe("hello", "en", level=4)  # ㅎㅔㄹㅗㅜ
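Because Level 4 output is a flat string drawn from a fixed jamo inventory, turning it into model input can be as simple as a lookup table. The sketch below builds a vocabulary from the 45 UHPS jamo listed in this README plus three special tokens; the special tokens, their order and the helper names `encode`, `decode` and `pad_batch` are assumptions for illustration, not something Hunmin ships.

```python
MODERN_CONSONANTS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19
OLD_CONSONANTS = "ㆄㅸㅿㆁㆆ"                                # 5
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"         # 21

SPECIALS = ["<pad>", "<bos>", "<eos>"]  # hypothetical special tokens
VOCAB = SPECIALS + list(MODERN_CONSONANTS + OLD_CONSONANTS + VOWELS)
TOKEN_ID = {tok: idx for idx, tok in enumerate(VOCAB)}
PAD = TOKEN_ID["<pad>"]


def encode(jamo: str) -> list[int]:
    """Map a Level 4 jamo string to token ids wrapped in <bos> ... <eos>."""
    return [TOKEN_ID["<bos>"]] + [TOKEN_ID[j] for j in jamo] + [TOKEN_ID["<eos>"]]


def decode(ids: list[int]) -> str:
    """Inverse of encode, dropping special tokens."""
    return "".join(VOCAB[i] for i in ids if VOCAB[i] not in SPECIALS)


def pad_batch(batch: list[list[int]]) -> list[list[int]]:
    """Right-pad every sequence to the length of the longest one."""
    width = max(len(seq) for seq in batch)
    return [seq + [PAD] * (width - len(seq)) for seq in batch]


ids = encode("ㅎㅔㄹㅗㅜ")
print(ids)
print(decode(ids))  # ㅎㅔㄹㅗㅜ
```

A fixed, closed vocabulary like this is one reason a jamo target is convenient for small models: there are no out-of-vocabulary output symbols.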

A single small transformer (~1.4M parameters) trained on 326K (text, jamo) pairs reaches 97% exact-match and 99% character accuracy on the test set (see docs/RESEARCH.md).
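The two figures above are not defined here. A common reading, assumed in this sketch and not taken from docs/RESEARCH.md, is sequence-level exact match plus character accuracy computed as 1 − (total Levenshtein edits ÷ total reference characters). The function names `levenshtein` and `accuracy_scores` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def accuracy_scores(preds: list[str], refs: list[str]) -> tuple[float, float]:
    """Return (exact-match rate, character accuracy) under the assumed definition."""
    exact = sum(p == r for p, r in zip(preds, refs)) / len(refs)
    edits = sum(levenshtein(p, r) for p, r in zip(preds, refs))
    total = sum(len(r) for r in refs)
    return exact, 1 - edits / total


exact, char = accuracy_scores(["ㅎㅔㄹㅗㅜ", "ㅅㅌㅜ"], ["ㅎㅔㄹㅗㅜ", "ㅅㅌㅠ"])
print(exact, char)  # 0.5 0.875
```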


📜 UHPS: Universal Hangul Phoneme Set (45 jamo)

Group        Consonants (24)                                   Vowels (21)
Modern       ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ     ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ
Old Hangul   ㆄ /f/ · ㅸ /v/ · ㅿ /z/ · ㆁ /ŋ/ · ㆆ /ʔ/                (none)

The same IPA symbol maps to the same jamo in every language. Consistency is prioritized over precision, a design choice made for ML stability.
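The consistency rule amounts to one IPA-to-jamo table with no per-language entries. The sketch below shows that shape with a small hypothetical excerpt: the Old Hangul rows follow the UHPS table, while the remaining rows (for example /ð/ → ㄷ and /j/ → ㅣ) and the phoneme splits of the two words are assumptions chosen to match the Level 4 outputs for father and familia in the demo. `IPA_TO_JAMO` and `ipa_to_jamo` are illustrative names, not Hunmin's API.

```python
# Hypothetical excerpt of a language-independent IPA -> UHPS jamo table.
# Old Hangul rows come from the UHPS table; the rest are illustrative assumptions.
IPA_TO_JAMO = {
    "f": "ㆄ", "v": "ㅸ", "z": "ㅿ", "ŋ": "ㆁ", "ʔ": "ㆆ",
    "a": "ㅏ", "ɑ": "ㅏ", "ə": "ㅓ", "i": "ㅣ", "j": "ㅣ",
    "d": "ㄷ", "ð": "ㄷ", "l": "ㄹ", "r": "ㄹ", "m": "ㅁ",
}


def ipa_to_jamo(phonemes: list[str]) -> str:
    """Look up each phoneme; there is no language argument, so /f/ is ㆄ everywhere."""
    return "".join(IPA_TO_JAMO[p] for p in phonemes)


english_father = ["f", "ɑ", "ð", "ə", "r"]             # assumed segmentation
spanish_familia = ["f", "a", "m", "i", "l", "j", "a"]  # assumed segmentation
print(ipa_to_jamo(english_father))   # ㆄㅏㄷㅓㄹ
print(ipa_to_jamo(spanish_familia))  # ㆄㅏㅁㅣㄹㅣㅏ
```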


🏛️ Vision

"Even a child should learn it in days, and use it to write any sound." (訓民正音 解例本, Hunminjeongeum Haerye edition, 1446)

The goal is to revive what King Sejong the Great intended: Hangul as a universal phonetic writing system.


📈 Current status

  • v1.0: 14 languages, hybrid pipeline, 98.4% test accuracy, UHPS frozen.

📝 License

MIT.


🙏 Tools used

  • pykakasi: Japanese kana conversion
  • pypinyin: Chinese pinyin
  • hanja: Sino-Korean (Hanja) readings
  • CMU Pronouncing Dictionary: English G2P
  • hermitdave/FrequencyWords: corpus seed (OpenSubtitles)



Download files

Download the file for your platform.

Source Distribution

hunmin-1.1.1.tar.gz (972.1 kB)

Uploaded Source

Built Distribution


hunmin-1.1.1-py3-none-any.whl (994.6 kB)

Uploaded Python 3

File details

Details for the file hunmin-1.1.1.tar.gz.

File metadata

  • Download URL: hunmin-1.1.1.tar.gz
  • Upload date:
  • Size: 972.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for hunmin-1.1.1.tar.gz
Algorithm Hash digest
SHA256 f8bb983511dd9cbcc616cdf182943750893cd4a42039b827eecd78edb689c123
MD5 7de079417349830af0310ae5346e5041
BLAKE2b-256 ead70bba0f91f55b2677c7caff428c0b656a04d9ffbb9b4723dc5fb5ab95b335


File details

Details for the file hunmin-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: hunmin-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 994.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for hunmin-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0634eda24b1389d1aba202b22c8131e96624c5ab0ea74b90237547cf3826b915
MD5 accd3e32c6c2eb5c239d9a3ad3971eda
BLAKE2b-256 fd03cb332f2556be108586a845af0f04aa19be95f02f34d8d05846427652fb28

