
Universal phonetic Hangul transcription: convert any language into readable Korean.


🎼 Hunmin

Foreign languages into Hangul, as a pronunciation score that even a child can read. Convert any language into a readable phonetic Hangul score.

from hunmin import transcribe

transcribe("student", "en")        # 스투던트
transcribe("中国",     "zh")        # 중구어
transcribe("東京",     "ja")        # 토우쿄우
transcribe("familia", "es", level=3)   # ㆄ아밀리아  (Old Hangul ㆄ = /f/)
transcribe("Москва",  "ru", level=4)   # ㅁㅗㅅㅋㅸㅏ  (UHPS jamo)

14 languages. Purely rule-based. Zero dependencies (CJK support is optional).


✨ Why Hunmin?

  • Hangul was originally a pronunciation score. King Sejong designed it that way in 1443.
  • Read it aloud and it becomes that language. The pronunciation is understandable to native speakers even when a child reads it.
  • Old Hangul revival. Sounds that Korean has lost, ㆄ (/f/), ㅸ (/v/) and ㅿ (/z/), are put back into use.
  • One API, 14 languages. Same call, deterministic output.
  • No black box. 100% rule-based (default). ML models are an option for research.

📦 Installation

pip install hunmin              # 11 languages (Latin/Cyrillic scripts)
pip install "hunmin[cjk]"       # + Japanese / Chinese / Korean
pip install "hunmin[all]"       # everything + web demo
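
Because the CJK backends are optional, it can be handy to check which ones are importable before transcribing ja, zh or ko text. The following is a minimal sketch using only the standard library. The language-to-package mapping is assumed from the tools this README lists (pykakasi, pypinyin, hanja), and the helper name `missing_backends` is hypothetical, not part of hunmin.

```python
from importlib.util import find_spec

# Assumed mapping of language code -> optional package, based on the tools
# listed in this README; hunmin itself may organise this differently.
CJK_BACKENDS = {"ja": "pykakasi", "zh": "pypinyin", "ko": "hanja"}


def missing_backends(backends=CJK_BACKENDS):
    """Return {lang: package} for every optional package that is not importable."""
    return {lang: pkg for lang, pkg in backends.items() if find_spec(pkg) is None}


if __name__ == "__main__":
    missing = missing_backends()
    if missing:
        print("Missing optional packages:", ", ".join(sorted(missing.values())))
        print('Try: pip install "hunmin[cjk]"')
    else:
        print("All CJK backends are importable.")
```

An empty result means all three packages can be imported; it says nothing about whether their versions are the ones hunmin expects.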

🚀 Quick start

Python

from hunmin import transcribe

# Default: child-friendly Hangul
transcribe("student", "en")           # 스투던트
transcribe("Paris",   "fr")           # 파리
transcribe("中国",     "zh")           # 중구어
transcribe("こんにちは",   "ja")           # 콘니치하

# Level 3: Old Hangul precision (marks sounds that modern Korean lacks)
transcribe("vine",    "en", level=3)  # ㅸ아인  (ㅸ = /v/)
transcribe("zoo",     "en", level=3)  # ㅿ우    (ㅿ = /z/)
transcribe("father",  "en", level=3)  # ㆄ아덜  (ㆄ = /f/)

# Level 4: UHPS jamo sequence (for ML / research)
transcribe("student", "en", level=4)  # ㅅㅌㅜㄷㅓㄴㅌ
transcribe("中国",     "zh", level=4)  # ㅈㅜㅇㄱㅜㅓ

CLI

$ hunmin --text "student" --lang en
스튜던트

$ hunmin --text "中国" --lang zh --level 4
ㅈㅜㅇㄱㅜㅓ

$ hunmin --demo
lang  text        L1 (아이용)    L3 (옛한글)     L4 (jamo)
=================================================================
en    student     스튜던트       스튜던트        ㅅㅌㅜㄷㅓㄴㅌ
en    father      파더           ㆄ아덜          ㆄㅏㄷㅓㄹ
es    familia     파밀리아       ㆄ아밀리아      ㆄㅏㅁㅣㄹㅣㅏ
ru    Москва      모스크바       모스크ㅸ아      ㅁㅗㅅㅋㅸㅏ
zh    中国        중구어         중구어          ㅈㅜㅇㄱㅜㅓ
ja    東京        토우쿄우       토우쿄우        ㅌㅗㅜㅋㅛㅜ
ko    大韓民國    대한민국       대한민국        ㄷㅐㅎㅏㄴㅁㅣㄴㄱㅜㄱ
...

🌍 Supported languages (14)

Code  Language                  Method                   Accuracy
en    English                   CMU dictionary + rules   100% (in-dictionary words)
es    Spanish                   letter rules             99.9%
it    Italian                   letter rules             99.8%
de    German                    letter rules             99.0%
ru    Russian (Cyrillic)        letter rules             100.0%
fr    French                    letter rules             99.8%
pt    Portuguese                letter rules             99.8%
nl    Dutch                     letter rules             99.6%
pl    Polish                    letter rules             99.4%
tr    Turkish                   letter rules             100.0%
id    Indonesian                letter rules             99.4%
ja    Japanese                  pykakasi + rules         100%
zh    Chinese (Mandarin)        pypinyin + rules         100%
ko    Korean (Hangul + Hanja)   hanja + native           100%

🎚️ Level 1-4

Level  Purpose                                      Example: student
1      Children / general-purpose Hangul            스튜던트
2      Natural pronunciation (linking; planned)     스튜던트
3      Precise (uses Old Hangul ㆄ ㅸ ㅿ)            (no applicable sounds)
4      UHPS jamo sequence (ML / audio research)     ㅅㅌㅜㄷㅓㄴㅌ

Old Hangul example: father is 파더 at Level 1 and ㆄ아덜 at Level 3. The letter ㆄ marks the sound explicitly as /f/, distinguishing it from /p/.
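
To make the idea of a jamo sequence concrete, the sketch below splits precomposed Hangul syllables into compatibility jamo using the standard Unicode syllable arithmetic. It is not hunmin's Level 4 logic. Judging by the examples in this README, Level 4 also drops the silent placeholder ㅇ and some filler vowels, which this plain decomposition keeps. The function name `decompose` is hypothetical.

```python
# Standard Unicode Hangul syllable arithmetic (an illustration, not hunmin's UHPS rules).
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"    # 21 vowels
JONGSEONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
             "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
             "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]  # 28 finals, first is "none"

SYLLABLE_BASE = 0xAC00   # first precomposed syllable, 가
SYLLABLE_COUNT = 11172   # 19 * 21 * 28


def decompose(text: str) -> str:
    """Split each precomposed Hangul syllable into jamo; pass other characters through."""
    out = []
    for ch in text:
        index = ord(ch) - SYLLABLE_BASE
        if 0 <= index < SYLLABLE_COUNT:
            initial, rest = divmod(index, 21 * 28)
            vowel, final = divmod(rest, 28)
            out.append(CHOSEONG[initial] + JUNGSEONG[vowel] + JONGSEONG[final])
        else:
            out.append(ch)  # Latin letters, Old Hangul letters such as ㆄ, etc.
    return "".join(out)


print(decompose("대한민국"))  # ㄷㅐㅎㅏㄴㅁㅣㄴㄱㅜㄱ
```

For 대한민국 the result happens to coincide with the Level 4 column of the demo table, because none of its syllables begins with the silent ㅇ.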


🧠 How it works

                ┌─────────────────────────┐
input + lang →  │      Hunmin router      │
                └──┬──────────────────┬───┘
                   ↓                  ↓
        Latin / Cyrillic       Logographic (CJK)
        (en, es, it, de,       (ja, zh, ko)
         ru, fr, pt, nl,
         pl, tr, id)
                   ↓                  ↓
        Per-language rule      Deterministic
        module (letter →       dictionary
        phoneme → Hangul /     (pykakasi /
        jamo)                  pypinyin / hanja)
                   ↓                  ↓
                   └───── output ─────┘
                (Hangul / jamo / split)

Why hybrid? Logographic characters (Hanja, 漢字) carry no pronunciation information in the glyph itself, so a dictionary lookup is the right tool. Phonographic scripts (Latin, Cyrillic) follow letter-to-sound rules, so they can be converted algorithmically.
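
The split described above can be sketched as a small dispatcher: rule application for phonographic scripts, table lookup for logographic ones. Everything below is a toy under stated assumptions. The language sets follow the diagram in this README, while the rule and reading tables, the intermediate output format, and the names `apply_rules`, `lookup_readings` and `route` are invented for illustration and are not hunmin's internals.

```python
RULE_LANGS = {"en", "es", "it", "de", "ru", "fr", "pt", "nl", "pl", "tr", "id"}
DICT_LANGS = {"ja", "zh", "ko"}

# Toy data, invented for this sketch only.
TOY_RULES = {"es": [("ll", "ʝ"), ("qu", "k"), ("h", "")]}
TOY_READINGS = {"zh": {"中": "zhong", "国": "guo"}}


def apply_rules(text, rules):
    """Scan left to right, applying the first matching rule at each position."""
    out, i = [], 0
    while i < len(text):
        for src, dst in rules:
            if text.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:
            out.append(text[i])  # no rule matched: keep the letter as-is
            i += 1
    return "".join(out)


def lookup_readings(text, readings):
    """Look each character up; the glyph gives no sound cue, so unknowns raise KeyError."""
    return " ".join(readings[ch] for ch in text)


def route(text, lang):
    """Dispatch to the rule path or the dictionary path based on the language code."""
    if lang in RULE_LANGS:
        return apply_rules(text.lower(), TOY_RULES.get(lang, []))
    if lang in DICT_LANGS:
        return lookup_readings(text, TOY_READINGS.get(lang, {}))
    raise ValueError(f"unsupported language: {lang}")


print(route("llama", "es"))  # ʝama
print(route("中国", "zh"))    # zhong guo
```

The rule path degrades gracefully on unseen input (letters pass through), whereas the dictionary path fails loudly, which mirrors why the two script families are handled differently.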


🔬 ML / research use

If you need pattern learning, Level 4 (jamo mode) output can be fed directly into an ML pipeline.

import hunmin
hunmin.transcribe("hello", "en", level=4)  # ㅎㅔㄹㅗㅜ

A single small transformer (~1.4M parameters) trained on 326K (text, jamo) pairs reaches 97% exact-match and 99% character accuracy on the test set (see docs/RESEARCH.md).
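
For readers who want to score their own model against jamo references, here is one way the two metrics could be computed. This is a sketch: exact match is unambiguous, but the definition of character accuracy used here (one minus total edit distance over total reference length) is an assumption, and docs/RESEARCH.md may define it differently.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def exact_match(preds, refs):
    """Fraction of predictions identical to their reference."""
    if len(preds) != len(refs) or not refs:
        raise ValueError("preds and refs must be non-empty and the same length")
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def char_accuracy(preds, refs):
    """Assumed definition: 1 - (total edit distance / total reference length)."""
    if len(preds) != len(refs) or not refs:
        raise ValueError("preds and refs must be non-empty and the same length")
    errors = sum(edit_distance(p, r) for p, r in zip(preds, refs))
    return 1 - errors / sum(len(r) for r in refs)


refs = ["ㅎㅔㄹㅗㅜ", "ㅈㅜㅇㄱㅜㅓ"]
preds = ["ㅎㅔㄹㅗㅜ", "ㅈㅜㅇㄱㅜㅏ"]   # second prediction has one wrong jamo
print(exact_match(preds, refs))              # 0.5
print(round(char_accuracy(preds, refs), 3))  # 0.909
```

Working at the jamo level means a single wrong vowel costs one character rather than a whole syllable block, which is part of why character accuracy sits above exact match.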


📜 UHPS: Universal Hangul Phoneme Set (45 jamo)

            Consonants (24)                               Vowels (21)
Modern      ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ      ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ
Old Hangul  ㆄ /f/ · ㅸ /v/ · ㅿ /z/ · ㆁ /ŋ/ · ㆆ /ʔ/         (none)

The same IPA symbol maps to the same jamo in every language. Consistency is prioritised over accuracy, a design choice made for ML stability.
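
The "same IPA, same jamo" principle amounts to one shared lookup table used by every language. The sketch below illustrates that idea only. The five Old Hangul assignments come from the UHPS table in this README; the remaining rows are inferred from the Level 4 example outputs shown here (familia, Москва), so the real UHPS table may differ. The phoneme lists are simplified by hand and `to_jamo` is a hypothetical name.

```python
# One table for all languages. Old Hangul rows are from this README's UHPS table;
# the other rows are inferred from its example outputs and are assumptions.
IPA_TO_JAMO = {
    "f": "ㆄ", "v": "ㅸ", "z": "ㅿ", "ŋ": "ㆁ", "ʔ": "ㆆ",
    "m": "ㅁ", "s": "ㅅ", "k": "ㅋ", "l": "ㄹ",
    "a": "ㅏ", "i": "ㅣ", "o": "ㅗ",
}


def to_jamo(phonemes):
    """Map a phoneme list to jamo through the shared table, regardless of language."""
    return "".join(IPA_TO_JAMO[p] for p in phonemes)


# Hand-simplified phoneme lists for two different source languages.
print(to_jamo(["f", "a", "m", "i", "l", "i", "a"]))  # ㆄㅏㅁㅣㄹㅣㅏ  (Spanish familia)
print(to_jamo(["m", "o", "s", "k", "v", "a"]))       # ㅁㅗㅅㅋㅸㅏ    (Russian Москва)
```

Because the table takes no language argument, a model trained on the output sees each sound written one way only, at the cost of ignoring language-specific nuance.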


🏛️ Vision

"Even a child should learn it in days, and use it to write any sound." (訓民正音 解例本, 1446)

The goal is to revive "Hangul as a universal phonetic notation system", as King Sejong the Great intended.


📈 Current status

  • v1.0: 14 languages, hybrid pipeline, 98.4% test accuracy, UHPS frozen.

📝 License

MIT.


🙏 Tools used

  • pykakasi: Japanese kana conversion
  • pypinyin: Chinese pinyin
  • hanja: Korean Hanja readings
  • CMU Pronouncing Dictionary: English G2P
  • hermitdave/FrequencyWords: corpus seed (OpenSubtitles)

