
Universal phonetic Hangul transcription: convert any language into readable Korean.


🎼 Hunmin

Foreign languages into Hangul: a pronunciation score that even a child can read. Convert any language into a readable phonetic Hangul score.

from hunmin import transcribe

transcribe("student", "en")        # 스투던트
transcribe("中国",     "zh")        # 중구어
transcribe("東京",     "ja")        # 토우쿄우
transcribe("familia", "es", level=3)   # ㆄ아밀리아  (Old Hangul ㆄ = /f/)
transcribe("Москва",  "ru", level=4)   # ㅁㅗㅅㅋㅸㅏ  (UHPS jamo)

14 languages. Purely rule-based. Zero dependencies (the CJK backends are optional).


✨ Why Hunmin?

  • Hangul was originally a pronunciation score. King Sejong designed it that way in 1443.
  • Read it aloud and you are speaking the language. Even when a child reads it, the pronunciation is one a foreign speaker can understand.
  • Old Hangul revived. Letters for sounds Korean has lost, ㆄ (/f/), ㅸ (/v/), and ㅿ (/z/), are back in use.
  • One API, 14 languages. Same call, deterministic output.
  • No black box. 100% rule-based by default. ML models are an optional extra for research.

📦 Installation

pip install hunmin              # 11 languages (Latin/Cyrillic scripts)
pip install hunmin[cjk]         # + Japanese / Chinese / Korean
pip install hunmin[all]         # everything + web demo
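
Because the CJK backends are optional, it can help to check which of them are importable before requesting `ja`, `zh`, or `ko`. The sketch below uses only the standard library. `available_cjk_backends` is a hypothetical helper, not part of hunmin, and the pairing of each language with exactly one package is an assumption based on the packages this README names for those languages.

```python
from importlib.util import find_spec
from typing import Dict

# Language code -> import name of the optional package this README
# associates with it (assumed to be the only extra requirement).
CJK_BACKENDS: Dict[str, str] = {
    "ja": "pykakasi",
    "zh": "pypinyin",
    "ko": "hanja",
}


def available_cjk_backends() -> Dict[str, bool]:
    """Report, per language code, whether its backend package is importable."""
    return {
        lang: find_spec(module) is not None
        for lang, module in CJK_BACKENDS.items()
    }


if __name__ == "__main__":
    for lang, installed in available_cjk_backends().items():
        status = "installed" if installed else "missing"
        print(f"{lang}: {status}")
```

If any entry reports missing, `pip install hunmin[cjk]` is the documented way to pull in the extras.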

🚀 Quick start

Python

from hunmin import transcribe

# Default: child-friendly Hangul
transcribe("student", "en")           # 스투던트
transcribe("Paris",   "fr")           # 파리
transcribe("中国",     "zh")           # 중구어
transcribe("こんにちは",   "ja")           # 콘니치하

# Level 3: precise Old Hangul (marks sounds that Korean lacks)
transcribe("vine",    "en", level=3)  # ㅸ아인  (ㅸ = /v/)
transcribe("zoo",     "en", level=3)  # ㅿ우    (ㅿ = /z/)
transcribe("father",  "en", level=3)  # ㆄ아덜  (ㆄ = /f/)

# Level 4: UHPS jamo sequence (ML / research use)
transcribe("student", "en", level=4)  # ㅅㅌㅜㄷㅓㄴㅌ
transcribe("中国",     "zh", level=4)  # ㅈㅜㅇㄱㅜㅓ
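
To compare levels side by side from Python, a small helper can call the transcriber once per level. This is only a sketch: `compare_levels` is a hypothetical name, not part of hunmin, and the transcriber is passed in as an argument so the snippet runs whether or not the package is installed. The only thing assumed about hunmin is the `transcribe(text, lang, level=...)` call shape shown above.

```python
from typing import Callable, Dict, Iterable


def compare_levels(
    text: str,
    lang: str,
    transcribe_fn: Callable[..., str],
    levels: Iterable[int] = (1, 3, 4),
) -> Dict[int, str]:
    """Return {level: transcription} for one input string.

    `transcribe_fn` is expected to accept (text, lang, level=...),
    matching the call shape used in the examples above.
    """
    return {level: transcribe_fn(text, lang, level=level) for level in levels}
```

With hunmin installed, `compare_levels("father", "en", transcribe)` would return one entry per requested level.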

CLI

$ hunmin --text "student" --lang en
스튜던트

$ hunmin --text "中国" --lang zh --level 4
ㅈㅜㅇㄱㅜㅓ

$ hunmin --demo
lang  text         L1 (child-friendly)   L3 (Old Hangul)   L4 (jamo)
=====================================================================
en    student      스튜던트              스튜던트          ㅅㅌㅜㄷㅓㄴㅌ
en    father       파더                  ㆄ아덜            ㆄㅏㄷㅓㄹ
es    familia      파밀리아              ㆄ아밀리아        ㆄㅏㅁㅣㄹㅣㅏ
ru    Москва       모스크바              모스크ㅸ아        ㅁㅗㅅㅋㅸㅏ
zh    中国         중구어                중구어            ㅈㅜㅇㄱㅜㅓ
ja    東京         토우쿄우              토우쿄우          ㅌㅗㅜㅋㅛㅜ
ko    大韓民國     대한민국              대한민국          ㄷㅐㅎㅏㄴㅁㅣㄴㄱㅜㄱ
...

🌍 Supported languages (14)

| Code | Language | Method | Accuracy |
|------|----------|--------|----------|
| en | English | CMU dictionary + rules | 100% (in-dictionary words) |
| es | Spanish | Letter rules | 99.9% |
| it | Italian | Letter rules | 99.8% |
| de | German | Letter rules | 99.0% |
| ru | Russian (Cyrillic) | Letter rules | 100.0% |
| fr | French | Letter rules | 99.8% |
| pt | Portuguese | Letter rules | 99.8% |
| nl | Dutch | Letter rules | 99.6% |
| pl | Polish | Letter rules | 99.4% |
| tr | Turkish | Letter rules | 100.0% |
| id | Indonesian | Letter rules | 99.4% |
| ja | Japanese | pykakasi + rules | 100% |
| zh | Chinese (Mandarin) | pypinyin + rules | 100% |
| ko | Korean (Hangul + Hanja) | hanja + native | 100% |

🎚️ Level 1–4

| Level | Purpose | Example: student |
|-------|---------|------------------|
| 1 | Children / everyday Hangul | 스튜던트 |
| 2 | Natural pronunciation (liaison; planned) | 스튜던트 |
| 3 | Precise (uses Old Hangul ㆄ ㅸ ㅿ) | (no applicable sounds) |
| 4 | UHPS jamo sequence (ML / audio research) | ㅅㅌㅜㄷㅓㄴㅌ |

Old Hangul example: father is 파더 at Level 1 but ㆄ아덜 at Level 3. The letter ㆄ marks the sound explicitly as /f/, which keeps it distinct from /p/.
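
The relationship between syllable blocks (Levels 1 to 3) and a flat jamo sequence (Level 4) can be illustrated with standard Unicode arithmetic. The sketch below is not hunmin's implementation: `to_jamo` is a hypothetical helper that only splits precomposed modern syllables (U+AC00 to U+D7A3) into compatibility jamo and passes everything else, including Old Hangul letters such as ㆄ, through unchanged. Judging from the examples in this README, Level 4 output for non-Korean input is phoneme-based and will generally differ from a plain decomposition of the Level 1 string.

```python
# Compatibility-jamo tables in Unicode syllable order.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"      # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"   # 21 vowels
# 27 final consonants, preceded by "" for "no final".
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

SYLLABLE_BASE = 0xAC00  # first precomposed syllable, 가
SYLLABLE_LAST = 0xD7A3  # last precomposed syllable, 힣


def to_jamo(text: str) -> str:
    """Split modern Hangul syllable blocks into compatibility jamo."""
    out = []
    for ch in text:
        code = ord(ch)
        if SYLLABLE_BASE <= code <= SYLLABLE_LAST:
            index = code - SYLLABLE_BASE
            initial, rest = divmod(index, 21 * 28)
            medial, final = divmod(rest, 28)
            out.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
        else:
            # Not a precomposed syllable (Latin, Old Hangul, etc.): keep as-is.
            out.append(ch)
    return "".join(out)


print(to_jamo("대한민국"))  # ㄷㅐㅎㅏㄴㅁㅣㄴㄱㅜㄱ
print(to_jamo("파더"))      # ㅍㅏㄷㅓ
```

Old Hangul letters have no precomposed syllable blocks in that Unicode range, which is consistent with Level 3 output writing ㆄ as a standalone letter in front of 아 rather than as part of a single block.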


🧠 How it works

                ┌─────────────────────────┐
input + lang →  │      Hunmin router      │
                └──┬──────────────────┬───┘
                   ↓                  ↓
        Latin / Cyrillic       Logographic (CJK)
        (en, es, it, de,       (ja, zh, ko)
         ru, fr, pt, nl,
         pl, tr, id)
                   ↓                  ↓
        Per-language rule      Deterministic
        modules (letters       dictionaries
        → phonemes →           (pykakasi /
        Hangul / jamo)          pypinyin /
                                hanja)
                   ↓                  ↓
                   └───── output ─────┘
                (Hangul / jamo / separated)

Why hybrid? Logographic characters (Hanja, 漢字) carry no pronunciation information in the glyph itself, so a dictionary lookup is the right tool. Phonetic scripts (Latin, Cyrillic) follow letter-to-sound rules, so they can be converted algorithmically.
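
The split described above can be sketched as a small dispatcher: phonetic scripts go through a letter-rule function, and logographic input goes through a lookup table. Everything in this block is illustrative. The function names, the three-entry dictionary, and the five-letter Spanish table are assumptions made for the example (the values are Level 4 outputs quoted in this README) and do not reflect hunmin's internal modules or its real rules.

```python
from typing import Callable, Dict

# Toy dictionary path: whole-word readings for logographic input.
# A real backend would resolve readings per character or per word.
TOY_READINGS: Dict[str, Dict[str, str]] = {
    "zh": {"中国": "ㅈㅜㅇㄱㅜㅓ"},
    "ja": {"東京": "ㅌㅗㅜㅋㅛㅜ"},
    "ko": {"大韓民國": "ㄷㅐㅎㅏㄴㅁㅣㄴㄱㅜㄱ"},
}

# Toy rule path: just enough Spanish letters to handle "familia".
TOY_ES_LETTERS: Dict[str, str] = {
    "f": "ㆄ", "a": "ㅏ", "m": "ㅁ", "i": "ㅣ", "l": "ㄹ",
}


def rules_es(text: str) -> str:
    """Letter-by-letter conversion; fails loudly on letters it does not know."""
    jamo = []
    for ch in text.lower():
        if ch not in TOY_ES_LETTERS:
            raise ValueError(f"no toy rule for letter {ch!r}")
        jamo.append(TOY_ES_LETTERS[ch])
    return "".join(jamo)


RULE_MODULES: Dict[str, Callable[[str], str]] = {"es": rules_es}


def route(text: str, lang: str) -> str:
    """Send phonetic scripts to rules and logographic scripts to a dictionary."""
    if lang in RULE_MODULES:
        return RULE_MODULES[lang](text)
    if lang in TOY_READINGS:
        if text not in TOY_READINGS[lang]:
            raise LookupError(f"{text!r} is not in the toy {lang} dictionary")
        return TOY_READINGS[lang][text]
    raise ValueError(f"unsupported language: {lang}")


print(route("familia", "es"))  # ㆄㅏㅁㅣㄹㅣㅏ
print(route("中国", "zh"))      # ㅈㅜㅇㄱㅜㅓ
```

Both paths are plain functions of their input, which is what makes the combined output deterministic.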


🔬 ML / research use

If you need to learn patterns, Level 4 (jamo mode) output can be fed straight into an ML pipeline.

import hunmin
hunmin.transcribe("hello", "en", level=4)  # ㅎㅔㄹㅗㅜ

A single small transformer (~1.4M parameters) trained on 326K (text, jamo) pairs reaches 97% exact-match and 99% character accuracy on the test set (see docs/RESEARCH.md).
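
For readers who want to reproduce numbers of this kind, the two metrics can be computed as below. This is a sketch under stated assumptions: this README does not define "char accuracy", so the code takes it to be one minus the total Levenshtein distance divided by the total reference length, and docs/RESEARCH.md may define it differently. `exact_match` and `char_accuracy` are hypothetical helper names.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def exact_match(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def char_accuracy(preds: Sequence[str], refs: Sequence[str]) -> float:
    """1 - (total edit distance / total reference length). Assumed definition."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    total_distance = sum(levenshtein(p, r) for p, r in zip(preds, refs))
    total_length = sum(len(r) for r in refs)
    return 1.0 - total_distance / total_length
```

Exact match is the stricter of the two: one wrong jamo fails the whole sequence but costs only a single character under the edit-distance measure, which fits the gap between the two reported figures.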


📜 UHPS: Universal Hangul Phoneme Set (45 jamo)

| | Consonants (24) | Vowels (21) |
|---|---|---|
| Modern | ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ | ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ |
| Old Hangul | ㆄ /f/ · ㅸ /v/ · ㅿ /z/ · ㆁ /ŋ/ · ㆆ /ʔ/ | (none) |

The same IPA symbol maps to the same jamo in every language. The design favors consistency over accuracy, for the sake of ML stability.
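
The "same IPA, same jamo" rule amounts to a single lookup table shared by every language. The fragment below is a sketch that contains only the five Old Hangul assignments listed above plus a hypothetical helper, `ipa_to_jamo`. The full 45-entry UHPS table and the way hunmin stores it are not shown in this README, so nothing here should be read as the package's actual data structure.

```python
from typing import Dict, Iterable, List

# IPA symbol -> Old Hangul jamo, as listed in the UHPS table above.
OLD_HANGUL_BY_IPA: Dict[str, str] = {
    "f": "ㆄ",
    "v": "ㅸ",
    "z": "ㅿ",
    "ŋ": "ㆁ",
    "ʔ": "ㆆ",
}


def ipa_to_jamo(
    phonemes: Iterable[str],
    table: Dict[str, str] = OLD_HANGUL_BY_IPA,
) -> List[str]:
    """Map each IPA symbol through one shared table, whatever the source language.

    Symbols missing from the table raise KeyError instead of being guessed.
    """
    return [table[p] for p in phonemes]


print(ipa_to_jamo(["f", "v", "z"]))  # ['ㆄ', 'ㅸ', 'ㅿ']
```

Because the table takes no language argument, an /f/ from English and an /f/ from Spanish cannot end up as different jamo.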


🏛️ Vision

"Even a child should learn it in days, and use it to write any sound." (訓民正音 解例本, 1446)

The goal is to revive what King Sejong intended: Hangul as a universal phonetic writing system.


📈 Current status

  • v1.0: 14 languages, hybrid pipeline, 98.4% test accuracy, UHPS frozen.

📝 License

MIT.


🙏 Tools used

  • pykakasi: Japanese kana conversion
  • pypinyin: Chinese pinyin
  • hanja: Korean Hanja readings
  • CMU Pronouncing Dictionary: English G2P
  • hermitdave/FrequencyWords: corpus seed (OpenSubtitles)
