Universal phonetic Hangul transcription: convert any language into readable Korean.

🎼 Hunmin

Convert any language into a readable phonetic Hangul score, a pronunciation score that even a child can read.

from hunmin import transcribe

transcribe("student", "en")            # 스투던트
transcribe("中国",     "zh")            # 중구어
transcribe("東京",     "ja")            # 토우쿄우
transcribe("familia", "es", level=3)   # ㆄ아밀리아  (Old Hangul ㆄ = /f/)
transcribe("Москва",  "ru", level=4)   # ㅁㅗㅅㅋㅸㅏ  (UHPS jamo)

14 languages. Purely rule-based. Zero dependencies (the CJK backends are optional).


✨ Why Hunmin?

  • Hangul began as a pronunciation score. King Sejong designed it that way in 1443.
  • Read it aloud and you are speaking the language. Even when a child reads it, a foreign listener can understand the pronunciation.
  • Old Hangul revived. The sounds Korean has lost, ㆄ (/f/), ㅸ (/v/), and ㅿ (/z/), are put back to use.
  • One API, 14 languages. Same call, deterministic output.
  • No black box. 100% rule-based by default; the ML model is an optional research tool.

📦 Installation

pip install hunmin                # 11 languages (Latin/Cyrillic scripts)
pip install "hunmin[cjk]"         # + Japanese / Chinese / Korean
pip install "hunmin[all]"         # everything + web demo
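
Because the CJK languages are an optional extra, it can be useful to check which of them are usable before calling the library. The sketch below is not part of Hunmin: `available_cjk_languages` is a hypothetical helper, and the mapping from language code to package is an assumption based on the tools this README credits (pykakasi, pypinyin, hanja). It only tests whether each package can be imported.

```python
from importlib.util import find_spec

# Assumption: the [cjk] extra installs these three packages, which the
# README credits as the ja / zh / ko backends.
CJK_BACKENDS = {"ja": "pykakasi", "zh": "pypinyin", "ko": "hanja"}


def available_cjk_languages(backends=CJK_BACKENDS):
    """Return the language codes whose backend package can be imported."""
    return sorted(
        lang for lang, module in backends.items()
        if find_spec(module) is not None
    )


if __name__ == "__main__":
    # Prints a list such as ['ja', 'zh'] depending on what is installed.
    print(available_cjk_languages())
```

An empty list would suggest that only the 11 Latin/Cyrillic languages are available in the current environment.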

🚀 Quick start

Python

from hunmin import transcribe

# Default: child-friendly Hangul
transcribe("student", "en")           # 스투던트
transcribe("Paris",   "fr")           # 파리
transcribe("中国",     "zh")           # 중구어
transcribe("こんにちは",   "ja")           # 콘니치하

# Level 3: Old Hangul precision (writes sounds that Korean lacks)
transcribe("vine",    "en", level=3)  # ㅸ아인  (ㅸ = /v/)
transcribe("zoo",     "en", level=3)  # ㅿ우    (ㅿ = /z/)
transcribe("father",  "en", level=3)  # ㆄ아덜  (ㆄ = /f/)

# Level 4: UHPS jamo sequence (ML / research)
transcribe("student", "en", level=4)  # ㅅㅌㅜㄷㅓㄴㅌ
transcribe("中国",     "zh", level=4)  # ㅈㅜㅇㄱㅜㅓ
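
To see how one word changes across levels, the calls above can be wrapped in a small helper. This is a sketch, not part of the library: `compare_levels` and its `transcribe_fn` parameter are hypothetical names, and passing `level=1` explicitly is an assumption, since the examples here only show the default call and `level=3` / `level=4`.

```python
def compare_levels(text, lang, levels=(1, 3, 4), transcribe_fn=None):
    """Return {level: transcription} for one word.

    transcribe_fn is injectable so the helper can be tried without the
    package installed; by default it uses hunmin.transcribe.
    """
    if transcribe_fn is None:
        # Imported lazily so this module loads even without hunmin.
        from hunmin import transcribe as transcribe_fn
    return {level: transcribe_fn(text, lang, level=level) for level in levels}


if __name__ == "__main__":
    # Stand-in for hunmin.transcribe, used only to show the call shape.
    def fake_transcribe(text, lang, level=1):
        return f"<{lang}:{text}@L{level}>"

    for level, output in compare_levels("father", "en",
                                        transcribe_fn=fake_transcribe).items():
        print(level, output)
```

With the real package installed, dropping the `transcribe_fn` argument would route the calls to `hunmin.transcribe`.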

CLI

$ hunmin --text "student" --lang en
스튜던트

$ hunmin --text "中国" --lang zh --level 4
ㅈㅜㅇㄱㅜㅓ

$ hunmin --demo
lang  text        L1 (아이용)    L3 (옛한글)     L4 (jamo)
=================================================================
en    student     스튜던트       스튜던트        ㅅㅌㅜㄷㅓㄴㅌ
en    father      파더           ㆄ아덜          ㆄㅏㄷㅓㄹ
es    familia     파밀리아       ㆄ아밀리아      ㆄㅏㅁㅣㄹㅣㅏ
ru    Москва      모스크바       모스크ㅸ아      ㅁㅗㅅㅋㅸㅏ
zh    中国        중구어         중구어          ㅈㅜㅇㄱㅜㅓ
ja    東京        토우쿄우       토우쿄우        ㅌㅗㅜㅋㅛㅜ
ko    大韓民國    대한민국       대한민국        ㄷㅐㅎㅏㄴㅁㅣㄴㄱㅜㄱ
...
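
The CLI can also be driven from a script. The sketch below uses only the flags shown above (`--text`, `--lang`, `--level`); the helper names `hunmin_argv` and `run_hunmin` are hypothetical, and the assumption that the transcription is the only thing written to stdout comes from the sample session, not from any documented contract.

```python
import subprocess


def hunmin_argv(text, lang, level=None):
    """Build the argument list for one CLI call."""
    argv = ["hunmin", "--text", text, "--lang", lang]
    if level is not None:
        argv += ["--level", str(level)]
    return argv


def run_hunmin(text, lang, level=None):
    """Run the CLI and return stdout without the trailing newline.

    Requires the hunmin command to be on PATH; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(
        hunmin_argv(text, lang, level),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")
```

Passing the arguments as a list avoids shell quoting problems with non-ASCII input such as 中国.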

🌍 Supported languages (14)

Code  Language                 Method                  Accuracy
en    English                  CMU dictionary + rules  100% (in-dictionary words)
es    Spanish                  Letter rules            99.9%
it    Italian                  Letter rules            99.8%
de    German                   Letter rules            99.0%
ru    Russian (Cyrillic)       Letter rules            100.0%
fr    French                   Letter rules            99.8%
pt    Portuguese               Letter rules            99.8%
nl    Dutch                    Letter rules            99.6%
pl    Polish                   Letter rules            99.4%
tr    Turkish                  Letter rules            100.0%
id    Indonesian               Letter rules            99.4%
ja    Japanese                 pykakasi + rules        100%
zh    Chinese (Mandarin)       pypinyin + rules        100%
ko    Korean (Hangul + Hanja)  hanja + native          100%

🎚️ Level 1–4

Level  Purpose                                    Example: student
1      Children / general Hangul                  스튜던트
2      Natural pronunciation (liaison; planned)   스튜던트
3      Precise (uses Old Hangul ㆄ ㅸ ㅿ)           (no such sounds in this word)
4      UHPS jamo sequence (ML / audio research)   ㅅㅌㅜㄷㅓㄴㅌ

Old Hangul example: father is 파더 at Level 1 and ㆄ아덜 at Level 3. The ㆄ marks the sound explicitly as /f/, distinct from /p/.
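
The difference between Level 1 and Level 3 can be pictured as a choice between two consonant tables. The sketch below is illustrative only and is not Hunmin's implementation. The Old Hangul letters come from this README. The modern fallbacks for /f/ and /v/ are read off the examples (파더, 모스크바), while the fallback for /z/ is an assumption because no Level 1 example shows it. `consonant_for` is a hypothetical name.

```python
# Old Hangul letters as listed in this README.
OLD_HANGUL = {"f": "ㆄ", "v": "ㅸ", "z": "ㅿ"}

# Modern stand-ins. ㅍ and ㅂ match the Level 1 examples in this README;
# ㅈ for /z/ is an assumption (no Level 1 example shows it).
MODERN_FALLBACK = {"f": "ㅍ", "v": "ㅂ", "z": "ㅈ"}


def consonant_for(phoneme, level):
    """Pick the jamo for a phoneme that modern Korean lacks."""
    if phoneme not in OLD_HANGUL:
        raise KeyError(f"no mapping for /{phoneme}/ in this sketch")
    # Levels 3 and 4 both show Old Hangul letters in the examples.
    if level >= 3:
        return OLD_HANGUL[phoneme]
    return MODERN_FALLBACK[phoneme]


if __name__ == "__main__":
    for level in (1, 3):
        print(level, consonant_for("f", level))
```

At Level 1 the /f/ of father collapses into the same letter as /p/; at Level 3 the distinction is kept.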


🧠 How it works

                ┌─────────────────────────┐
input + lang →  │      Hunmin router      │
                └──┬──────────────────┬───┘
                   ↓                  ↓
        Latin / Cyrillic       Logographic (CJK)
        (en, es, it, de,       (ja, zh, ko)
         ru, fr, pt, nl,
         pl, tr, id)
                   ↓                  ↓
        Per-language rules     Deterministic
        (letter → phoneme      dictionary
         → Hangul / jamo)      (pykakasi /
                                pypinyin /
                                hanja)
                   ↓                  ↓
                   └───── output ─────┘
                 (Hangul / jamo / split)

Why hybrid? Logographic characters (Hanja, 漢字) carry no pronunciation information in the glyph itself, so a dictionary lookup is the right tool. Phonographic scripts (Latin, Cyrillic) follow letter-to-sound rules, so they can be converted algorithmically.
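
The routing step in the diagram can be sketched as a lookup on the language code. This is a minimal illustration under stated assumptions, not Hunmin's actual router: the names `RULE_BASED`, `DICTIONARY_BACKENDS`, and `route` are hypothetical, and the language groupings are copied from the diagram (which places English on the rule-based side even though it also consults the CMU dictionary).

```python
# Language groups as drawn in the diagram above.
RULE_BASED = {"en", "es", "it", "de", "ru", "fr", "pt", "nl", "pl", "tr", "id"}
DICTIONARY_BACKENDS = {"ja": "pykakasi", "zh": "pypinyin", "ko": "hanja"}


def route(lang):
    """Return (branch, backend) for a language code.

    branch is "rules" or "dictionary"; backend names the lookup package
    for dictionary languages and is None for rule-based ones.
    """
    if lang in RULE_BASED:
        return ("rules", None)
    if lang in DICTIONARY_BACKENDS:
        return ("dictionary", DICTIONARY_BACKENDS[lang])
    raise ValueError(f"unsupported language code: {lang!r}")


if __name__ == "__main__":
    for code in ("ru", "zh"):
        print(code, route(code))
```

Both branches are deterministic: the same input and language code always take the same path.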


🔬 ML / research use

If you need to learn patterns, Level 4 (jamo mode) output can go straight into an ML pipeline.

import hunmin

hunmin.transcribe("hello", "en", level=4)  # ㅎㅔㄹㅗㅜ

A single small transformer (~1.4M parameters) trained on 326K (text, jamo) pairs reaches 97% exact-match and 99% character-level accuracy on the test set (see docs/RESEARCH.md).


📜 UHPS: Universal Hangul Phoneme Set (45 jamo)

            Consonants (24)                             Vowels (21)
Modern      ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ    ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ
Old Hangul  ㆄ /f/ · ㅸ /v/ · ㅿ /z/ · ㆁ /ŋ/ · ㆆ /ʔ/          (none)

The same IPA symbol maps to the same jamo in every language. The design favors consistency over accuracy, for the sake of ML stability.


🏛️ Vision

"Even a child should learn it in days, and use it to write any sound." (訓民正音 解例本, Hunminjeongeum Haerye, 1446)

The goal is to revive what King Sejong intended: Hangul as a universal phonetic notation system.


📈 Current status

  • v1.0: 14 languages, hybrid pipeline, 98.4% test accuracy, UHPS frozen.

📝 License

MIT.


🙏 Tools used

  • pykakasi: Japanese kana conversion
  • pypinyin: Chinese pinyin
  • hanja: Korean readings of Hanja
  • CMU Pronouncing Dictionary: English G2P
  • hermitdave/FrequencyWords: corpus seed (OpenSubtitles)
