Lightweight Japanese text-to-IPA phoneme converter extracted from misaki

misaki-ja-lightning ⚡

Lightweight Japanese text-to-IPA phoneme converter extracted from the misaki library. This package contains only the Japanese G2P (grapheme-to-phoneme) functionality with minimal dependencies.

Features

  • 🇯🇵 Convert Japanese text (hiragana, katakana, kanji) to IPA phonemes
  • 🔢 Convert numbers to Japanese kana
  • ⚡ Lightning-fast with minimal dependencies
  • 🎯 Focused on Japanese language only
  • 🔧 Supports both cutlet (default) and pyopenjtalk backends
  • ☁️ Serverless-friendly: UniDic dictionary auto-downloads to /tmp when needed

Installation

# Basic installation (cutlet backend, no bundled dictionary)
pip install misaki-ja-lightning

# With bundled dictionary (for local development)
# (quoted so that shells such as zsh do not expand the brackets)
pip install "misaki-ja-lightning[unidic]"

Note for Vercel/Serverless: Use the basic installation. The UniDic dictionary is downloaded to /tmp on first use, which keeps the deployment bundle under the platform's size limit.

Usage

Basic G2P Conversion

from misaki_ja_lightning import JAG2P

# Initialize with cutlet backend (default, recommended)
g2p = JAG2P(version='cutlet')

# Or use pyopenjtalk backend
# g2p = JAG2P(version='pyopenjtalk')

# Convert Japanese text to IPA phonemes
text = "こんにちは、世界"
phonemes, tokens = g2p(text)

print(phonemes)  # IPA phoneme string with pitch information
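
Because the dictionary may be fetched on first use, it can help to defer building the converter until the first request. The following is a minimal sketch of that pattern, not part of the package: `LazyG2P` and `_load_jag2p` are hypothetical names, and the only calls assumed from misaki-ja-lightning are `JAG2P(version=...)` and `g2p(text)` as shown above.

```python
from typing import Any, Callable, Optional, Tuple


def _load_jag2p(version: str) -> Any:
    # Deferred import: the package (and any dictionary it needs) is only
    # touched when the first conversion is requested.
    from misaki_ja_lightning import JAG2P
    return JAG2P(version=version)


class LazyG2P:
    """Hypothetical wrapper that builds the converter on the first call."""

    def __init__(self, version: str = "cutlet",
                 factory: Optional[Callable[[str], Any]] = None) -> None:
        self._version = version
        # The factory is injectable so the wrapper can be tested without
        # the real backend installed.
        self._factory = factory if factory is not None else _load_jag2p
        self._g2p: Any = None

    def __call__(self, text: str) -> Tuple[str, Any]:
        if self._g2p is None:
            self._g2p = self._factory(self._version)
        # Returns the (phonemes, tokens) pair produced by the converter.
        return self._g2p(text)
```

Creating `g2p = LazyG2P()` at module level is then cheap, and `phonemes, tokens = g2p(text)` inside a request handler pays the initialization cost only once per warm instance.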

Number to Kana Conversion

from misaki_ja_lightning import Convert, ConvertKanji

# Convert Arabic numbers to Japanese
result = Convert(12345, 'hiragana')
print(result)  # いちまんにせんさんびゃくよんじゅうご

# Convert to kanji
result = Convert(12345, 'kanji')
print(result)  # 一万二千三百四十五

# Convert to romaji
result = Convert(12345, 'romaji')
print(result)  # ichi man ni sen san byaku yon juu go

# Supported output formats: 'hiragana', 'kanji', 'romaji'
# Note: 'katakana' is not supported by the num2kana module

# Convert kanji numbers back to Arabic
number = ConvertKanji("一万二千三百四十五")
print(number)  # 12345
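
To illustrate the idea behind the kanji output, here is a standalone sketch of positional number-to-kanji conversion. It is not the num2kana implementation: `to_kanji` and `group_to_kanji` are hypothetical names, and the handling of edge cases (for example whether a leading 一 is written before 千) is a simplifying assumption that may differ from what `Convert` produces.

```python
DIGITS = "〇一二三四五六七八九"
SMALL_UNITS = ["", "十", "百", "千"]  # positions inside a 4-digit group
BIG_UNITS = ["", "万", "億", "兆"]    # one unit per 4-digit group


def group_to_kanji(group: int) -> str:
    """Render 1..9999; a digit 1 is omitted before 十, 百 and 千."""
    out = []
    for power in range(3, -1, -1):
        digit = (group // 10 ** power) % 10
        if digit == 0:
            continue
        if digit == 1 and power > 0:
            out.append(SMALL_UNITS[power])
        else:
            out.append(DIGITS[digit] + SMALL_UNITS[power])
    return "".join(out)


def to_kanji(n: int) -> str:
    """Convert 0 <= n < 10**16 by splitting it into groups of four digits."""
    if not 0 <= n < 10 ** 16:
        raise ValueError("n must be in the range 0 .. 10**16 - 1")
    if n == 0:
        return DIGITS[0]
    parts = []
    index = 0
    while n > 0:
        n, group = divmod(n, 10_000)
        if group:
            parts.append(group_to_kanji(group) + BIG_UNITS[index])
        index += 1
    # Groups were collected from least to most significant.
    return "".join(reversed(parts))


print(to_kanji(12345))  # 一万二千三百四十五
```

Japanese groups digits in units of 10,000 rather than 1,000, which is why the loop divides by 10_000 and attaches 万, 億 or 兆 to each non-empty group.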

Token-level Processing

from misaki_ja_lightning import JAG2P

g2p = JAG2P()
phonemes, tokens = g2p("今日は良い天気ですね")

for token in tokens:
    print(f"Text: {token.text}")
    print(f"Phonemes: {token.phonemes}")
    print(f"Tag: {token.tag}")
    print(f"Pitch: {token._.pitch}")
    print("---")
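
For debugging, it can be convenient to print each token next to its phonemes on one line. The sketch below assumes only the `text` and `phonemes` attributes used in the loop above. `format_aligned` is a hypothetical helper, the possibility that `phonemes` is `None` is an assumption, and the stand-in tokens carry placeholder strings rather than real converter output.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def token_pairs(tokens: Iterable[Any]) -> List[Tuple[str, str]]:
    """Collect (text, phonemes) pairs, treating missing phonemes as empty."""
    return [(t.text, t.phonemes or "") for t in tokens]


def format_aligned(tokens: Iterable[Any], sep: str = " | ") -> str:
    """Join tokens as text[phonemes] segments separated by `sep`."""
    return sep.join(f"{text}[{ph}]" for text, ph in token_pairs(tokens))


# Stand-in tokens with placeholder values, used only to show the format.
demo = [
    SimpleNamespace(text="A", phonemes="p1"),
    SimpleNamespace(text="B", phonemes=None),
]
print(format_aligned(demo))  # A[p1] | B[]
```

With the real converter, the same helper would be called as `format_aligned(tokens)` on the second value returned by `g2p(...)`.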

What's Included

This lightweight package includes:

  • ja.py - Japanese G2P converter supporting both cutlet and pyopenjtalk
  • cutlet.py - Cutlet backend for IPA conversion
  • num2kana.py - Number to Japanese kana converter
  • token.py - Token data structure
  • unidic_downloader.py - Runtime dictionary downloader for serverless

Differences from Original Misaki

  • ✅ Japanese-only (removed other languages)
  • ✅ Removed addict dependency
  • ✅ Simplified token structure
  • ✅ Smart dictionary loading: uses bundled unidic-lite if available, downloads to /tmp otherwise
  • ✅ Serverless-optimized

Requirements

  • Python >= 3.8
  • fugashi, mecab-python3, jaconv, mojimoji (for cutlet backend)
  • pyopenjtalk-somniumism (forked version with /tmp support)
  • unidic-lite (optional, auto-downloads if not present)

Note: The package chooses its dictionary source at runtime:

  • Local: Uses bundled unidic-lite if installed
  • Serverless: Auto-downloads dictionary to /tmp on first use
  • Both: pyopenjtalk dictionary also auto-downloads to /tmp

This lets the package run in serverless environments such as Vercel while keeping the deployment bundle under the platform's size limit.
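
The selection described above can be sketched as follows. This is an illustration under stated assumptions, not the code in unidic_downloader.py: the function names and the `unidic` cache directory name are hypothetical, and the check relies only on `unidic-lite` being importable as the `unidic_lite` module. The download step itself is left out.

```python
import importlib.util
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def has_bundled_dictionary() -> bool:
    """True if the unidic-lite package can be imported."""
    return importlib.util.find_spec("unidic_lite") is not None


def choose_dictionary_source(
    is_bundled: Callable[[], bool] = has_bundled_dictionary,
    cache_root: Optional[Path] = None,
) -> Tuple[str, Optional[Path]]:
    """Return ("bundled", None) or ("download", <writable cache dir>)."""
    if is_bundled():
        return "bundled", None
    # Serverless platforms typically expose only a temp directory as
    # writable storage; tempfile.gettempdir() is /tmp on most Linux hosts.
    root = Path(cache_root) if cache_root is not None else Path(tempfile.gettempdir())
    target = root / "unidic"  # hypothetical directory name
    target.mkdir(parents=True, exist_ok=True)
    return "download", target
```

Passing `is_bundled` and `cache_root` as parameters keeps the decision testable without installing or removing the dictionary package.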

License

MIT License (inherited from original misaki library)

Credits

This package is extracted from misaki by hexgrad. All credit for the original implementation goes to the misaki authors.

The num2kana module is based on Convert-Numbers-to-Japanese by Greatdane (MIT License).

Use Cases

Perfect for:

  • Text-to-speech applications
  • Japanese language learning tools
  • Phoneme-based synthesis
  • Lightweight Japanese text processing

Support

For issues and questions, please visit the GitHub Issues page.
