misaki-ja-lightning ⚡
Lightweight Japanese text-to-IPA phoneme converter extracted from the misaki library. This package contains only the Japanese G2P (grapheme-to-phoneme) functionality with minimal dependencies.
Features
- 🇯🇵 Convert Japanese text (hiragana, katakana, kanji) to IPA phonemes
- 🔢 Convert numbers to Japanese kana
- ⚡ Lightning-fast with minimal dependencies
- 🎯 Focused on Japanese language only
- 🔧 Supports both cutlet (default) and pyopenjtalk backends
- ☁️ Serverless-friendly: UniDic dictionary auto-downloads to /tmp when needed
Installation
# Basic installation (cutlet backend, no bundled dictionary)
pip install misaki-ja-lightning
# With bundled dictionary (for local development)
pip install misaki-ja-lightning[unidic]
Note for Vercel/Serverless: Use basic installation. The UniDic dictionary will automatically download to /tmp on first use, keeping your deployment under size limits.
Usage
Basic G2P Conversion
from misaki_ja_lightning import JAG2P
# Initialize with cutlet backend (default, recommended)
g2p = JAG2P(version='cutlet')
# Or use pyopenjtalk backend
# g2p = JAG2P(version='pyopenjtalk')
# Convert Japanese text to IPA phonemes
text = "こんにちは、世界"
phonemes, tokens = g2p(text)
print(phonemes) # IPA phoneme string with pitch information
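When converting many lines, the call shown above can be wrapped in a small helper. The sketch below assumes only that the converter is a callable returning a (phonemes, tokens) pair, as in the example. The names phonemize_lines and fake_g2p are hypothetical and not part of the package.

```python
from typing import Callable, Iterable, List, Tuple

# Any callable with the shape shown above: text -> (phonemes, tokens).
G2PCallable = Callable[[str], Tuple[str, list]]


def phonemize_lines(g2p: G2PCallable, lines: Iterable[str]) -> List[str]:
    """Return one phoneme string per non-blank input line (hypothetical helper)."""
    results = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines instead of calling the converter
        phonemes, _tokens = g2p(text)
        results.append(phonemes)
    return results


if __name__ == "__main__":
    # Stand-in converter so the sketch runs without the package installed.
    def fake_g2p(text):
        return f"<{text}>", []

    print(phonemize_lines(fake_g2p, ["こんにちは", "", "世界"]))
```

In real use, a JAG2P instance would be passed in place of fake_g2p.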
Number to Kana Conversion
from misaki_ja_lightning import Convert, ConvertKanji
# Convert Arabic numbers to Japanese
result = Convert(12345, 'hiragana')
print(result) # いちまんにせんさんびゃくよんじゅうご
# Convert to kanji
result = Convert(12345, 'kanji')
print(result) # 一万二千三百四十五
# Convert to romaji
result = Convert(12345, 'romaji')
print(result) # ichi man ni sen san byaku yon juu go
# Supported formats: 'hiragana', 'kanji', 'romaji'
# Note: 'katakana' is not supported by the num2kana module
# Convert kanji numbers back to Arabic
number = ConvertKanji("一万二千三百四十五")
print(number) # 12345
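Since 'katakana' is not a supported output format, one possible workaround is to request hiragana and convert the result afterwards. The sketch below relies only on the Unicode layout, where each katakana character sits 0x60 code points above its hiragana counterpart. The function name hiragana_to_katakana is hypothetical and not part of this package; jaconv, listed under Requirements, provides a comparable hira2kata function.

```python
def hiragana_to_katakana(text: str) -> str:
    """Shift hiragana code points (U+3041..U+3096) into the katakana block."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x3041 <= code <= 0x3096:
            out.append(chr(code + 0x60))  # katakana sits 0x60 above hiragana
        else:
            out.append(ch)  # leave every other character untouched
    return "".join(out)


# Sample input is the hiragana reading of 12345 shown above.
print(hiragana_to_katakana("いちまんにせんさんびゃくよんじゅうご"))
# イチマンニセンサンビャクヨンジュウゴ
```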
Token-level Processing
from misaki_ja_lightning import JAG2P
g2p = JAG2P()
phonemes, tokens = g2p("今日は良い天気ですね")
for token in tokens:
    print(f"Text: {token.text}")
    print(f"Phonemes: {token.phonemes}")
    print(f"Tag: {token.tag}")
    print(f"Pitch: {token._.pitch}")
    print("---")
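The per-token fields can also be collected into a simple alignment list. This is a minimal sketch that assumes some tokens (punctuation, for example) may carry empty or missing phonemes. DemoToken is a stand-in defined here for illustration, not the package's token class, and the phoneme strings are placeholders rather than real output.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class DemoToken:
    """Stand-in with the text/phonemes fields used above (not the package's class)."""
    text: str
    phonemes: Optional[str] = None


def token_pairs(tokens: Iterable) -> List[Tuple[str, str]]:
    """Collect (text, phonemes) pairs, skipping tokens with no phonemes."""
    return [(t.text, t.phonemes) for t in tokens if t.phonemes]


# Placeholder phoneme strings "p1" and "p2" stand in for real converter output.
demo = [DemoToken("今日", "p1"), DemoToken("、"), DemoToken("は", "p2")]
print(token_pairs(demo))
```

The same function should work on the tokens returned by the converter, as long as they expose text and phonemes attributes as in the loop above.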
What's Included
This lightweight package includes:
- ja.py - Japanese G2P converter supporting both cutlet and pyopenjtalk
- cutlet.py - Cutlet backend for IPA conversion
- num2kana.py - Number to Japanese kana converter
- token.py - Token data structure
- unidic_downloader.py - Runtime dictionary downloader for serverless
Differences from Original Misaki
- ✅ Japanese-only (removed other languages)
- ✅ Removed addict dependency
- ✅ Simplified token structure
- ✅ Smart dictionary loading: uses bundled unidic-lite if available, downloads to /tmp otherwise
- ✅ Serverless-optimized
Requirements
- Python >= 3.8
- fugashi, mecab-python3, jaconv, mojimoji (for cutlet backend)
- pyopenjtalk-somniumism (forked version with /tmp support)
- unidic-lite (optional, auto-downloads if not present)
Note: This package intelligently handles dictionaries:
- Local: Uses bundled unidic-lite if installed
- Serverless: Auto-downloads dictionary to /tmp on first use
- Both: pyopenjtalk dictionary also auto-downloads to /tmp
This allows the package to work in serverless environments like Vercel while keeping deployment size under limits.
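The local-versus-serverless decision described above can be pictured roughly as follows. This is a sketch of the general idea, not the package's actual unidic_downloader code: the function names and the "unidic" sub-directory are assumptions made for illustration, and no download is performed here.

```python
import importlib.util
import os
import tempfile
from typing import Callable, Optional


def bundled_dictionary_available() -> bool:
    """True if the optional unidic-lite package (module unidic_lite) is importable."""
    return importlib.util.find_spec("unidic_lite") is not None


def choose_dictionary_source(
    has_bundled: Optional[Callable[[], bool]] = None,
    tmp_root: Optional[str] = None,
) -> tuple:
    """Return ("bundled", None) or ("download", target_dir). Hypothetical helper."""
    check = has_bundled or bundled_dictionary_available
    if check():
        return ("bundled", None)
    root = tmp_root or tempfile.gettempdir()
    # "unidic" is an assumed sub-directory name, chosen only for illustration.
    return ("download", os.path.join(root, "unidic"))


print(choose_dictionary_source())
```

Passing the availability check in as a parameter keeps the decision easy to exercise in both modes without installing or uninstalling the dictionary.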
License
MIT License (inherited from original misaki library)
Credits
This package is extracted from misaki by hexgrad. All credit for the original implementation goes to the misaki authors.
The num2kana module is based on Convert-Numbers-to-Japanese by Greatdane (MIT License).
Related Projects
- misaki - Full multilingual G2P library
- Kokoro-82M - Text-to-speech model
- pyopenjtalk - Japanese text processing
Use Cases
Perfect for:
- Text-to-speech applications
- Japanese language learning tools
- Phoneme-based synthesis
- Lightweight Japanese text processing
Support
For issues and questions, please visit the GitHub Issues page.