A simple Python package for getting Japanese readings (yomigana) using MeCab
MeCab Text Cleaner
This is a simple Python package for getting Japanese readings (yomigana) and accents using MeCab. It does not account for accent changes in compound words, so please also consider using pyopenjtalk (no accents) or pyopenjtalk_g2p_prosody (ESPnet) (with accents).
Installation
Install this via pip or pipx (or your favourite package manager):
pipx install mecab-text-cleaner[unidecode,unidic]
pip install mecab-text-cleaner[unidecode,unidic]
Usage
> mtc いい天気ですね。
イ]ー テ]ンキ デス ネ。
> mtc いい天気ですね。 --ascii
i] te]nki desu ne.
> mtc いい天気ですね --no-add-atype --no-add-blank-between-words
イーテンキデスネ
> mtc いい天気ですね --no-add-atype --no-add-blank-between-words -r kana
イイテンキデスネ
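The command-line calls above can also be driven from a script. The following is a minimal sketch, not part of the package: `build_mtc_command` and `run_mtc` are hypothetical helper names, and only the flags shown in the examples above (`--ascii`, `--no-add-atype`, `--no-add-blank-between-words`, `-r`) are used. It assumes `mtc` is on `PATH` and writes UTF-8 to standard output.

```python
import shutil
import subprocess


def build_mtc_command(text, ascii_output=False, add_atype=True,
                      add_blank_between_words=True, reading_type=None):
    """Build the argument list for the `mtc` command (hypothetical helper)."""
    command = ["mtc", text]
    if ascii_output:
        command.append("--ascii")
    if not add_atype:
        command.append("--no-add-atype")
    if not add_blank_between_words:
        command.append("--no-add-blank-between-words")
    if reading_type is not None:
        # Only the value "kana" appears in the examples above.
        command.extend(["-r", reading_type])
    return command


def run_mtc(text, **options):
    """Run `mtc` and return its output, or None if `mtc` is not installed."""
    if shutil.which("mtc") is None:
        return None
    result = subprocess.run(
        build_mtc_command(text, **options),
        capture_output=True,
        text=True,
        encoding="utf-8",  # assumption: the CLI prints UTF-8
        check=True,
    )
    return result.stdout.strip()
```

Keeping the argument construction separate from the subprocess call makes it possible to check the generated command without having MeCab installed.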
from mecab_text_cleaner import to_reading, to_ascii_clean
assert to_reading(" 空、雲。\n雨!(") == "ソ]ラ、 ク]モ。\nア]メ!("
assert to_ascii_clean(" 한空、雲。\n雨!(") == "han so]ra, ku]mo. \na]me!("
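The readings above carry a `]` marker, which appears to mark the accent position within each word (it disappears with `--no-add-atype`). If downstream code needs the plain reading and the accent position separately, the marked string can be post-processed with the standard library alone. This is a sketch under that assumption: `strip_accent_marks` and `accent_positions` are hypothetical helpers, not functions provided by the package, and they assume at most one marker per whitespace-separated word.

```python
ACCENT_MARK = "]"  # assumption: this character marks the accent position


def strip_accent_marks(reading):
    """Return the reading with every accent marker removed."""
    return reading.replace(ACCENT_MARK, "")


def accent_positions(reading):
    """Split a marked reading into (word, position) pairs.

    `position` is the number of characters before the marker (a character
    index, not a mora count), or None when the word carries no marker.
    """
    pairs = []
    for word in reading.split():
        index = word.find(ACCENT_MARK)
        position = index if index != -1 else None
        pairs.append((word.replace(ACCENT_MARK, ""), position))
    return pairs


print(strip_accent_marks("イ]ー テ]ンキ デス ネ。"))
print(accent_positions("i] te]nki desu ne."))
```

Note that a literal `]` in the input text would be indistinguishable from a marker here, so this sketch is only suitable for text without square brackets.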
Contributors ✨
Thanks go to these wonderful people (emoji key):
This project follows the all-contributors specification. Contributions of any kind are welcome!