Python implementation of kakasi - kana kanji simple inversion library
Overview
pykakasi is a Python Natural Language Processing (NLP) library to transliterate hiragana, katakana and kanji (Japanese text) into rōmaji (Latin/Roman alphabet).
It is based on the kakasi library, which is written in C.
Install (from PyPI): pip install pykakasi
Supported Python versions
pykakasi 1.2 supports Python 2.7, 3.5, 3.6 and 3.7.
pykakasi 2.0 supports Python 3.6, 3.7 and 3.8, and pypy3.6-7.1.1.
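The version lists above can be turned into a small lookup, which may help when deciding which release series to pin. This is only a sketch: the `SUPPORTED` table and the `compatible_series` helper are illustrative names, not part of pykakasi, and the PyPy entry is left out for simplicity.

```python
import sys

# Supported CPython versions as listed above (PyPy omitted in this sketch).
SUPPORTED = {
    "1.2": {(2, 7), (3, 5), (3, 6), (3, 7)},
    "2.0": {(3, 6), (3, 7), (3, 8)},
}


def compatible_series(python_version=None):
    """Return the release series whose list contains the given (major, minor)."""
    if python_version is None:
        python_version = sys.version_info[:2]
    major_minor = tuple(python_version[:2])
    return sorted(
        series for series, versions in SUPPORTED.items()
        if major_minor in versions
    )


print(compatible_series((3, 7)))  # ['1.2', '2.0']
```

Calling `compatible_series()` with no argument checks the running interpreter; an empty list means neither series lists that version.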
Examples
Transliterate Japanese text to rōmaji:
>>> import pykakasi
>>>
>>> text = u"かな漢字交じり文"
>>> kakasi = pykakasi.kakasi()
>>> kakasi.setMode("H","a") # Hiragana to ascii, default: no conversion
>>> kakasi.setMode("K","a") # Katakana to ascii, default: no conversion
>>> kakasi.setMode("J","a") # Japanese to ascii, default: no conversion
>>> kakasi.setMode("r","Hepburn") # default: use Hepburn Roman table
>>> kakasi.setMode("s", True) # add space, default: no separator
>>> kakasi.setMode("C", True) # capitalize, default: no capitalize
>>> conv = kakasi.getConverter()
>>> result = conv.do(text)
>>> print(result)
kana Kanji Majiri Bun
Tokenize Japanese text (split by word boundaries), equivalent to kakasi’s wakati gaki option:
>>> wakati = pykakasi.wakati()
>>> conv = wakati.getConverter()
>>> result = conv.do(text)
>>> print(result)
かな 漢字 交じり 文
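Since both the tokenized output and the rōmaji output are space-separated, they can be lined up word by word. The sketch below works on the two result strings copied from the examples above, so it does not need pykakasi installed; `pair_tokens` is a hypothetical helper, and it assumes both conversions split the text at the same word boundaries.

```python
# Result strings copied from the two examples above.
romaji = "kana Kanji Majiri Bun"
tokens = "かな 漢字 交じり 文"


def pair_tokens(tokenized, transliterated):
    """Pair each token with its reading; fail if the word counts differ."""
    words = tokenized.split()
    readings = transliterated.split()
    if len(words) != len(readings):
        raise ValueError(
            "token counts differ: %d vs %d" % (len(words), len(readings))
        )
    return list(zip(words, readings))


pairs = pair_tokens(tokens, romaji)
for word, reading in pairs:
    print(word, reading)
```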
Add furigana (pronunciation aid) in rōmaji to text:
>>> kakasi = pykakasi.kakasi()
>>> kakasi.setMode("J","aF") # Japanese to furigana
>>> kakasi.setMode("H","aF") # Hiragana to furigana
>>> conv = kakasi.getConverter()
>>> result = conv.do(text)
>>> print(result)
かな[kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]
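The annotated string follows a regular `text[reading]` pattern, so it can be split back into pairs for further processing. This is a minimal sketch that assumes the output always looks like the sample above (no nested or unmatched brackets); `parse_furigana` is an illustrative helper, not a pykakasi function.

```python
import re

# One item: a run of non-space, non-bracket characters followed by [reading].
FURIGANA_ITEM = re.compile(r"([^\s\[\]]+)\[([^\[\]]*)\]")


def parse_furigana(annotated):
    """Return a list of (text, reading) tuples from 'text[reading] ...'."""
    return FURIGANA_ITEM.findall(annotated)


sample = "かな[kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]"
for text_part, reading in parse_furigana(sample):
    print(text_part, "->", reading)
```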
Input mode values: “J” (Japanese: kanji, hiragana and katakana), “H” (hiragana), “K” (katakana).
Output mode values: “H” (hiragana), “K” (katakana), “a” (alphabet / rōmaji), “aF” (furigana in rōmaji).
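Because the mode values are short strings, a typo is easy to miss. The sketch below checks input/output pairs against the values listed above before calling `setMode`. The `apply_modes` helper and the `RecordingConfig` stand-in are hypothetical names introduced here so the example runs without pykakasi installed; whether the library itself rejects unknown values is not assumed.

```python
# Mode values as listed above.
INPUT_MODES = {"J", "H", "K"}
OUTPUT_MODES = {"H", "K", "a", "aF"}


def apply_modes(config, modes):
    """Validate {input: output} pairs, then pass each one to config.setMode."""
    for source, target in modes.items():
        if source not in INPUT_MODES:
            raise ValueError("unknown input mode: %r" % (source,))
        if target not in OUTPUT_MODES:
            raise ValueError("unknown output mode: %r" % (target,))
    for source, target in modes.items():
        config.setMode(source, target)
    return config


class RecordingConfig:
    """Stand-in for pykakasi.kakasi() that only records setMode calls."""

    def __init__(self):
        self.calls = []

    def setMode(self, key, value):
        self.calls.append((key, value))


demo = apply_modes(RecordingConfig(), {"J": "a", "H": "a", "K": "a"})
print(demo.calls)
```

With the library installed, passing `pykakasi.kakasi()` in place of `RecordingConfig()` should configure a real instance the same way as the `setMode` calls in the first example.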
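Other setMode switches control the output: “r” selects the romanization table (for example “Hepburn”), “s” adds a space separator, and “C” capitalizes words, as in the first example above.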
Copyright and License
Copyright 2010-2020 Hiroshi Miura <miurahr@linux.com>
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
Hashes for pykakasi-2.0.0a1-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 4fa0f8318e93d3bc1f565f4c81915d5edac742daae06fd9f5fcbcbd6d357dd0a |
MD5 | 3d9ac8e909d0470e373ebca1e320635f |
BLAKE2b-256 | e83600c3bc384c0ae82221c32bf325493ca4e39a46f065507892fac16f368fe3 |