Python implementation of kakasi - kana kanji simple inversion library
Project description
Overview
pykakasi is a Python Natural Language Processing (NLP) library to transliterate hiragana, katakana and kanji (Japanese text) into rōmaji (Latin/Roman alphabet).
It is based on the kakasi library, which is written in C.
Install (from PyPI): pip install pykakasi
Examples
Transliterate Japanese text to rōmaji:
>>> import pykakasi
>>>
>>> text = u"かな漢字交じり文"
>>> kakasi = pykakasi.kakasi()
>>> kakasi.setMode("H","a") # Hiragana to ascii, default: no conversion
>>> kakasi.setMode("K","a") # Katakana to ascii, default: no conversion
>>> kakasi.setMode("J","a") # Japanese to ascii, default: no conversion
>>> kakasi.setMode("r","Hepburn") # default: use Hepburn Roman table
>>> kakasi.setMode("s", True) # add space, default: no separator
>>> kakasi.setMode("C", True) # capitalize, default: no capitalize
>>> conv = kakasi.getConverter()
>>> result = conv.do(text)
>>> print(result)
kana Kanji Majiri Bun
Tokenize Japanese text (split by word boundaries), equivalent to kakasi’s wakati gaki option:
>>> wakati = pykakasi.wakati()
>>> conv = wakati.getConverter()
>>> result = conv.do(text)
>>> print(result)
かな 漢字 交じり 文
Add furigana (pronounciation aid) in rōmaji to text:
>>> kakasi = pykakasi.kakasi()
>>> kakasi.setMode("J","aF") # Japanese to furigana
>>> kakasi.setMode("H","aF") # Japanese to furigana
>>> conv = kakasi.getConverter()
>>> result = conv.do(text)
>>> print(result)
かな[kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]
Input mode values: “J” (Japanese: kanji, hiragana and katakana), “H” (hiragana), “K” (katakana).
Output mode values: “H” (hiragana), “K” (katakana), “a” (alphabet / rōmaji), “aF” (furigana in rōmaji).
There are other setMode switches which control output:
Copyright and License
Copyright 2010-2019 Hiroshi Miura <miurahr@linux.com>
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for pykakasi-1.1b1-py2.py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | b86ee1f17b8f4e0581fd82f3e859debce66bfc3be3d7012e7c5d2a404204bcc3 |
|
MD5 | dadcecec93b24d791e2006270b04411b |
|
BLAKE2b-256 | 071d2063135afd0a954457f9b3a4f04680ad4260a14a6e5dcd2f6c0018676b75 |