pykakasi

Python implementation of kakasi - kana kanji simple inversion library

Pykakasi

Overview

pykakasi is a Python Natural Language Processing (NLP) library to transliterate hiragana, katakana and kanji (Japanese text) into rōmaji (Latin/Roman alphabet). It can handle characters in NFC form.
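
Input that may contain decomposed characters can be brought into NFC form before conversion. The following is a minimal sketch using only the standard library; `to_nfc` is a hypothetical helper name and not part of pykakasi, and whether a given input needs this step depends on where the text comes from.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)


# "か" followed by the combining voiced sound mark (U+3099) is the decomposed
# spelling of "が"; NFC folds the pair into the single code point U+304C.
decomposed = "\u304b\u3099"
composed = to_nfc(decomposed)
print(len(decomposed), len(composed))  # 2 1
```

The normalized string can then be passed to the conversion calls shown under Usage.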

It is based on the kakasi library, which is written in C.

Install (from PyPI): pip install pykakasi
Documentation available on readthedocs

Supported python versions

pykakasi 1.2 supports python 2.7, python 3.5, 3.6, 3.7
pykakasi 2.0 supports python 3.6, 3.7, 3.8, pypy3.6-7.1.1

Usage

The following example uses the new API, available in pykakasi v2.0.0 and later, to transliterate Japanese text into kana, hiragana and rōmaji:

import pykakasi
kks = pykakasi.kakasi()
text = "かな漢字"
result = kks.convert(text)
for item in result:
    print("{}: kana '{}', hiragana: '{}', romaji: '{}'".format(item['orig'], item['kana'], item['hira'], item['hepburn']))

かな: kana 'カナ', hiragana: 'かな', romaji: 'kana'
漢字: kana 'カンジ', hiragana: 'かんじ', romaji: 'kanji'
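
Because each item is indexed by keys such as 'orig', 'kana', 'hira' and 'hepburn', the pieces can be recombined into whole-sentence strings. The sketch below assumes the result behaves like a list of dictionaries; `join_field` is a hypothetical helper, and the sample list is copied from the output above so the snippet runs without pykakasi installed.

```python
from typing import Dict, List

# Sample data copied from the output above; in real use this list would come
# from pykakasi.kakasi().convert(text).
items: List[Dict[str, str]] = [
    {"orig": "かな", "kana": "カナ", "hira": "かな", "hepburn": "kana"},
    {"orig": "漢字", "kana": "カンジ", "hira": "かんじ", "hepburn": "kanji"},
]


def join_field(items: List[Dict[str, str]], key: str, sep: str = " ") -> str:
    """Concatenate one field (for example 'hepburn' or 'hira') across all items."""
    return sep.join(item[key] for item in items)


print(join_field(items, "hepburn"))       # kana kanji
print(join_field(items, "hira", sep=""))  # かなかんじ
```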

The next example produces output similar to furigana mode:

import pykakasi
kks = pykakasi.kakasi()
text = "かな漢字交じり文"
result = kks.convert(text)
for item in result:
    print("{}[{}] ".format(item['orig'], item['hepburn'].capitalize()), end='')
print()

かな[Kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]
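
When the annotated line is needed as a value rather than printed output, the same formatting can be wrapped in a function. This is a sketch only: `furigana_line` is a hypothetical helper, and the lowercase 'hepburn' values in the sample are assumed from the capitalized readings shown above.

```python
from typing import Dict, Iterable


def furigana_line(items: Iterable[Dict[str, str]], key: str = "hepburn") -> str:
    """Format each item as orig[Reading] and join the pieces with single spaces."""
    return " ".join(
        "{}[{}]".format(item["orig"], item[key].capitalize()) for item in items
    )


# Sample items mirroring the output above; only the fields used here are filled in.
sample = [
    {"orig": "かな", "hepburn": "kana"},
    {"orig": "漢字", "hepburn": "kanji"},
    {"orig": "交じり", "hepburn": "majiri"},
    {"orig": "文", "hepburn": "bun"},
]
print(furigana_line(sample))  # かな[Kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]
```

Unlike the print loop, joining with a separator leaves no trailing space at the end of the line.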

Old API

There is also an old API for v1.2.

Transliterate Japanese text to rōmaji:

>>> import pykakasi
>>>
>>> text = u"かな漢字交じり文"
>>> kakasi = pykakasi.kakasi()
>>> kakasi.setMode("H","a") # Hiragana to ascii, default: no conversion
>>> kakasi.setMode("K","a") # Katakana to ascii, default: no conversion
>>> kakasi.setMode("J","a") # Japanese to ascii, default: no conversion
>>> kakasi.setMode("r","Hepburn") # default: use Hepburn Roman table
>>> kakasi.setMode("s", True) # add space, default: no separator
>>> kakasi.setMode("C", True) # capitalize, default: no capitalize
>>> conv = kakasi.getConverter()
>>> result = conv.do(text)
>>> print(result)
kana Kanji Majiri Bun

Tokenize Japanese text (split by word boundaries), equivalent to kakasi’s wakati gaki option:

>>> wakati = pykakasi.wakati()
>>> conv = wakati.getConverter()
>>> result = conv.do(text)
>>> print(result)
かな 漢字 交じり 文
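
The tokenized result shown above is a single space-separated string, so turning it into a list of tokens is a plain split. A minimal sketch; `split_wakati` is a hypothetical helper name, and the input is the output line above.

```python
from typing import List


def split_wakati(line: str) -> List[str]:
    """Split a space-separated wakati line into its tokens."""
    # str.split() with no argument splits on any run of whitespace
    # and drops empty strings at the edges.
    return line.split()


tokens = split_wakati("かな 漢字 交じり 文")
print(tokens)       # ['かな', '漢字', '交じり', '文']
print(len(tokens))  # 4
```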

Add furigana (pronunciation aid) in rōmaji to text:

>>> kakasi = pykakasi.kakasi()
>>> kakasi.setMode("J","aF") # Japanese to furigana
>>> kakasi.setMode("H","aF") # Hiragana to furigana
>>> conv = kakasi.getConverter()
>>> result = conv.do(text)
>>> print(result)
かな[kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]
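
A furigana line in this shape can be taken apart again into (text, reading) pairs. The sketch below assumes every token follows the text[reading] pattern seen in the output above; `parse_furigana` and the regular expression are illustrative and not part of pykakasi.

```python
import re
from typing import List, Tuple

# One token looks like "漢字[Kanji]": some text, then a reading in square brackets.
_TOKEN = re.compile(r"([^\s\[\]]+)\[([^\]]*)\]")


def parse_furigana(line: str) -> List[Tuple[str, str]]:
    """Return (text, reading) pairs found in a furigana-annotated line."""
    return _TOKEN.findall(line)


pairs = parse_furigana("かな[kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]")
for text, reading in pairs:
    print(text, reading)
```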

Input mode values: “J” (Japanese: kanji, hiragana and katakana), “H” (hiragana), “K” (katakana).

Output mode values: “H” (hiragana), “K” (katakana), “a” (alphabet / rōmaji), “aF” (furigana in rōmaji).

There are other setMode switches which control output:

“r”: Romanisation table: Hepburn (default), Kunrei or Passport
“s”: Separator: False adds no spaces between words (default), True adds spaces between words
“C”: Capitalize: False adds no capital letters (default), True makes each word start with a capital letter
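
The mode values listed above can be collected into a small lookup for checking arguments before they are passed to setMode. This is a sketch built only from that list: `is_valid_mode` and the constant names are hypothetical, and the library itself may accept or reject combinations differently.

```python
from typing import Union

INPUT_MODES = {"J", "H", "K"}                     # what to convert
OUTPUT_MODES = {"H", "K", "a", "aF"}              # what to convert it into
ROMAN_TABLES = {"Hepburn", "Kunrei", "Passport"}  # values for the "r" switch
FLAG_SWITCHES = {"s", "C"}                        # separator and capitalize


def is_valid_mode(key: str, value: Union[str, bool]) -> bool:
    """Check a (key, value) pair against the documented setMode values."""
    if key in INPUT_MODES:
        return value in OUTPUT_MODES
    if key == "r":
        return value in ROMAN_TABLES
    if key in FLAG_SWITCHES:
        return isinstance(value, bool)
    return False


print(is_valid_mode("J", "aF"))     # True
print(is_valid_mode("r", "Kunrei")) # True
print(is_valid_mode("s", "yes"))    # False
```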

Copyright and License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

PyKakasi ChangeLog

All notable changes to this project will be documented in this file.

v2.0.8 (4, May 2021)

Added

test: Benchmark and profiling (#122)

Changed

Performance: avoid ord() when checking long-mark, speed up about 6%
Reformat code with black (#121)

v2.0.7 (26, Feb. 2021)

Fixed

Infinite loop after running for a while, handle independent HW VOICED SOUND MARK (#115, #118)

v2.0.6 (7, Feb. 2021)

Fixed

Hiragana for age counters (#116, #117)

v2.0.5 (5, Feb. 2021)

Changed

CLI: use argparse for option parse(#113)

Fixed

Handle 思った、言った、行った properly.(#114)
CI: fix coveralls error

Deprecated

CI: drop travis-ci test and badge

v2.0.4 (26, Nov. 2020)

Fixed

CLI: Fix -v and -h option crash on python 3.7 and before (#108).

v2.0.3 (25, Nov. 2020)

Fixed

CLI: Fix -v and -h option crash (#108).

v2.0.2 (23, Jul. 2020)

Fixed

Fix convert() to handle Katakana correctly.(#103)

v2.0.1 (23, Jul. 2020)

Changed

Update setup.py, setup.cfg, tox.ini(#102)

Fixed

Fix convert() missing the last part of a text (#99, #100)
Fix CI, coverage, and coveralls configurations(#101)

v2.0.0 (31, May. 2020)

Changed

Update test formatting.

v2.0.0b1 (9, May. 2020)

Changed

Update test.

v2.0.0a6 (30, Mar. 2020)

Added

Understand more kanji variations.

Fixed

Fix IVS handling to return correct word length to consume.

v2.0.0a5 (23, Mar. 2020)

Changed

Recognize the Unicode standard Ideographic Variation Selector (IVS) and transliterate when used. (#97)

v2.0.0a4 (20, Mar. 2020)

Added

Add type hinting.

Changed

Refactoring dictionary generation classes.
call super() from wakati.__init__()
test: detect whether tests run under tox or raw pytest via the TOX_ENV environment variable. Under raw pytest, dictionaries are generated as a fixture. Previous versions used the --runenv option for pytest.

Fixed

NewAPI: fix return value when empty input string.

v2.0.0a3 (18, Mar. 2020)

Changed

Update test cases.

Fixed

Add a guard for unknown symbol code points, which led to a NoneType error.

v2.0.0a2 (16, Mar. 2020)

Added

NewAPI: support kunrei and passport roman conversion rule.

Changed

CI: test by github actions

Fixed

Support an extended kana(#77) (U0001b150-U0001b152, U0001b164-U0001b167)

v2.0.0a1 (14, Mar. 2020)

Added

Structured interface of Kakasi class.(#21)

Changed

Github workflows for packaging and release.(#91)

Fixed

fix data kakasidict.utf8: “本蓮沼”

Deprecated

Drop python 2.7 support.

v1.2 (26, Sep, 2019)

Fixed

Fix out-of-index error when kana-dash is placed on first of same character group.(#85)

v1.1 (16, Sep, 2019)

v1.1b2 (14, Sep, 2019)

Fixed

Fix long symbol issue (#58) (thanks @northernbird and @ta9ya)

v1.1b1 (6, Sep, 2019)

Added

Add conversions: kya, kyu, kyo

Changed

Rewording README document

v1.1a1 (8, Jul, 2019)

Changed

pytest: now run on project root without tox, by generating dictionary as a test fixture.
tox: run tox test with installed dictionary instead of a generated fixture.
Optimize kana conversion function.
Move kakasidict.py to src and conftest.py to tests

Fixed

Version naming follows PEP386.
Sometimes fails to insert space after punctuation(#79).
Special case in kana-roman passport conversion such as ‘etchu’ etc.

Release history

2.3.0

Jun 24, 2024

2.3.0b1 pre-release

Apr 14, 2022

2.3.0a1 pre-release

Jul 25, 2021

2.2.1

Jul 10, 2021

2.2.0

Jun 22, 2021

2.2.0b3 pre-release

May 19, 2021

2.2.0b2 pre-release

May 17, 2021

2.2.0b1 pre-release

May 16, 2021

2.1.1

May 16, 2021

2.1.0

May 7, 2021

2.0.8

May 4, 2021

2.0.7

May 4, 2021

2.0.6

Feb 7, 2021

2.0.4

Nov 26, 2020

2.0.3

Nov 25, 2020

2.0.1

Jul 23, 2020

2.0.0

May 31, 2020

2.0.0a6 pre-release

Mar 30, 2020

2.0.0a5 pre-release

Mar 23, 2020

2.0.0a4 pre-release

Mar 20, 2020

2.0.0a3 pre-release

Mar 18, 2020

2.0.0a2 pre-release

Mar 16, 2020

2.0.0a1 pre-release

Mar 14, 2020

1.2

Sep 26, 2019

1.1

Sep 16, 2019

1.1b2 pre-release

Sep 14, 2019

1.1b1 pre-release

Sep 6, 2019

1.1a1 pre-release

Jul 5, 2019

1.0

Jul 4, 2019

1.0rc2 pre-release

Jul 3, 2019

1.0rc1 pre-release

Jun 29, 2019

0.96

Jun 12, 2019

0.95

Jun 8, 2019

0.94

Feb 16, 2019

0.93

May 2, 2018

0.92

Apr 30, 2018

0.91

Apr 29, 2018

0.90

Mar 29, 2018

0.82

Mar 29, 2018

0.81

Mar 29, 2018

0.80

Mar 28, 2018

0.28

Mar 26, 2018

0.27

Mar 25, 2018

0.25

Mar 25, 2018

0.24

Mar 25, 2018

0.23

May 25, 2014

0.22

May 6, 2014

0.2

Apr 27, 2014

0.1

Apr 26, 2014

0.01

Apr 22, 2014

Download files

Source Distribution

pykakasi-2.0.8.tar.gz (1.1 MB)

Uploaded May 4, 2021 Source

Built Distribution

pykakasi-2.0.8-py3-none-any.whl (2.4 MB)

Uploaded May 4, 2021 Python 3

File details

Details for the file pykakasi-2.0.8.tar.gz.

File metadata

Download URL: pykakasi-2.0.8.tar.gz
Upload date: May 4, 2021
Size: 1.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for pykakasi-2.0.8.tar.gz
Algorithm	Hash digest
SHA256	`6b73c15bb87e786e20c121b0b9e6b28e390ccba2cb7ab4bfaf8aa301f5623e23`
MD5	`1d3be209954c8517499f62070946b8ae`
BLAKE2b-256	`17105cb954a44d0046bc841f8402e86132ce932dedf320188c9706a4fd3e3283`
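
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; `sha256_of_file` and `matches_expected` are hypothetical helper names, and the local path of the download is whatever the reader saved the file as.

```python
import hashlib
import os
import tempfile

# SHA256 digest listed above for pykakasi-2.0.8.tar.gz.
EXPECTED_SHA256 = "6b73c15bb87e786e20c121b0b9e6b28e390ccba2cb7ab4bfaf8aa301f5623e23"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's SHA256 digest with the published value."""
    return sha256_of_file(path) == expected.lower()


# Self-check on a throwaway file: the SHA-256 of b"abc" is a standard test vector.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_digest = sha256_of_file(tmp.name)
demo_matches = matches_expected(tmp.name)  # False: the demo file is not the archive
os.unlink(tmp.name)
print(demo_digest)
```

For the real archive, `matches_expected("pykakasi-2.0.8.tar.gz")` would be called with the path of the downloaded file.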

File details

Details for the file pykakasi-2.0.8-py3-none-any.whl.

File metadata

Download URL: pykakasi-2.0.8-py3-none-any.whl
Upload date: May 4, 2021
Size: 2.4 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for pykakasi-2.0.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a6a522c6dacf18e5590cfe3d3f0d1930354328ae8179aab5fff0c5334b101128`
MD5	`6ff0f7a0a44ab76f70350f4292da7198`
BLAKE2b-256	`8b6080791407bc1d21fcbf94f8eec4f4e095e3c312a2cf6e1c68e0a0ff6c36a8`

pykakasi 2.0.8
