
Kana kanji simple inversion library

Project description

Pykakasi

Overview


pykakasi is a Python Natural Language Processing (NLP) library to transliterate hiragana, katakana and kanji (Japanese text) into rōmaji (Latin/Roman alphabet). It can handle characters in NFC form.

Its algorithms are based on the kakasi library, which is written in C.
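Input may arrive in decomposed form, for example a kana followed by a combining voiced sound mark. In that case, one straightforward option is to normalize it to NFC with the standard unicodedata module before conversion. Here is a minimal sketch; `to_nfc` is a hypothetical helper, not part of pykakasi:

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode NFC form (hypothetical helper, not a pykakasi API)."""
    return unicodedata.normalize("NFC", text)


# "か" (U+304B) followed by COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK (U+3099)
decomposed = "\u304b\u3099"
composed = to_nfc(decomposed)
print(len(decomposed), len(composed), composed)  # 2 1 が
```

You can then pass the normalized string to kks.convert() as in the Usage section.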

Supported python versions

  • pykakasi supports Python 3.6, 3.7, 3.8, 3.9, and PyPy3.

Usage

Transliterate Japanese text to kana, hiragana and romaji:

import pykakasi

kks = pykakasi.kakasi()
text = "かな漢字"
result = kks.convert(text)  # a list of dicts, one per segment
for item in result:
    print("{}: kana '{}', hiragana: '{}', romaji: '{}'".format(item['orig'], item['kana'], item['hira'], item['hepburn']))

かな: kana 'カナ', hiragana: 'かな', romaji: 'kana'
漢字: kana 'カンジ', hiragana: 'かんじ', romaji: 'kanji'
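Since convert() returns a list of dicts, you can join the per-segment readings into a single string. The sketch below uses a hard-coded `result` list that mirrors the output shown above, so it runs without pykakasi installed. `join_readings` is a hypothetical helper:

```python
from typing import Dict, List

# Sample data shaped like the output of kks.convert("かな漢字") shown above.
# Only the keys used in this README are included; real items may carry more.
result: List[Dict[str, str]] = [
    {"orig": "かな", "kana": "カナ", "hira": "かな", "hepburn": "kana"},
    {"orig": "漢字", "kana": "カンジ", "hira": "かんじ", "hepburn": "kanji"},
]


def join_readings(items: List[Dict[str, str]], key: str = "hepburn", sep: str = " ") -> str:
    """Join one reading field from every segment (hypothetical helper)."""
    return sep.join(item[key] for item in items)


print(join_readings(result))              # kana kanji
print(join_readings(result, "hira", ""))  # かなかんじ
```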

The following example produces output similar to furigana mode.

import pykakasi
kks = pykakasi.kakasi()
text = "かな漢字交じり文"
result = kks.convert(text)
for item in result:
    print("{}[{}] ".format(item['orig'], item['hepburn'].capitalize()), end='')
print()

かな[Kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]
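The loop above leaves a trailing space after the last segment. The sketch below builds the same furigana-style string with str.join instead. It uses hard-coded sample segments, which are hypothetical values mirroring the output above, so it runs standalone:

```python
from typing import Dict, List

# Sample segments mirroring kks.convert("かな漢字交じり文"); hard-coded for illustration.
segments: List[Dict[str, str]] = [
    {"orig": "かな", "hepburn": "kana"},
    {"orig": "漢字", "hepburn": "kanji"},
    {"orig": "交じり", "hepburn": "majiri"},
    {"orig": "文", "hepburn": "bun"},
]


def furigana_line(items: List[Dict[str, str]]) -> str:
    """Render each segment as orig[Reading], separated by single spaces (hypothetical helper)."""
    return " ".join(
        "{}[{}]".format(item["orig"], item["hepburn"].capitalize()) for item in items
    )


print(furigana_line(segments))  # かな[Kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]
```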

Benchmark result

Benchmark results for various versions and platforms are available at https://github.com/miurahr/pykakasi/issues/123

PyKakasi ChangeLog

All notable changes to this project will be documented in this file.

Unreleased

Added

  • Backtrack matching mechanism (#132)

Changed

Fixed

  • Add the Zenkaku question mark (U+FF1F) and other Zenkaku marks as end marks (#146)
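U+FF1F is FULLWIDTH QUESTION MARK. The sketch below illustrates what treating it as an end mark involves by checking characters against a small set. The changelog does not list the other Zenkaku marks, so `ENDMARKS` and `is_endmark` are hypothetical:

```python
import unicodedata

# Illustrative set only; pykakasi's actual end-mark list may differ.
ENDMARKS = {
    "\u3002",  # IDEOGRAPHIC FULL STOP
    "\uff01",  # FULLWIDTH EXCLAMATION MARK
    "\uff1f",  # FULLWIDTH QUESTION MARK
    ".", "!", "?",
}


def is_endmark(ch: str) -> bool:
    """Return True if ch is in the hypothetical ENDMARKS set."""
    return ch in ENDMARKS


print(unicodedata.name("\uff1f"))              # FULLWIDTH QUESTION MARK
print(is_endmark("\uff1f"), is_endmark("あ"))  # True False
```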

Deprecated

Removed

Security

v2.2.0 (22, June 2021)

Added

  • dictionary: add nouns and adjectives from UniDic (#140)

Changed

  • Refactor main loop logic of convert() (#144)

Fixed

  • Fix segmentation (wakati) of text combining Katakana and Hiragana (#142)
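Wakati (word segmentation) has to decide where a run of Katakana ends and Hiragana begins. The sketch below splits text wherever the character moves between the Hiragana block (U+3040 to U+309F) and the Katakana block (U+30A0 to U+30FF). It is only a rough illustration of that boundary problem, not pykakasi's algorithm, and `script_of` and `split_by_script` are hypothetical helpers:

```python
from itertools import groupby
from typing import List


def script_of(ch: str) -> str:
    """Classify a character by Unicode block (coarse, illustrative only)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    return "other"


def split_by_script(text: str) -> List[str]:
    """Split text wherever the script class changes (hypothetical helper)."""
    return ["".join(group) for _, group in groupby(text, key=script_of)]


print(split_by_script("カタカナとひらがな"))  # ['カタカナ', 'とひらがな']
```

A real segmenter also has to handle kanji and dictionary words, so treat this only as a picture of the boundary case the fix concerns.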

v2.1.1 (16, May 2021)

Added

  • Provide Kakasi.normalize(text) class method

  • Add UniDic data into the data files (not used yet), and add a parse utility.

Fixed

  • Include the type hint stub in the package

  • Copyright notices

Changed

  • Expand all cletter into dictionary (#139)

  • Change primary kanwadict index from str to int

  • test: gather all legacy tests into test_pykakasi_legacy.py.

v2.1.0 (6, May 2021)

Added

  • Deprecation warning when using the old API (#124)

  • Add type hint file (.pyi) (#124)

  • Benchmark test code (#122)
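Python's standard way to emit a warning like this is warnings.warn with DeprecationWarning. The sketch below shows that pattern and assumes nothing about pykakasi's internals. `old_style_convert` and `new_style_convert` are hypothetical names:

```python
import warnings


def new_style_convert(text: str) -> str:
    """Stand-in for a current API (hypothetical)."""
    return text


def old_style_convert(text: str) -> str:
    """Legacy entry point that still works but warns callers (hypothetical)."""
    warnings.warn(
        "old_style_convert() is deprecated; use new_style_convert() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this function
    )
    return new_style_convert(text)


# DeprecationWarning is hidden by default outside __main__, so record it explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style_convert("かな")
print(caught[0].category.__name__)  # DeprecationWarning
```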

Changed

  • Cache internal results, improving performance by about 30 to 40 times (#128)

  • Use standard pickle for the database file (#128)

  • Exceptions are now provided by the pykakasi module instead of pykakasi.exceptions
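The two #128 entries above mention caching and pickle but do not show code. The sketch below illustrates both standard-library techniques together: pickle persists a dictionary, and functools.lru_cache memoizes lookups. `KANWA`, `lookup`, and the file name are hypothetical, and this is not necessarily how pykakasi structures its database or cache:

```python
import os
import pickle
import tempfile
from functools import lru_cache
from typing import Dict

# Hypothetical miniature dictionary; pykakasi's real database format is not shown here.
KANWA: Dict[str, str] = {"漢字": "かんじ", "文": "ぶん"}

# Round-trip the dictionary through a file using the standard pickle module.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kanwa.pickle")
    with open(path, "wb") as f:
        pickle.dump(KANWA, f)
    with open(path, "rb") as f:
        loaded: Dict[str, str] = pickle.load(f)


@lru_cache(maxsize=None)
def lookup(word: str) -> str:
    """Return the reading of word (hypothetical helper); repeat calls come from the cache."""
    return loaded.get(word, "")


lookup("漢字")  # first call: computed (cache miss)
lookup("漢字")  # second call: served from the cache (hit)
print(lookup.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
```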

Removed

  • Dependency on klepto (#128)

v2.0.8 (4, May 2021)

Added

  • test: Benchmark and profiling (#122)

Changed

  • Performance: avoid ord() when checking for the long mark, a speed-up of about 6%

  • Reformat code with black (#121)
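Assume "long mark" refers to the prolonged sound mark ー (U+30FC, KATAKANA-HIRAGANA PROLONGED SOUND MARK). Under that assumption, the idea behind the ord() change is that comparing the character directly with ー gives the same answer as comparing code points, without calling ord() for every character. The helper names below are hypothetical. The timeit loop only lets you measure the difference on your own machine; it does not reproduce the figure above:

```python
import timeit

LONG_MARK = "\u30fc"  # ー KATAKANA-HIRAGANA PROLONGED SOUND MARK (assumed meaning of "long mark")


def is_long_mark_ord(ch: str) -> bool:
    """Check via the code point (calls ord() each time)."""
    return ord(ch) == 0x30FC


def is_long_mark_eq(ch: str) -> bool:
    """Check via direct string comparison (no ord() call)."""
    return ch == LONG_MARK


sample = "ラーメン" * 1000
# Both checks must agree on every character.
assert [is_long_mark_ord(c) for c in sample] == [is_long_mark_eq(c) for c in sample]

for fn in (is_long_mark_ord, is_long_mark_eq):
    seconds = timeit.timeit(lambda: [fn(c) for c in sample], number=20)
    print(fn.__name__, round(seconds, 4))
```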

v2.0.7 (26, Feb. 2021)

Fixed

  • Fix an infinite loop after running for a while; handle a standalone half-width (HW) VOICED SOUND MARK (#115, #118)
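U+FF9E is HALFWIDTH KATAKANA VOICED SOUND MARK. The standard library shows why a standalone mark is awkward. NFKC normalization composes it with a preceding half-width kana, but on its own it becomes a bare combining character. This only illustrates the input case; it is not how pykakasi fixed the issue:

```python
import unicodedata

# Half-width ｶ (U+FF76) followed by the half-width voiced mark ﾞ (U+FF9E) composes to ガ.
print(unicodedata.normalize("NFKC", "\uff76\uff9e"))  # ガ

# A standalone half-width voiced mark has nothing to combine with.
lone = unicodedata.normalize("NFKC", "\uff9e")
print(hex(ord(lone)), unicodedata.name(lone))
# 0x3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
```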

v2.0.6 (7, Feb. 2021)

Fixed

  • Hiragana for age counters (#116, #117)

v2.0.5 (5, Feb. 2021)

Changed

  • CLI: use argparse for option parsing (#113)

Fixed

  • Handle 思った, 言った, and 行った properly (#114)

  • CI: fix coveralls error
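思った, 言った, and 行った all contain the small っ (sokuon). Hepburn romanizes it by doubling the following consonant, giving omotta and itta. The sketch below shows only that rule on a tiny hypothetical kana table. It is not pykakasi's converter, and it does not address that 行った can also be read okonatta:

```python
from typing import Dict, List

# Tiny illustrative table (hypothetical); unknown kana raise KeyError.
KANA: Dict[str, str] = {
    "い": "i", "お": "o", "も": "mo", "た": "ta", "こ": "ko", "な": "na",
}
SOKUON = "っ"


def to_hepburn(text: str) -> str:
    """Romanize text, doubling the next consonant after a small っ.

    Hepburn writes っち as "tchi"; that special case is omitted here.
    """
    out: List[str] = []
    double_next = False
    for ch in text:
        if ch == SOKUON:
            double_next = True
            continue
        roma = KANA[ch]
        if double_next:
            roma = roma[0] + roma  # e.g. "ta" -> "tta"
            double_next = False
        out.append(roma)
    return "".join(out)


print(to_hepburn("おもった"))  # omotta
print(to_hepburn("いった"))    # itta
```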

Deprecated

  • CI: drop travis-ci test and badge

v2.0.4 (26, Nov. 2020)

Fixed

  • CLI: Fix -v and -h option crash on Python 3.7 and earlier (#108).

v2.0.3 (25, Nov. 2020)

Fixed

  • CLI: Fix -v and -h option crash (#108).

v2.0.2 (23, Jul. 2020)

Fixed

  • Fix convert() to handle Katakana correctly (#103)

v2.0.1 (23, Jul. 2020)

Changed

  • Update setup.py, setup.cfg, and tox.ini (#102)

Fixed

  • Fix convert() missing the last part of a text (#99, #100)

  • Fix CI, coverage, and coveralls configurations (#101)

v2.0.0 (31, May 2020)

Download files


Source Distribution

pykakasi-2.3.0a1.tar.gz (21.8 MB)

Uploaded Source

Built Distribution


pykakasi-2.3.0a1-py3-none-any.whl (2.4 MB)

Uploaded Python 3

File details

Details for the file pykakasi-2.3.0a1.tar.gz.

File metadata

  • Download URL: pykakasi-2.3.0a1.tar.gz
  • Size: 21.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.6

File hashes

Hashes for pykakasi-2.3.0a1.tar.gz
Algorithm Hash digest
SHA256 077d679e45a8f7f8067a768b1f86792024fa9eecf64ba5f44660db64126e9c91
MD5 3c11978abe5404361dcd62f46abb5ab7
BLAKE2b-256 308f0067d8cc9bb9c831c8f2b10ff9f8a408001d40409a679da31f2600e92590


File details

Details for the file pykakasi-2.3.0a1-py3-none-any.whl.

File metadata

  • Download URL: pykakasi-2.3.0a1-py3-none-any.whl
  • Size: 2.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.6

File hashes

Hashes for pykakasi-2.3.0a1-py3-none-any.whl
Algorithm Hash digest
SHA256 05ff51d25ea5f04d77f79cc030fdb4f3f0cbe7600ad05247cc8fe8bb605459ad
MD5 306b2ade48b08d3d04db11de6e305adb
BLAKE2b-256 cbf009847b0d6e98748eec0f0eb4c3f9023f4fb1a6db001b8e9a9650bc93c975

