Python implementation of kakasi - kana kanji simple inversion library

Project description

Overview

pykakasi is a re-implementation in Python of the kakasi library, which was originally written in C.

How To Use pykakasi

How to Install:

pip install six semidbm
pip install pykakasi

When the library is built, the setup script builds the dictionary db file and generates the pickled db files. Without these dictionary files, the library fails to run.

Dependencies:

six and semidbm

Sample source code:

from pykakasi import kakasi, wakati

text = u"かな漢字交じり文"
kks = kakasi()  # named kks so the kakasi class is not shadowed
kks.setMode("H", "a")  # Hiragana to ascii, default: no conversion
kks.setMode("K", "a")  # Katakana to ascii, default: no conversion
kks.setMode("J", "a")  # Japanese to ascii, default: no conversion
kks.setMode("r", "Hepburn")  # default: use Hepburn Roman table
kks.setMode("s", True)  # add space, default: no separator
kks.setMode("C", True)  # capitalize, default: no capitalize
conv = kks.getConverter()
result = conv.do(text)
print(result)

wkt = wakati()  # named wkt so the wakati class is not shadowed
conv = wkt.getConverter()
result = conv.do(text)
print(result)

The output mode values are “H”, “K” and “a”, meaning Hiragana, Katakana and Alphabet respectively. For input, “J” means Japanese, that is, a mixture of Kanji, Katakana and Hiragana; “H” and “K” mean Hiragana and Katakana. Mode “r” is the Roman table switch and accepts “Hepburn”, “Kunrei” or “Passport”. “s” is the separator switch, “C” is the capitalize switch, and “S” is the separator string option.
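The mode switches described above can be bundled into one helper. The following is a minimal sketch, not part of pykakasi: the names `romanize`, `ROMAN_TABLES` and the `factory` parameter are hypothetical, and only the `setMode`, `getConverter` and `do` calls come from the sample code. The `factory` argument exists so the helper can be tried with a stand-in object when pykakasi is not installed.

```python
# Roman table names listed in the text for mode "r".
ROMAN_TABLES = ("Hepburn", "Kunrei", "Passport")


def romanize(text, table="Hepburn", separator=False, capitalize=False,
             factory=None):
    """Convert Japanese text to ascii using the chosen Roman table.

    `factory` is a hypothetical hook: any callable returning an object
    with the setMode/getConverter interface used in the sample code.
    """
    if table not in ROMAN_TABLES:
        raise ValueError("unknown Roman table: %r" % (table,))
    if factory is None:
        # Imported lazily so the argument check works without pykakasi.
        from pykakasi import kakasi as factory
    kks = factory()
    for source in ("H", "K", "J"):
        kks.setMode(source, "a")  # Hiragana, Katakana, Japanese to ascii
    kks.setMode("r", table)
    if separator:
        kks.setMode("s", True)  # add space between converted words
    if capitalize:
        kks.setMode("C", True)  # capitalize converted words
    return kks.getConverter().do(text)
```

With pykakasi installed, a call such as `romanize(text, table="Kunrei", separator=True)` should behave like the sample code with the Kunrei table selected.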

wakati is an implementation of kakasi’s wakati gaki option.

Options

These option letters are derived from the original Kakasi. The following options are currently supported:

Option  Description           Values          Note
K       Katakana conversion   a, H, None      roman, Hiragana or no conversion
H       Hiragana conversion   a, K, None      roman, Katakana or no conversion
J       Kanji conversion      a, H, K, None   roman, Hiragana, Katakana or no conversion
a       Roman conversion      E, None         JIS ROMAN or no conversion
E       JIS ROMAN conversion  a, None         ascii roman or no conversion
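The option table above can also be written down as data and used to check a set of modes before passing them to `setMode`. This is only a sketch under that assumption: `ALLOWED` and `check_modes` are hypothetical names, and pykakasi itself may treat unlisted values differently.

```python
# Allowed target values per option, copied from the option table.
ALLOWED = {
    "K": ("a", "H", None),
    "H": ("a", "K", None),
    "J": ("a", "H", "K", None),
    "a": ("E", None),
    "E": ("a", None),
}


def check_modes(modes):
    """Return sorted (option, value) pairs, or raise ValueError.

    A pair is rejected when the option or the value is not listed
    in the table.
    """
    checked = []
    for option in sorted(modes):
        value = modes[option]
        if option not in ALLOWED:
            raise ValueError("unknown option: %r" % (option,))
        if value not in ALLOWED[option]:
            raise ValueError(
                "option %r does not accept %r" % (option, value))
        checked.append((option, value))
    return checked
```

Each checked pair can then be handed to a converter object, for example `for option, value in check_modes({"J": "H"}): kks.setMode(option, value)`, where `kks` stands for a `kakasi()` instance.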

Each letter denotes a character set, as follows:

Character Sets
   a: ascii  j: jisroman  g: graphic  k: kana
   (j,k     defined in jisx0201)
   E: kigou  K: katakana  H: hiragana J: kanji
   (E,K,H,J defined in jisx0208)
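To make the letters concrete, the sketch below labels each character of a string with one of “a”, “H”, “K” or “J”. It is an approximation and not how pykakasi works internally: the listing above defines the sets in terms of JIS X 0201 and JIS X 0208, whereas this example assumes Unicode block ranges instead, and `charset_of` is a hypothetical name.

```python
def charset_of(ch):
    """Return "a", "H", "K" or "J" for one character, else None.

    Assumption: Unicode blocks stand in for the JIS character sets.
    """
    code = ord(ch)
    if code < 0x80:
        return "a"  # ascii
    if 0x3040 <= code <= 0x309F:
        return "H"  # Hiragana block
    if 0x30A0 <= code <= 0x30FF:
        return "K"  # Katakana block
    if 0x4E00 <= code <= 0x9FFF:
        return "J"  # CJK Unified Ideographs, used here for kanji
    return None


text = u"かな漢字交じり文"
labels = [charset_of(ch) for ch in text]
print("".join(labels))  # prints HHJJJHHJ
```

The output shows why the “J” input mode matters for mixed text: the sample sentence alternates between hiragana and kanji.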
