Kanji Converter to Hiragana, Katakana, Roman alphabet

kanjiconv

The Japanese README is here. (日本語のREADMEはこちらです。)
https://github.com/sea-turt1e/kanjiconv/blob/main/README_ja.md

Kanji Converter to Hiragana, Katakana, Roman alphabet.
kanjiconv returns the reading and pronunciation of Japanese sentences, based on SudachiDict.
SudachiDict is updated regularly, so it handles new proper nouns and other recent terms comparatively well.

Environment

3.10 <= Python <= 3.13
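
As a minimal sketch, the supported range above can be checked at startup before importing kanjiconv. The names is_supported, MIN_SUPPORTED and MAX_SUPPORTED are hypothetical helpers written for this illustration and are not part of kanjiconv.

```python
import sys

# Supported range as stated in this README: 3.10 <= Python <= 3.13.
# These constants are illustrative, not provided by kanjiconv.
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 13)


def is_supported(version=None):
    """Return True if (major, minor) falls inside the documented range."""
    if version is None:
        version = sys.version_info
    # Compare only major and minor so that patch releases such as 3.13.1 pass.
    major_minor = tuple(version[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


print(is_supported())
```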

Install

Install kanjiconv

pip install kanjiconv

To use the UniDic dictionary (the use_unidic option), download the UniDic dictionary first.

python -m unidic download

How to use

Import & Create Instance

from kanjiconv import KanjiConv

# Basic usage
kanji_conv = KanjiConv(separator="/")

# Using UniDic for improved kanji reading accuracy
kanji_conv = KanjiConv(separator="/", use_unidic=True)

# Using custom dictionary for kanji readings not covered by SudachiDict or UniDic
kanji_conv = KanjiConv(separator="/", use_custom_readings=True)

Get Reading

# convert to hiragana
text = "幽☆遊☆白書は、最高の漫画デス。"
print(kanji_conv.to_hiragana(text))
# Output: ゆうゆうはくしょ/は//さいこう/の/まんが/です/

# convert to katakana
text = "幽☆遊☆白書は、最高の漫画デス。"
print(kanji_conv.to_katakana(text))
# Output: ユウユウハクショ/ハ//サイコウ/ノ/マンガ/デス/

# convert to Roman alphabet
text = "幽☆遊☆白書は、最高の漫画デス。"
print(kanji_conv.to_roman(text))
# Output: yuuyuuhakusho/ha/, /saikou/no/manga/desu/.

# You can change the separator to another character, or to "" for no separator
kanji_conv = KanjiConv(separator="_")
print(kanji_conv.to_hiragana(text))
# Output: ゆうゆうはくしょ_は__さいこう_の_まんが_です_

kanji_conv = KanjiConv(separator="")
print(kanji_conv.to_hiragana(text))
# Output: ゆうゆうはくしょはさいこうのまんがです
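
Because the result is a single separator-joined string, downstream code often needs it as a token list. The sketch below is one way to do that. split_reading and all_readings are hypothetical helpers written for this illustration, not part of kanjiconv. all_readings only assumes an object exposing the to_hiragana, to_katakana and to_roman methods shown above.

```python
def split_reading(reading, separator="/"):
    """Split a separator-joined reading into tokens, dropping empty ones."""
    if not separator:
        # With separator="" the tokens are already concatenated,
        # so the token boundaries cannot be recovered.
        return [reading] if reading else []
    return [token for token in reading.split(separator) if token]


def all_readings(converter, text):
    """Collect the three conversions into one dict.

    `converter` is expected to be a KanjiConv instance (or any object
    with the same three methods).
    """
    return {
        "hiragana": converter.to_hiragana(text),
        "katakana": converter.to_katakana(text),
        "roman": converter.to_roman(text),
    }


print(split_reading("ゆうゆうはくしょ_は__さいこう_の_まんが_です_", separator="_"))
# ['ゆうゆうはくしょ', 'は', 'さいこう', 'の', 'まんが', 'です']
```

Choosing a separator that cannot appear in the converted text keeps the split unambiguous.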

Using Custom Kanji Reading Dictionary

KanjiConv supports a custom dictionary for handling special kanji readings that are not properly recognized by SudachiDict or UniDic. This is particularly useful for:

  1. Special expressions with unique readings
  2. Technical terms or proper nouns
  3. Ambiguous kanji with multiple readings based on context

The custom dictionary is automatically loaded from the package if available, but you can also define your own:

from kanjiconv import KanjiConv

# Create instance with custom readings enabled (enabled by default)
kanji_conv = KanjiConv(separator="/", use_custom_readings=True)

# You can also define your own custom readings
kanji_conv.custom_readings = {
    "single": {
        "激": ["げき"],
        "飛": ["と", "ひ"]
    },
    "compound": {
        "激を飛ばす": "げきをとばす",
        "飛ばす": "とばす"
    }
}

# Now the special expression will be properly converted
print(kanji_conv.to_hiragana("激を飛ばす"))
# Output: げき/を/とばす

Custom Dictionary Structure

The custom dictionary uses the following format:

  • single: A dictionary mapping individual kanji to their reading(s)
    • Each kanji can have multiple readings as a list
    • The first reading in the list is used as default
  • compound: A dictionary mapping multi-character expressions to their reading
    • These are processed before tokenization and given priority

(Optional) Installing sudachidict other than the default

The default dictionary is sudachidict_full. If you want to use a lighter dictionary, you can install either sudachidict_small or sudachidict_core.

  • sudachidict_full (the default) is recommended if you need detailed readings.
  • sudachidict_small is recommended if you prefer lighter operation.
  • sudachidict_core offers a balance between speed and accuracy.

pip install sudachidict_small
pip install sudachidict_core

If using sudachidict_small or sudachidict_core, specify it like this:

kanji_conv = KanjiConv(sudachi_dict_type="small", separator="/")
kanji_conv = KanjiConv(sudachi_dict_type="core", separator="/")
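
If the same code has to run in environments with different dictionaries installed, the dictionary type can be chosen at runtime. The sketch below assumes each dictionary is importable under the same name as its pip package (sudachidict_full, sudachidict_core, sudachidict_small). pick_dict_type and is_installed are hypothetical helpers written for this illustration, not part of kanjiconv.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


def pick_dict_type(preference=("full", "core", "small"), installed=is_installed):
    """Return the first dictionary type in `preference` that is installed, else None."""
    for dict_type in preference:
        if installed(f"sudachidict_{dict_type}"):
            return dict_type
    return None


print(pick_dict_type())

# Possible use with kanjiconv (requires kanjiconv to be installed):
# from kanjiconv import KanjiConv
# kanji_conv = KanjiConv(sudachi_dict_type=pick_dict_type(), separator="/")
```

Passing preference=("small", "core", "full") would favor lighter operation over detailed readings.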

Update Dict

kanjiconv's readings are based on SudachiDict, so update SudachiDict regularly via pip to pick up new entries.

pip install -U sudachidict_full
pip install -U sudachidict_small
pip install -U sudachidict_core
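
Before updating, it can help to see which dictionary packages are installed and at which version. The sketch below uses only the standard library. installed_dict_versions and DICT_PACKAGES are hypothetical names written for this illustration, and the package names are the ones used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names as used in the pip commands in this README.
DICT_PACKAGES = ("sudachidict_full", "sudachidict_small", "sudachidict_core")


def installed_dict_versions(packages=DICT_PACKAGES):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_dict_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```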

Local MCP Server

If you want to use kanjiconv as a local MCP Server, see kanjicon-mcp

License

This project is licensed under the Apache License 2.0.

Open Source Software Used

This library uses SudachiPy and its dictionary SudachiDict for morphological analysis. These are also distributed under the Apache License 2.0.

For detailed license information, please refer to the LICENSE file of each project.
