joyokanji converts old-form kanji characters into new-form kanji characters.

Project description

joyo-kanji

joyokanji is a tiny, fast Python library that converts old-form kanji (Japanese: kyūjitai, 舊字/旧字) to new-form kanji (shinjitai, 新字) using a mapping grounded in the Agency for Cultural Affairs’ Jōyō Kanji list (常用漢字表). See the source list (Japanese) published by the Government of Japan: https://www.bunka.go.jp/kokugo_nihongo/sisaku/joho/joho/kijun/naikaku/kanji/.

What’s kyūjitai vs. shinjitai? After WWII, Japan simplified the shapes of many commonly used kanji. The older shapes are kyūjitai (e.g., 鹽 → 塩, 國 → 国, 體 → 体), and the simplified shapes are shinjitai. This library helps normalize text by replacing old forms with their modern counterparts.



Features

  • Converts old-form (kyūjitai) kanji to modern (shinjitai) forms.
  • Mapping-based and deterministic: the same input always yields the same output.
  • Fast single-pass conversion using str.translate (linear time O(n)).
  • Loads mapping once from joyokanji/config/kanji.json and caches it.
  • Pure-Python, minimal footprint, easy to embed in pipelines.

Installation

pip install joyokanji

If your package name differs on PyPI, update the command above accordingly.

Quick Start

import joyokanji

text = "鹽と黃と黑と點と發"
print(joyokanji.convert(text))  # => 塩と黄と黒と点と発

How It Works

  • On first use, the library loads a JSON dictionary (joyokanji/config/kanji.json) of old→new pairs (e.g., {"鹽": "塩"}) and builds a translation table with str.maketrans.
  • Conversion is then a single pass over your string using str.translate, which is both simple and efficient.
  • The table is cached in memory for subsequent calls.
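
The steps above can be sketched in a few lines. This is only an illustration of the load, build, and cache pattern, not the library's actual source: `SAMPLE_JSON`, `load_table`, and `convert_sketch` are hypothetical names, and the inline JSON string stands in for joyokanji/config/kanji.json.

```python
import json
from functools import lru_cache

# Hypothetical stand-in for the contents of joyokanji/config/kanji.json.
# The real file holds many more old->new pairs.
SAMPLE_JSON = '{"鹽": "塩", "黃": "黄", "黑": "黒", "點": "点", "發": "発"}'


@lru_cache(maxsize=1)
def load_table():
    """Parse the old->new pairs once and build a str.translate table."""
    pairs = json.loads(SAMPLE_JSON)
    # str.maketrans accepts a dict whose keys are single characters.
    return str.maketrans(pairs)


def convert_sketch(text):
    """Replace every mapped character in a single pass over the string."""
    return text.translate(load_table())


print(convert_sketch("鹽と黃と黑と點と發"))  # => 塩と黄と黒と点と発
```

Because `lru_cache` memoizes the result, the JSON is parsed only on the first call; later calls reuse the same table object. Characters that are not keys in the table, such as と above, pass through unchanged.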

Examples

Input → Output:

Kyūjitai   Shinjitai
鹽         塩
黃         黄
黑         黒
點         点
發         発

Only characters listed in the mapping are transformed; all others remain unchanged.

Scope & Limitations

  • Coverage: The mapping focuses on characters relevant to modern Japanese usage and the Jōyō Kanji context. It is not a general Traditional ↔ Simplified Chinese converter and is not intended for zh-Hant texts (Taiwan/Hong Kong).
  • Context-free: Conversion is character-to-character. The library does not inspect context, readings, or word boundaries.
  • Proper nouns & personal names: Historical documents, proper nouns, and personal names may intentionally use old forms (e.g., in legal names). Automatic conversion can be undesirable in such cases. Review the output when accuracy matters.
  • Normalization: The library does not perform Unicode normalization (e.g., NFKC) by itself. If you need it, run normalization before or after conversion according to your pipeline’s needs.
  • Ambiguous variants: Some characters have multiple historical variants. The mapping chooses a widely accepted modern form; if you need domain-specific variants, consider customizing the mapping.
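
Two of the points above, Unicode normalization and custom mappings, can be handled in the calling code. The following is a minimal sketch under stated assumptions: `SAMPLE_PAIRS`, `build_table`, and `normalize_then_convert` are hypothetical helpers, not part of joyokanji, and the two-entry dictionary stands in for the real mapping. Only `unicodedata.normalize` and `str.translate` from the standard library are used.

```python
import unicodedata

# Hypothetical sample of old->new pairs; the library's real mapping is larger.
SAMPLE_PAIRS = {"國": "国", "體": "体"}


def build_table(base_pairs, overrides=None):
    """Merge domain-specific overrides on top of the base pairs."""
    merged = dict(base_pairs)
    merged.update(overrides or {})
    return str.maketrans(merged)


def normalize_then_convert(text, table):
    """Apply NFKC first (e.g. full-width Latin -> ASCII), then convert kanji."""
    return unicodedata.normalize("NFKC", text).translate(table)


default_table = build_table(SAMPLE_PAIRS)
print(normalize_then_convert("ＡＢＣ國體", default_table))  # => ABC国体

# Mapping a character to itself keeps the old form for one domain.
custom_table = build_table(SAMPLE_PAIRS, {"體": "體"})
print(normalize_then_convert("國體", custom_table))  # => 国體
```

Whether normalization should run before or after conversion depends on the pipeline; this sketch runs it first.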

Data Source & Attribution

Performance Notes

  • The translation table is built once per process. Subsequent calls reuse the cached table in memory and do not touch the file system.
  • Conversion time is O(n) in the length of the input, with low constant overhead, which makes the library suitable for batch text processing.
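
For batch work, the same table can be reused across many lines without rebuilding it. The generator below is a sketch of that pattern: `SAMPLE_TABLE` and `convert_lines` are hypothetical names, and the small table stands in for the library's cached one.

```python
from typing import Iterable, Iterator

# Hypothetical sample table, built once and reused for every line.
SAMPLE_TABLE = str.maketrans({"鹽": "塩", "發": "発"})


def convert_lines(lines: Iterable[str], table=SAMPLE_TABLE) -> Iterator[str]:
    """Lazily convert each line; memory use stays flat for large inputs."""
    for line in lines:
        yield line.translate(table)


for converted in convert_lines(["鹽", "發見", "abc"]):
    print(converted)
```

Because the function is a generator, it can wrap an open file object directly and convert a large corpus line by line.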

When to Use / Not to Use

Use when: you need to normalize legacy texts into modern Japanese (OCR outputs, historical corpora, or mixed-form datasets).

Avoid or review carefully when: processing legal names, brand names, or scholarly editions where the original glyph choices carry meaning.
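
When some names must keep their original glyphs, one option is to convert only the text around them. The sketch below assumes a caller-supplied list of protected terms; `SAMPLE_TABLE` and `convert_except` are hypothetical and not part of joyokanji.

```python
import re

# Hypothetical sample table; the library's real mapping is larger.
SAMPLE_TABLE = str.maketrans({"黑": "黒", "澤": "沢", "國": "国"})


def convert_except(text, protected, table=SAMPLE_TABLE):
    """Convert text but leave every protected term exactly as written."""
    if not protected:
        return text.translate(table)
    # Longest terms first so a longer name wins over a shorter prefix.
    terms = sorted(protected, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(re.escape(t) for t in terms) + ")")
    # With one capturing group, re.split puts the matched terms at odd indexes.
    parts = pattern.split(text)
    return "".join(
        part if index % 2 else part.translate(table)
        for index, part in enumerate(parts)
    )


print(convert_except("黑澤さんの國", ["黑澤"]))  # => 黑澤さんの国
```

This is still context-free: a protected term is kept wherever it appears, so the list should contain only strings that are always names in the data at hand.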

Contributing

  • Issues and PRs are welcome, especially for: (1) mapping improvements, (2) tests covering edge cases, (3) documentation in English/Japanese.
  • If proposing new pairs, please include a source/rationale and examples.

License

Apache License 2.0
