joyokanji converts old-form kanji characters into new-form kanji characters.
joyo-kanji
joyokanji is a tiny, fast Python library that converts old-form kanji (Japanese: kyūjitai, 舊字/旧字) to new-form kanji (shinjitai, 新字) using a mapping grounded in the Agency for Cultural Affairs’ Jōyō Kanji list (常用漢字表). See the source list (Japanese) published by the Government of Japan: https://www.bunka.go.jp/kokugo_nihongo/sisaku/joho/joho/kijun/naikaku/kanji/.
Optionally, you can also normalize common variant glyphs used in personal names (e.g., 髙/𠮷/﨑/隆) by passing `variants=True`.
What’s kyūjitai vs. shinjitai? After WWII, Japan simplified the shapes of many commonly used kanji. The older shapes are kyūjitai (e.g., 鹽 → 塩, 國 → 国, 體 → 体), and the simplified shapes are shinjitai. This library helps normalize text by replacing old forms with their modern counterparts.
Table of Contents
- Features
- Installation
- Quick Start
- Variants (optional)
- How It Works
- Examples
- Scope & Limitations
- Data Source & Attribution
- Performance Notes
- When to Use / Not to Use
- Contributing
- License
Features
- Converts old-form (kyūjitai) kanji to modern (shinjitai) forms.
- Mapping-based, deterministic behavior: no surprises.
- Fast single-pass conversion using `str.translate` (linear time, O(n)).
- Loads the mapping once from `joyokanji/config/kanji.json` and caches it.
- Optional: normalize common variant glyphs (e.g., 髙, 𠮷, 﨑, 隆, 羽, 練, …) by passing `variants=True` (uses `joyokanji/config/variants.json`).
- Pure-Python, minimal footprint, easy to embed in pipelines.
Installation
pip install joyokanji
If your package name differs on PyPI, update the command above accordingly.
Quick Start
import joyokanji
text = "鹽と黃と黑と點と發"
print(joyokanji.convert(text)) # => 塩と黄と黒と点と発
# Optional: include common variant glyphs
text2 = "髙﨑𠮷野屋"
print(joyokanji.convert(text2, variants=True)) # => 高崎吉野屋
API:
joyokanji.convert(text: str, variants: bool = False) -> str
How It Works
- On first use, the library loads a JSON dictionary (`joyokanji/config/kanji.json`) of old→new pairs (e.g., `{"鹽": "塩"}`) and builds a translation table with `str.maketrans`.
- Conversion is then a single pass over your string using `str.translate`, which is both simple and efficient.
- The table is cached in memory for subsequent calls.
- When `variants=True`, an additional map from `joyokanji/config/variants.json` is merged in (variant entries take precedence on conflicts). A separate cached table is maintained for this mode.
Examples
Input → Output:
| Kyūjitai | Shinjitai |
|---|---|
| 鹽 | 塩 |
| 黃 | 黄 |
| 黑 | 黒 |
| 點 | 点 |
| 發 | 発 |
Only characters listed in the mapping are transformed; all others remain unchanged.
Variants (optional)
When `variants=True` is passed, common variant glyphs (often seen in proper names) are also normalized. Examples:
| Variant | Normalized |
|---|---|
| 髙 | 高 |
| 𠮷 | 吉 |
| 﨑 | 崎 |
| 隆 | 隆 |
| 羽 | 羽 |
| 練 | 練 |
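Several of these glyphs are hard to tell apart on screen. A small helper like the one below (a hypothetical utility, not part of joyokanji) prints the code point of each character, which makes it easier to check which form a string actually contains.

```python
def codepoints(text: str) -> list[str]:
    """Return the Unicode code point of each character as 'U+XXXX'."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The variant and the standard form are distinct code points.
print(codepoints("髙高"))  # ['U+9AD9', 'U+9AD8']
print(codepoints("𠮷吉"))  # ['U+20BB7', 'U+5409']
```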
Scope & Limitations
- Coverage: The mapping focuses on characters relevant to modern Japanese usage and the Jōyō Kanji context. It is not a general Traditional ↔ Simplified Chinese converter and is not intended for zh-Hant texts (Taiwan/Hong Kong).
- Context-free: Conversion is character-to-character. The library does not inspect context, readings, or word boundaries.
- Proper nouns & personal names: Historical documents, proper nouns, and person names may intentionally use old forms (e.g., in legal names). Automatic conversion can be undesirable in such use cases. Review outputs when accuracy matters.
  - For this reason, variant glyph normalization is OFF by default. Enable `variants=True` only when desired.
- Normalization: The library does not perform Unicode normalization (e.g., NFKC) by itself. If you need it, run normalization before or after conversion according to your pipeline’s needs.
- Ambiguous variants: Some characters have multiple historical variants. The mapping chooses a widely accepted modern form; if you need domain-specific variants, consider customizing the mapping.
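Because Unicode normalization is left to the caller, one option is to wrap it around the conversion step. The sketch below uses the standard-library `unicodedata.normalize`; `normalize_then_convert` is a hypothetical helper, and its `convert` argument is expected to be a callable such as `joyokanji.convert`. A trivial stand-in is used here so the snippet runs without the package installed.

```python
import unicodedata
from typing import Callable


def normalize_then_convert(text: str, convert: Callable[[str], str]) -> str:
    """Apply NFKC first, then hand the result to the converter."""
    return convert(unicodedata.normalize("NFKC", text))


def stand_in_convert(text: str) -> str:
    # Illustrative stand-in for joyokanji.convert with a one-entry mapping.
    return text.translate(str.maketrans({"國": "国"}))


# NFKC folds full-width Latin letters and digits to ASCII.
print(normalize_then_convert("ＡＢＣ１２３の國", stand_in_convert))  # ABC123の国
```

Note that NFKC also rewrites characters unrelated to kanji forms (full-width letters, half-width katakana), so apply it only where that is acceptable for your data.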
Data Source & Attribution
- Primary reference: the Jōyō Kanji list (常用漢字表) as published by Japan’s Agency for Cultural Affairs: https://www.bunka.go.jp/kokugo_nihongo/sisaku/joho/joho/kijun/naikaku/pdf/joyokanjihyo_20101130.pdf.
- The included mapping is derived from this reference and related historical simplifications. Reports of omissions or edge cases are welcome as issues or PRs.
Performance Notes
- Building the translation table happens once per process. Subsequent calls reuse the in-memory table and involve no file I/O.
- The complexity is O(n) with low constant overhead, making it suitable for batch text processing.
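For batch jobs, the linear cost per string means a corpus can be streamed through the converter lazily. The generator below is a sketch under that assumption; `convert_lines` is a hypothetical helper, and the small table is a stand-in for the library's cached mapping.

```python
from typing import Iterable, Iterator

# Stand-in table built once at import time, mirroring the one-time load.
_TABLE = str.maketrans({"鹽": "塩", "黑": "黒", "點": "点"})


def convert_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield converted lines one at a time without holding the corpus in memory."""
    for line in lines:
        yield line.translate(_TABLE)


corpus = ["鹽を買ふ", "黑い點", "変換なし"]
for converted in convert_lines(corpus):
    print(converted)
```

The same pattern works with an open file object in place of the list, since iterating a file yields one line at a time.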
When to Use / Not to Use
Use when: you need to normalize legacy texts into modern Japanese (OCR outputs, historical corpora, or mixed-form datasets).
Avoid or review carefully when: processing legal names, brand names, or scholarly editions where the original glyph choices carry meaning.
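When outputs need review, it can help to list exactly which characters a conversion changed. Since the conversion is character-to-character, input and output can be compared position by position. `changed_chars` is a hypothetical helper, and the stand-in converter replaces `joyokanji.convert` only so the snippet is self-contained.

```python
from typing import Callable


def changed_chars(text: str, convert: Callable[[str], str]) -> list[tuple[int, str, str]]:
    """Return (index, before, after) for every character the converter altered."""
    converted = convert(text)
    # Character-to-character conversion keeps the length unchanged.
    if len(converted) != len(text):
        raise ValueError("converter changed the string length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(text, converted)) if a != b]


def stand_in_convert(text: str) -> str:
    # Illustrative stand-in with two entries; not the library's mapping.
    return text.translate(str.maketrans({"髙": "高", "國": "国"}))


for index, before, after in changed_chars("髙橋國男", stand_in_convert):
    print(index, before, "->", after)
```

A report like this can be checked against a list of names that must keep their original glyphs before the converted text is accepted.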
Contributing
- Issues and PRs are welcome, especially for: (1) mapping improvements, (2) tests covering edge cases, (3) documentation in English/Japanese.
- If proposing new pairs, please include a source/rationale and examples.
License
Apache License 2.0