Pure Python implementation of OpenCC for Chinese text conversion
opencc_purepy
opencc_purepy is a pure Python implementation
of OpenCC (Open Chinese Convert),
supporting conversion between Simplified, Traditional, Hong Kong, Taiwan, and Japanese Kanji.
It uses dictionary-based segmentation and mapping logic inspired by the original OpenCC.
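To make "dictionary-based segmentation and mapping" concrete, here is a minimal sketch of greedy forward longest-match conversion. It is an illustration of the general technique only, not the package's actual code: the `TOY_DICT` entries and the `convert_longest_match` helper are hypothetical, and the real OpenCC lexicons are far larger and are applied in several passes.

```python
# Toy phrase dictionary (hypothetical; chosen to show why phrases matter).
TOY_DICT = {
    "头发": "頭髮",  # "hair": 发 maps to 髮 here
    "发展": "發展",  # "develop": 发 maps to 發 here
    "头": "頭",
    "发": "發",
}


def convert_longest_match(text, mapping):
    """Greedy forward longest-match: at each position, try the longest key first."""
    max_len = max(map(len, mapping), default=1)
    out = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in mapping:
                out.append(mapping[chunk])
                i += size
                break
        else:
            # No dictionary entry starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(convert_longest_match("头发发展", TOY_DICT))  # 頭髮發展
```

Matching phrases before single characters is what lets the same source character map to different targets depending on context.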
🚩 Features
- Pure Python – no native dependencies
- Multiple Chinese locale conversions (Simplified, Traditional, HK, TW, JP)
- Punctuation style conversion (optional)
- Automatic code detection (Simplified/Traditional)
- CLI with Office document support (.docx, .xlsx, .pptx, .odt, .ods, .odp, .epub)
🐍 opencc_purepy requires Python 3.7 or later.
🔁 Supported Conversion Configs
| Code | Description |
|---|---|
| s2t | Simplified → Traditional |
| t2s | Traditional → Simplified |
| s2tw | Simplified → Traditional (Taiwan) |
| tw2s | Traditional (Taiwan) → Simplified |
| s2twp | Simplified → Traditional (Taiwan) with idioms |
| tw2sp | Traditional (Taiwan) → Simplified with idioms |
| s2hk | Simplified → Traditional (Hong Kong) |
| hk2s | Traditional (Hong Kong) → Simplified |
| t2tw | Traditional → Traditional (Taiwan) |
| tw2t | Traditional (Taiwan) → Traditional |
| t2twp | Traditional → Traditional (Taiwan) with idioms |
| tw2tp | Traditional (Taiwan) → Traditional with idioms |
| t2hk | Traditional → Traditional (Hong Kong) |
| hk2t | Traditional (Hong Kong) → Traditional |
| t2jp | Japanese Kyujitai → Shinjitai |
| jp2t | Japanese Shinjitai → Kyujitai |
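The configs above come in forward and backward pairs. As a small convenience sketch, the lookup below returns the opposite-direction code for any config in the table. The `REVERSE` mapping and `reverse_config` helper are hypothetical and not part of the package; note also that converting forward and then backward is not guaranteed to reproduce the original text, because several source characters can share one target.

```python
# Forward/backward pairs, copied from the config table.
_PAIRS = [
    ("s2t", "t2s"), ("s2tw", "tw2s"), ("s2twp", "tw2sp"), ("s2hk", "hk2s"),
    ("t2tw", "tw2t"), ("t2twp", "tw2tp"), ("t2hk", "hk2t"), ("t2jp", "jp2t"),
]

# Build a lookup that works in both directions.
REVERSE = {}
for forward, backward in _PAIRS:
    REVERSE[forward] = backward
    REVERSE[backward] = forward


def reverse_config(code):
    """Return the config code for the opposite direction (hypothetical helper)."""
    key = code.strip().lower()
    try:
        return REVERSE[key]
    except KeyError:
        raise ValueError(f"unknown config code: {code!r}") from None


print(reverse_config("s2twp"))  # tw2sp
print(len(REVERSE))             # 16
```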
📦 Installation
pip install opencc-purepy
🚀 Usage
Python
from opencc_purepy import OpenCC
text = "“春眠不觉晓,处处闻啼鸟。”"
opencc = OpenCC("s2t")
converted = opencc.convert(text, punctuation=True)
print(converted) # 「春眠不覺曉,處處聞啼鳥。」
CLI
Text File Conversion
python -m opencc_purepy convert -i input.txt -o output.txt -c s2t -p
# or, if installed as a script:
opencc-purepy convert -i input.txt -o output.txt -c s2t -p
Office Document Conversion subcommand (office)
Supports: .docx, .xlsx, .pptx, .odt, .ods, .odp, .epub
# Convert Word document with font preservation
opencc-purepy office -i example.docx -c t2s --keep-font
# Convert EPUB and auto-detect output name
opencc-purepy office -i book.epub -c s2t --auto-ext
# Convert Excel and specify output path and format
opencc-purepy office -i sheet.xlsx -o result.xlsx -c s2tw --format xlsx
ℹ️ With the office subcommand, the input is processed as an Office or EPUB document and OpenCC conversion is applied internally.
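For batch jobs it can help to assemble the office command line in Python first. The sketch below only builds the argument list from the flags shown above (`-i`, `-c`, `--auto-ext`, `--keep-font`). The `build_office_command` helper is hypothetical, and combining `--auto-ext` with `--keep-font` in one call is an assumption that is not confirmed by the examples.

```python
from pathlib import Path

# Extensions listed as supported by the office subcommand.
OFFICE_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".odt", ".ods", ".odp", ".epub"}


def build_office_command(input_path, config, keep_font=False):
    """Return the argv list for one `opencc-purepy office` call (hypothetical helper)."""
    path = Path(input_path)
    if path.suffix.lower() not in OFFICE_EXTENSIONS:
        raise ValueError(f"unsupported document type: {path.suffix!r}")
    argv = ["opencc-purepy", "office", "-i", str(path), "-c", config, "--auto-ext"]
    if keep_font:
        # Assumption: --keep-font may be combined with --auto-ext.
        argv.append("--keep-font")
    return argv


cmd = build_office_command("book.epub", "s2t")
print(" ".join(cmd))  # opencc-purepy office -i book.epub -c s2t --auto-ext
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the CLI is installed. Building a list rather than a shell string avoids quoting problems with file names that contain spaces.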
🧩 API Reference
Exports
- OpenCC
- OpenccConfig
OpenCC class
- `OpenCC(config: str | OpenccConfig = "s2t")`
  Create a converter with a supported config string or `OpenccConfig` enum value.
- `set_config(config: str | OpenccConfig) -> None`
  Update the active conversion config.
- `get_config() -> str`
  Return the current canonical config name.
- `supported_configs() -> list[str]`
  Return all supported config names.
- `get_last_error() -> str | None`
  Return the last validation or conversion error, if any.
- `convert(input: str, punctuation: bool = False) -> str`
  Convert text using the active config, with optional punctuation conversion where supported.
- `s2t(input: str, punctuation: bool = False) -> str`
  Simplified Chinese to Traditional Chinese.
- `t2s(input: str, punctuation: bool = False) -> str`
  Traditional Chinese to Simplified Chinese.
- `s2tw(input: str, punctuation: bool = False) -> str`
  Simplified Chinese to Taiwan Traditional.
- `tw2s(input: str, punctuation: bool = False) -> str`
  Taiwan Traditional to Simplified Chinese.
- `s2twp(input: str, punctuation: bool = False) -> str`
  Simplified Chinese to Taiwan Traditional with idiom and phrase conversion.
- `tw2sp(input: str, punctuation: bool = False) -> str`
  Taiwan Traditional with idioms to Simplified Chinese.
- `s2hk(input: str, punctuation: bool = False) -> str`
  Simplified Chinese to Hong Kong Traditional.
- `hk2s(input: str, punctuation: bool = False) -> str`
  Hong Kong Traditional to Simplified Chinese.
- `t2tw(input: str) -> str`
  Traditional Chinese to Taiwan Traditional.
- `t2twp(input: str) -> str`
  Traditional Chinese to Taiwan Traditional with phrase mappings.
- `tw2t(input: str) -> str`
  Taiwan Traditional to standard Traditional Chinese.
- `tw2tp(input: str) -> str`
  Taiwan Traditional to standard Traditional Chinese with phrase reversal.
- `t2hk(input: str) -> str`
  Traditional Chinese to Hong Kong variant.
- `hk2t(input: str) -> str`
  Hong Kong Traditional to standard Traditional Chinese.
- `t2jp(input: str) -> str`
  Traditional Chinese to Japanese variants.
- `jp2t(input: str) -> str`
  Japanese Shinjitai to Traditional Chinese.
- `st(input: str) -> str`
  Character-only Simplified to Traditional conversion.
- `ts(input: str) -> str`
  Character-only Traditional to Simplified conversion.
- `zho_check(input: str) -> int`
  Detect the input text type: 1 - Traditional, 2 - Simplified, 0 - Others.
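One way the `zho_check` return codes might be used is to decide which config, if any, to apply before converting. The sketch below contains only that decision logic, so it runs without the package installed. The `config_for_target` helper and its `target` parameter are hypothetical; only the three numeric codes and the config names come from this document.

```python
# Return codes as documented for zho_check.
TRADITIONAL, SIMPLIFIED, OTHER = 1, 2, 0


def config_for_target(detected_code, target="simplified"):
    """Pick a config name for the detected script, or None if no conversion applies."""
    if target not in ("simplified", "traditional"):
        raise ValueError(f"unknown target: {target!r}")
    if detected_code == TRADITIONAL and target == "simplified":
        return "t2s"
    if detected_code == SIMPLIFIED and target == "traditional":
        return "s2t"
    # Already in the target script, or not recognisably Chinese text.
    return None


print(config_for_target(1))                        # t2s
print(config_for_target(2, target="traditional"))  # s2t
print(config_for_target(0))                        # None
```

With the library, the code would presumably come from `OpenCC().zho_check(text)` and a non-None result would be passed to `set_config` before calling `convert`. Both method names are taken from the reference above, but that glue is untested here.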
OpenccConfig enum
- Members include: `S2T`, `T2S`, `S2TW`, `TW2S`, `S2TWP`, `TW2SP`, `S2HK`, `HK2S`, `T2TW`, `TW2T`, `T2TWP`, `TW2TP`, `T2HK`, `HK2T`, `T2JP`, `JP2T`
- `to_canonical_name() -> str`
  Return the lowercase OpenCC config string.
- `parse(value: str) -> OpenccConfig`
  Parse a config string into an enum value.
🛠 Development
- Python bindings: `opencc_purepy/__init__.py`, `opencc_purepy/core.py`
- CLI: `opencc_purepy/__main__.py`
⚡ Benchmark
Measured on a local machine using the default "s2t" configuration.
Each test averaged over 20 runs with the shared dictionary cache reused across runs.
| Input Size | Avg. Time (ms) |
|---|---|
| 100 chars | 0.15 |
| 1,000 chars | 0.93 |
| 10,000 chars | 8.76 |
| 100,000 chars | 86.05 |
Timings reuse the shared dictionary cache, but still include per-run OpenCC instance setup; results depend on local
hardware and background system load.
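A comparable measurement can be sketched with `time.perf_counter`. This is not the script that produced the table above: `average_ms` and `make_input` are hypothetical helpers, and `str.upper` is a stand-in workload so the sketch runs without the package installed.

```python
import time


def average_ms(func, arg, runs=20):
    """Average wall-clock time of func(arg) over `runs` calls, in milliseconds."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    start = time.perf_counter()
    for _ in range(runs):
        func(arg)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


def make_input(size, seed="春眠不觉晓,处处闻啼鸟。"):
    """Repeat a seed string and trim it to exactly `size` characters."""
    repeats = size // len(seed) + 1
    return (seed * repeats)[:size]


# Stand-in workload; to time the real converter including per-run setup,
# swap in: lambda s: OpenCC("s2t").convert(s)
for size in (100, 1000, 10000):
    text = make_input(size)
    print(f"{size:>7} chars: {average_ms(str.upper, text):.3f} ms")
```

Constructing the converter inside the timed callable mirrors the note above that per-run instance setup is included. Hoisting it out of the loop would measure conversion alone.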
📄 License
This project is licensed under the MIT License.
Powered by Pure Python and OpenCC Lexicons.