Pure Python implementation of OpenCC for Chinese text conversion
opencc_purepy
opencc_purepy is a pure Python implementation
of OpenCC (Open Chinese Convert),
supporting conversion between Simplified, Traditional, Hong Kong, Taiwan, and Japanese Kanji.
It uses dictionary-based segmentation and mapping logic inspired by the original OpenCC.
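To make "dictionary-based segmentation and mapping" concrete, here is a minimal sketch of greedy longest-match conversion. It is an illustration only, not the library's actual code: the function name `convert_longest_match` and the four-entry `TOY_DICT` are hypothetical, and the real OpenCC lexicons are far larger.

```python
# Toy phrase dictionary (hypothetical); phrase entries disambiguate 发.
TOY_DICT = {
    "头发": "頭髮",  # "hair": 发 maps to 髮
    "发展": "發展",  # "develop": 发 maps to 發
    "头": "頭",
    "发": "發",
}


def convert_longest_match(text, mapping):
    """Convert text by always consuming the longest dictionary key that matches."""
    max_len = max(map(len, mapping)) if mapping else 1
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a key matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in mapping:
                out.append(mapping[chunk])
                i += size
                break
        else:
            # No dictionary entry: copy the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(convert_longest_match("我的头发", TOY_DICT))  # 我的頭髮
```

Matching phrases before single characters is what lets the same source character map to different targets depending on context.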
🚩 Features
- Pure Python – no native dependencies
- Multiple Chinese locale conversions (Simplified, Traditional, HK, TW, JP)
- Punctuation style conversion (optional)
- Automatic code detection (Simplified/Traditional)
- CLI with Office document support (.docx, .xlsx, .pptx, .odt, .ods, .odp, .epub)
🐍 opencc_purepy requires Python 3.7 or later.
🔁 Supported Conversion Configs
| Code | Description |
|---|---|
| s2t | Simplified → Traditional |
| t2s | Traditional → Simplified |
| s2tw | Simplified → Traditional (Taiwan) |
| tw2s | Traditional (Taiwan) → Simplified |
| s2twp | Simplified → Traditional (Taiwan) with idioms |
| tw2sp | Traditional (Taiwan) → Simplified with idioms |
| s2hk | Simplified → Traditional (Hong Kong) |
| hk2s | Traditional (Hong Kong) → Simplified |
| t2tw | Traditional → Traditional (Taiwan) |
| tw2t | Traditional (Taiwan) → Traditional |
| t2twp | Traditional → Traditional (Taiwan) with idioms |
| tw2tp | Traditional (Taiwan) → Traditional with idioms |
| t2hk | Traditional → Traditional (Hong Kong) |
| hk2t | Traditional (Hong Kong) → Traditional |
| t2jp | Japanese Kyujitai → Shinjitai |
| jp2t | Japanese Shinjitai → Kyujitai |
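The configs in the table come in forward/reverse pairs. The sketch below records that pairing as a lookup so a caller can find the opposite direction of a given code. `INVERSE_CONFIG` and `inverse_config` are hypothetical names and not part of the package. Note also that conversion is lossy in general, so converting with a config and then its inverse is not guaranteed to reproduce the original text.

```python
# Forward -> reverse pairs, read off the config table above.
INVERSE_CONFIG = {
    "s2t": "t2s",
    "s2tw": "tw2s",
    "s2twp": "tw2sp",
    "s2hk": "hk2s",
    "t2tw": "tw2t",
    "t2twp": "tw2tp",
    "t2hk": "hk2t",
    "t2jp": "jp2t",
}
# Add the reverse -> forward direction so the lookup is symmetric.
INVERSE_CONFIG.update({rev: fwd for fwd, rev in list(INVERSE_CONFIG.items())})


def inverse_config(code):
    """Return the config code for the opposite direction (hypothetical helper)."""
    try:
        return INVERSE_CONFIG[code.lower()]
    except KeyError:
        raise ValueError("Unsupported config: {!r}".format(code)) from None


print(inverse_config("s2twp"))  # tw2sp
```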
📦 Installation
pip install opencc-purepy
🚀 Usage
Python
from opencc_purepy import OpenCC
text = "“春眠不觉晓,处处闻啼鸟。”"
opencc = OpenCC("s2t")
converted = opencc.convert(text, punctuation=True)
print(converted) # 「春眠不覺曉,處處聞啼鳥。」
CLI
Text File Conversion
python -m opencc_purepy convert -i input.txt -o output.txt -c s2t -p
# or, if installed as a script:
opencc-purepy convert -i input.txt -o output.txt -c s2t -p
Office Document Conversion subcommand (office)
Supports: .docx, .xlsx, .pptx, .odt, .ods, .odp, .epub
# Convert Word document with font preservation
opencc-purepy office -i example.docx -c t2s --keep-font
# Convert EPUB and auto-detect output name
opencc-purepy office -i book.epub -c s2t --auto-ext
# Convert Excel and specify output path and format
opencc-purepy office -i sheet.xlsx -o result.xlsx -c s2tw --format xlsx
ℹ️ With the office subcommand, the input is processed as an Office or EPUB document and OpenCC conversion is applied internally.
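For batch jobs it can be convenient to assemble these command lines from Python. The sketch below only builds the argument list, using the flags shown in the examples above (-i, -o, -c, --keep-font, --auto-ext). `build_office_command` is a hypothetical helper, and the extension check happens in the sketch itself rather than in the CLI.

```python
from pathlib import Path

# Extensions listed as supported by the office subcommand.
OFFICE_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".odt", ".ods", ".odp", ".epub"}


def build_office_command(input_path, config, output_path=None,
                         keep_font=False, auto_ext=False):
    """Build an argument list for `opencc-purepy office` (hypothetical helper)."""
    if Path(input_path).suffix.lower() not in OFFICE_EXTENSIONS:
        raise ValueError("Unsupported document type: {!r}".format(str(input_path)))
    cmd = ["opencc-purepy", "office", "-i", str(input_path), "-c", config]
    if output_path is not None:
        cmd += ["-o", str(output_path)]
    if keep_font:
        cmd.append("--keep-font")
    if auto_ext:
        cmd.append("--auto-ext")
    return cmd


print(build_office_command("example.docx", "t2s", keep_font=True))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the opencc-purepy script is installed and on the PATH.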
🧩 API Reference
Exports
- OpenCC
- OpenccConfig
OpenCC class
- OpenCC(config: str | OpenccConfig = "s2t")
  Create a converter with a supported config string or OpenccConfig enum value. Raises ValueError for unsupported configs.
- set_config(config: str | OpenccConfig) -> None
  Update the active conversion config. Raises ValueError for unsupported configs.
- get_config() -> str
  Return the current canonical config name.
- supported_configs() -> list[str]
  Return all supported config names.
- get_last_error() -> str | None
  Return the last validation or conversion error, if any.
- convert(input: str, punctuation: bool = False) -> str
  Convert text using the active config, with optional punctuation conversion.
- s2t(input: str, punctuation: bool = False) -> str
  Simplified Chinese to Traditional Chinese.
- t2s(input: str, punctuation: bool = False) -> str
  Traditional Chinese to Simplified Chinese.
- s2tw(input: str, punctuation: bool = False) -> str
  Simplified Chinese to Taiwan Traditional.
- tw2s(input: str, punctuation: bool = False) -> str
  Taiwan Traditional to Simplified Chinese.
- s2twp(input: str, punctuation: bool = False) -> str
  Simplified Chinese to Taiwan Traditional with idiom and phrase conversion.
- tw2sp(input: str, punctuation: bool = False) -> str
  Taiwan Traditional with idioms to Simplified Chinese.
- s2hk(input: str, punctuation: bool = False) -> str
  Simplified Chinese to Hong Kong Traditional.
- hk2s(input: str, punctuation: bool = False) -> str
  Hong Kong Traditional to Simplified Chinese.
- t2tw(input: str, punctuation: bool = False) -> str
  Traditional Chinese to Taiwan Traditional.
- t2twp(input: str, punctuation: bool = False) -> str
  Traditional Chinese to Taiwan Traditional with phrase mappings.
- tw2t(input: str, punctuation: bool = False) -> str
  Taiwan Traditional to standard Traditional Chinese.
- tw2tp(input: str, punctuation: bool = False) -> str
  Taiwan Traditional to standard Traditional Chinese with phrase reversal.
- t2hk(input: str, punctuation: bool = False) -> str
  Traditional Chinese to Hong Kong variant.
- hk2t(input: str, punctuation: bool = False) -> str
  Hong Kong Traditional to standard Traditional Chinese.
- t2jp(input: str, punctuation: bool = False) -> str
  Traditional Chinese to Japanese variants.
- jp2t(input: str, punctuation: bool = False) -> str
  Japanese Shinjitai to Traditional Chinese.
- st(input: str) -> str
  Character-only Simplified to Traditional conversion.
- ts(input: str) -> str
  Character-only Traditional to Simplified conversion.
- zho_check(input: str) -> int
  Detect the input text type: 1 - Traditional, 2 - Simplified, 0 - Others
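As an illustration of how zho_check can drive the choice of conversion, the following sketch flips text to the other script based on the detected type. `toggle_script` is a hypothetical helper, not part of the package. It takes the converter as an argument and relies only on the zho_check, s2t and t2s methods with the signatures listed above.

```python
def toggle_script(cc, text, punctuation=False):
    """Convert Simplified input to Traditional and vice versa.

    `cc` is expected to be an OpenCC instance (or any object exposing
    zho_check, s2t and t2s as documented in the API reference above).
    """
    kind = cc.zho_check(text)
    if kind == 2:  # Simplified detected
        return cc.s2t(text, punctuation)
    if kind == 1:  # Traditional detected
        return cc.t2s(text, punctuation)
    return text  # 0: neither detected, leave the text unchanged


# Usage, assuming opencc-purepy is installed:
#   from opencc_purepy import OpenCC
#   print(toggle_script(OpenCC(), "春眠不觉晓"))
```

Calling the direction-specific methods avoids changing the converter's active config as a side effect.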
OpenccConfig enum
- Members include: S2T, T2S, S2TW, TW2S, S2TWP, TW2SP, S2HK, HK2S, T2TW, TW2T, T2TWP, TW2TP, T2HK, HK2T, T2JP, JP2T
- to_canonical_name() -> str
  Return the lowercase OpenCC config string.
- parse(value: str) -> OpenccConfig
  Parse a config string into an enum value.
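For readers unfamiliar with this pattern, the sketch below shows one way an enum with this shape could be written. It is not the package's source: the class is deliberately named `ConfigSketch`, it lists only four members, and the case-insensitive parsing is an assumption about how such a method might behave.

```python
from enum import Enum


class ConfigSketch(Enum):
    """Abbreviated stand-in for an OpenccConfig-style enum (hypothetical)."""

    S2T = "s2t"
    T2S = "t2s"
    S2TW = "s2tw"
    TW2S = "tw2s"
    # The remaining members are omitted for brevity.

    def to_canonical_name(self):
        # The enum value already holds the lowercase config string.
        return self.value

    @classmethod
    def parse(cls, value):
        # Assumption: input is normalized, so "S2T" and " s2t " both parse.
        try:
            return cls(value.strip().lower())
        except ValueError:
            raise ValueError("Unsupported config: {!r}".format(value)) from None


print(ConfigSketch.parse("S2TW").to_canonical_name())  # s2tw
```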
🛠 Development
- Python bindings: opencc_purepy/__init__.py, opencc_purepy/core.py
- CLI: opencc_purepy/__main__.py
⚡ Benchmark
Measured on GitHub Actions ubuntu-latest using the default s2t configuration.
Each test averaged over 20 runs with the shared dictionary cache reused across runs.
Runner Platform
| Field | Value |
|---|---|
| Runner | Linux X64 |
| Image | ubuntu24 20260413.86.1 |
| Kernel | Linux runnervmeorf1 6.17.0-1010-azure #10~24.04.1-Ubuntu SMP Fri Mar 6 22:00:57 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux |
| CPU | Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz |
| CPU Cores | 4 |
| Memory | Not reported |
| Python | Python 3.10.20 |
Results
| Input Size | Avg. Time (ms) |
|---|---|
| 100 chars | 0.221 ms |
| 1,000 chars | 1.769 ms |
| 10,000 chars | 17.584 ms |
| 100,000 chars | 173.838 ms |
Timings reuse the shared dictionary cache, but still include per-run OpenCC instance setup; results depend on runner
hardware and background system load.
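A comparable measurement can be approximated locally with a small timing loop. The harness below is a sketch under stated assumptions, not the project's benchmark script: `average_ms` is a hypothetical helper that times any callable over a number of runs (20 by default, matching the run count above) and reports the mean in milliseconds.

```python
import time


def average_ms(func, text, runs=20):
    """Return the mean wall-clock time of func(text) in milliseconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        func(text)
        total += time.perf_counter() - start
    return total / runs * 1000.0


# Usage, assuming opencc-purepy is installed. A new OpenCC instance is
# created on every run, so per-run instance setup is included in the timing:
#   from opencc_purepy import OpenCC
#   sample = "春眠不觉晓" * 200  # 1,000 characters
#   print(average_ms(lambda t: OpenCC("s2t").convert(t), sample))
```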
Projects That Use opencc-purepy
📄 License
This project is licensed under the MIT License.
Powered by Pure Python and OpenCC Lexicons.