Pure Python implementation of OpenCC for Chinese text conversion

opencc_purepy

opencc_purepy is a pure Python implementation of OpenCC (Open Chinese Convert), supporting conversion between Simplified, Traditional, Hong Kong, Taiwan, and Japanese Kanji.
It uses dictionary-based segmentation and mapping logic inspired by the original OpenCC.
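
To make the idea of dictionary-based segmentation concrete, the following is a minimal sketch of forward longest-match conversion, the general technique that OpenCC-style converters build on. It is not the actual opencc_purepy implementation: the function name `convert_longest_match` and the toy mapping are assumptions made for illustration only.

```python
from typing import Dict


def convert_longest_match(text: str, mapping: Dict[str, str], max_len: int) -> str:
    """Convert text by always taking the longest dictionary key at each position."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in mapping:
                out.append(mapping[chunk])
                i += length
                break
        else:
            # No key matched at this position: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Toy dictionary (illustrative only, not taken from the OpenCC lexicon files).
TOY_MAPPING = {"头发": "頭髮", "发": "發", "头": "頭"}

print(convert_longest_match("头发发展", TOY_MAPPING, max_len=2))
```

The phrase entry "头发" is matched before the single character "发", which is why phrase dictionaries can override character-level mappings.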

🚩 Features

Pure Python – no native dependencies
Multiple Chinese locale conversions (Simplified, Traditional, HK, TW, JP)
Punctuation style conversion (optional)
Automatic text-type detection (Simplified/Traditional)
CLI with Office document support (.docx, .xlsx, .pptx, .odt, .ods, .odp, .epub)

🐍 opencc_purepy requires Python 3.7 or later.

🔁 Supported Conversion Configs

Code	Description
`s2t`	Simplified → Traditional
`t2s`	Traditional → Simplified
`s2tw`	Simplified → Traditional (Taiwan)
`tw2s`	Traditional (Taiwan) → Simplified
`s2twp`	Simplified → Traditional (Taiwan) with idioms
`tw2sp`	Traditional (Taiwan) → Simplified with idioms
`s2hk`	Simplified → Traditional (Hong Kong)
`hk2s`	Traditional (Hong Kong) → Simplified
`t2tw`	Traditional → Traditional (Taiwan)
`tw2t`	Traditional (Taiwan) → Traditional
`t2twp`	Traditional → Traditional (Taiwan) with idioms
`tw2tp`	Traditional (Taiwan) → Traditional with idioms
`t2hk`	Traditional → Traditional (Hong Kong)
`hk2t`	Traditional (Hong Kong) → Traditional
`t2jp`	Japanese Kyujitai → Shinjitai
`jp2t`	Japanese Shinjitai → Kyujitai

📦 Installation

pip install opencc-purepy

🚀 Usage

Python

from opencc_purepy import OpenCC

text = "“春眠不觉晓，处处闻啼鸟。”"
opencc = OpenCC("s2t")
converted = opencc.convert(text, punctuation=True)
print(converted)  # 「春眠不覺曉，處處聞啼鳥。」
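
The example above turns curly quotes into corner brackets when punctuation=True. As a rough sketch of what such a punctuation pass could look like, the block below uses a plain character translation table. The double-quote pairs come from the example output above; the single-quote pairs and the helper name `convert_punctuation` are assumptions, and the library's real punctuation table may differ.

```python
# Translation tables for quote styles (single-quote pairs are an assumption).
S2T_PUNCT = str.maketrans({"“": "「", "”": "」", "‘": "『", "’": "』"})
T2S_PUNCT = str.maketrans({"「": "“", "」": "”", "『": "‘", "』": "’"})


def convert_punctuation(text: str, to_traditional: bool = True) -> str:
    """Swap quote styles only; every other character is left untouched."""
    table = S2T_PUNCT if to_traditional else T2S_PUNCT
    return text.translate(table)


print(convert_punctuation("“春眠不觉晓”"))
```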

CLI

Text File Conversion

python -m opencc_purepy convert -i input.txt -o output.txt -c s2t -p
# or, if installed as a script:
opencc-purepy convert -i input.txt -o output.txt -c s2t -p
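
For batch jobs, the same command line can be assembled from Python. This sketch only builds argument lists from the flags shown above (-i, -o, -c, -p); it assumes that -p enables punctuation conversion, and the helper names `build_convert_argv` and `batch_argvs` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def build_convert_argv(src: str, dst: str, config: str = "s2t",
                       punctuation: bool = False) -> List[str]:
    """Build the argument list for one `opencc-purepy convert` call."""
    argv = ["opencc-purepy", "convert", "-i", src, "-o", dst, "-c", config]
    if punctuation:
        argv.append("-p")  # assumption: -p turns on punctuation conversion
    return argv


def batch_argvs(inputs: Iterable[str], out_dir: str,
                config: str = "s2t") -> List[List[str]]:
    """One argument list per input file, writing to out_dir under the same file name."""
    return [
        build_convert_argv(path, str(Path(out_dir) / Path(path).name), config)
        for path in inputs
    ]


print(build_convert_argv("input.txt", "output.txt", "s2t", punctuation=True))
```

Each list can then be handed to subprocess.run(argv, check=True) once the CLI is installed.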

Office Document Conversion subcommand (`office`)

Supports: .docx, .xlsx, .pptx, .odt, .ods, .odp, .epub

# Convert Word document with font preservation
opencc-purepy office -i example.docx -c t2s --keep-font

# Convert EPUB and auto-detect output name
opencc-purepy office -i book.epub -c s2t --auto-ext

# Convert Excel and specify output path and format
opencc-purepy office -i sheet.xlsx -o result.xlsx -c s2tw --format xlsx

ℹ️ With the `office` subcommand, the input is treated as an Office or EPUB document, and OpenCC conversion is applied internally.
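
Office and EPUB files are ZIP containers holding XML or XHTML parts, so a converter has to unpack them, rewrite the text-bearing parts, and repack the archive. The sketch below shows that general round trip with the standard zipfile module. It is not how opencc_purepy implements the office subcommand: `convert_zip_member` is a hypothetical helper, the converter is passed in as a plain callable, and real documents need more care (for example, Word may split one phrase across several XML runs).

```python
import io
import zipfile
from typing import Callable


def convert_zip_member(data: bytes, member: str,
                       convert: Callable[[str], str]) -> bytes:
    """Return a copy of a ZIP archive with one UTF-8 member passed through `convert`."""
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            payload = src.read(info.filename)
            if info.filename == member:
                # Only the chosen part is decoded, converted, and re-encoded.
                payload = convert(payload.decode("utf-8")).encode("utf-8")
            # Every other part is copied through byte for byte.
            dst.writestr(info, payload)
    return out_buf.getvalue()
```

For a .docx file the main text usually lives in word/document.xml; any callable that maps str to str can serve as `convert`.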

📚 Custom Dictionaries

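opencc_purepy follows the OpenCC lexicon structure. Custom entries are loaded through existing OpenCC dictionary slots, such as st_phrases or ts_phrases; there is no generic UserDict.txt slot. A custom file can have any name (the examples below use ./UserDict.txt), but it must be attached to one of the existing slots.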

This keeps DictionaryMaxlength, DictRefs, and future acceleration structures such as UnionCache stable and OpenCC-compatible.

Append mode

Use appends={...} to load the built-in dictionaries first and then the custom entries. When a key appears more than once, the later entry wins, so custom entries override built-in ones. This mode is recommended for most users.

from opencc_purepy import OpenCC

cc = OpenCC.from_dicts(
    config="s2t",
    appends={
        "st_phrases": "./UserDict.txt",
    },
)

print(cc.convert("帕兰蒂尔是一家公司"))
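
The append semantics can be pictured as an ordinary dictionary merge in which the custom file is applied last. The sketch below is an illustration under that assumption, not the library's loader; `append_entries` is a hypothetical name, and recomputing the longest key length is a guess at why a structure called DictionaryMaxlength tracks it.

```python
from typing import Dict, Tuple


def append_entries(builtin: Dict[str, str],
                   custom: Dict[str, str]) -> Tuple[Dict[str, str], int]:
    """Merge custom entries over built-in ones and report the longest key length."""
    merged = dict(builtin)   # copy, so the built-in table is left untouched
    merged.update(custom)    # entries applied later replace earlier ones
    max_len = max((len(key) for key in merged), default=0)
    return merged, max_len


builtin = {"头发": "頭髮"}
custom = {"头发": "頭髪", "帕兰蒂尔": "帕蘭蒂爾"}
merged, max_len = append_entries(builtin, custom)
print(merged, max_len)
```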

Override mode

Use overrides={...} to replace an entire dictionary slot with a custom file. This is intended for advanced users or proprietary full dictionary copies.

from opencc_purepy import OpenCC

cc = OpenCC.from_dicts(
    config="s2t",
    overrides={
        "st_phrases": "./company/STPhrases.txt",
    },
)

Direct dictionary injection

from opencc_purepy import OpenCC
from opencc_purepy.dictionary_lib import DictionaryMaxlength

dictionary = DictionaryMaxlength.from_dicts(
    appends={
        "st_phrases": "./UserDict.txt",
    },
)

cc = OpenCC(config="s2t", dictionary=dictionary)

Dictionary text format

Custom dictionary files are UTF-8 text files with one mapping per line in phrase<TAB>translation format. Blank lines are ignored, lines starting with # are comments, and when a key appears more than once the later entry wins.

# Custom company terms
帕兰蒂尔	帕蘭蒂爾
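
A small parser makes these rules explicit. This is a sketch of the format as described above, not the parser shipped with opencc_purepy; `parse_dict_text` is a hypothetical helper, and silently skipping lines without a TAB is an assumption about how malformed input might be handled.

```python
from typing import Dict


def parse_dict_text(text: str) -> Dict[str, str]:
    """Parse phrase<TAB>translation lines into a mapping (later entries win)."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue                      # blank lines are ignored
        if line.startswith("#"):
            continue                      # comment lines are ignored
        phrase, sep, translation = line.partition("\t")
        if not sep:
            continue                      # assumption: skip lines with no TAB
        mapping[phrase] = translation     # a repeated key overwrites the earlier one
    return mapping


sample = "# Custom company terms\n帕兰蒂尔\t帕蘭蒂爾\n\n帕兰蒂尔\t帕蘭提爾\n"
print(parse_dict_text(sample))
```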

Supported slots

Slot name	Default file
`st_characters`	`STCharacters.txt`
`st_phrases`	`STPhrases.txt`
`ts_characters`	`TSCharacters.txt`
`ts_phrases`	`TSPhrases.txt`
`tw_phrases`	`TWPhrases.txt`
`tw_phrases_rev`	`TWPhrasesRev.txt`
`tw_variants`	`TWVariants.txt`
`tw_variants_rev`	`TWVariantsRev.txt`
`tw_variants_rev_phrases`	`TWVariantsRevPhrases.txt`
`hk_variants`	`HKVariants.txt`
`hk_variants_rev`	`HKVariantsRev.txt`
`hk_variants_rev_phrases`	`HKVariantsRevPhrases.txt`
`jps_characters`	`JPShinjitaiCharacters.txt`
`jps_phrases`	`JPShinjitaiPhrases.txt`
`jp_variants`	`JPVariants.txt`
`jp_variants_rev`	`JPVariantsRev.txt`

Generate JSON with dictgen

TXT dictionaries are human-editable source files. dictionary_maxlength.json is a generated/cache format, so prefer dictgen instead of manually editing JSON.

opencc-purepy dictgen -d ./my_dicts -o dictionary_maxlength.json

from opencc_purepy import OpenCC
from opencc_purepy.dictionary_lib import DictionaryMaxlength

dictionary = DictionaryMaxlength.from_json("./dictionary_maxlength.json")
cc = OpenCC(config="s2t", dictionary=dictionary)
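
To illustrate why a generated JSON file is convenient as a cache, the sketch below bakes already-parsed slot mappings into a single JSON string and loads them back. The layout used here (an "entries" object plus a "max_length" number per slot) is purely hypothetical and is not the real dictionary_maxlength.json schema; use dictgen to produce files the library can read.

```python
import json
from typing import Dict


def bake_to_json(slots: Dict[str, Dict[str, str]]) -> str:
    """Serialize slot mappings with a precomputed longest-key length (hypothetical layout)."""
    baked = {
        slot: {
            "entries": entries,
            "max_length": max((len(key) for key in entries), default=0),
        }
        for slot, entries in slots.items()
    }
    # ensure_ascii=False keeps CJK text readable in the output file.
    return json.dumps(baked, ensure_ascii=False, sort_keys=True)


def load_baked(payload: str) -> Dict[str, Dict[str, str]]:
    """Recover the slot mappings from the baked JSON string."""
    return {slot: body["entries"] for slot, body in json.loads(payload).items()}


payload = bake_to_json({"st_phrases": {"帕兰蒂尔": "帕蘭蒂爾"}, "st_characters": {}})
print(payload)
```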

Which mode should I use?

Use appends for a few user or company terms.
Use overrides when maintaining a full proprietary replacement of an OpenCC dictionary file.
Use dictgen when you want to bake TXT dictionaries into JSON for reuse or faster loading.
Use direct dictionary injection when sharing one loaded dictionary across many OpenCC instances.

🧩 API Reference

Exports

OpenCC
OpenccConfig

`OpenCC` class

OpenCC(config: str | OpenccConfig = "s2t")
Create a converter with a supported config string or OpenccConfig enum value. Raises ValueError for unsupported configs.
set_config(config: str | OpenccConfig) -> None
Update the active conversion config. Raises ValueError for unsupported configs.
get_config() -> str
Return the current canonical config name.
supported_configs() -> list[str]
Return all supported config names.
get_last_error() -> str | None
Return the last validation or conversion error, if any.
convert(input: str, punctuation: bool = False) -> str
Convert text using the active config, with optional punctuation conversion.
s2t(input: str, punctuation: bool = False) -> str
Simplified Chinese to Traditional Chinese.
t2s(input: str, punctuation: bool = False) -> str
Traditional Chinese to Simplified Chinese.
s2tw(input: str, punctuation: bool = False) -> str
Simplified Chinese to Taiwan Traditional.
tw2s(input: str, punctuation: bool = False) -> str
Taiwan Traditional to Simplified Chinese.
s2twp(input: str, punctuation: bool = False) -> str
Simplified Chinese to Taiwan Traditional with idiom and phrase conversion.
tw2sp(input: str, punctuation: bool = False) -> str
Taiwan Traditional with idioms to Simplified Chinese.
s2hk(input: str, punctuation: bool = False) -> str
Simplified Chinese to Hong Kong Traditional.
hk2s(input: str, punctuation: bool = False) -> str
Hong Kong Traditional to Simplified Chinese.
t2tw(input: str, punctuation: bool = False) -> str
Traditional Chinese to Taiwan Traditional.
t2twp(input: str, punctuation: bool = False) -> str
Traditional Chinese to Taiwan Traditional with phrase mappings.
tw2t(input: str, punctuation: bool = False) -> str
Taiwan Traditional to standard Traditional Chinese.
tw2tp(input: str, punctuation: bool = False) -> str
Taiwan Traditional to standard Traditional Chinese with phrase reversal.
t2hk(input: str, punctuation: bool = False) -> str
Traditional Chinese to Hong Kong variant.
hk2t(input: str, punctuation: bool = False) -> str
Hong Kong Traditional to standard Traditional Chinese.
t2jp(input: str, punctuation: bool = False) -> str
Traditional Chinese (Kyujitai) to Japanese Shinjitai.
jp2t(input: str, punctuation: bool = False) -> str
Japanese Shinjitai to Traditional Chinese.
st(input: str) -> str
Character-only Simplified to Traditional conversion.
ts(input: str) -> str
Character-only Traditional to Simplified conversion.
zho_check(input: str) -> int
Detect the input text type. Returns 1 for Traditional, 2 for Simplified, and 0 for anything else.
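
The integer codes are easier to work with once they are named. The sketch below wraps the three documented return values in an enum and picks a conversion direction from them. `TextType` and `pick_config` are hypothetical helpers that are not part of opencc_purepy; the code would normally come from OpenCC.zho_check.

```python
from enum import IntEnum
from typing import Optional


class TextType(IntEnum):
    """Names for the documented zho_check return codes."""
    OTHER = 0
    TRADITIONAL = 1
    SIMPLIFIED = 2


def pick_config(code: int) -> Optional[str]:
    """Choose the opposite-direction config for detected text, or None if undetermined."""
    kind = TextType(code)  # raises ValueError for codes outside 0..2
    if kind is TextType.SIMPLIFIED:
        return "s2t"
    if kind is TextType.TRADITIONAL:
        return "t2s"
    return None


print(pick_config(2), pick_config(1), pick_config(0))
```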

`OpenccConfig` enum

Members include: S2T, T2S, S2TW, TW2S, S2TWP, TW2SP, S2HK, HK2S, T2TW, TW2T, T2TWP, TW2TP, T2HK, HK2T, T2JP, JP2T
to_canonical_name() -> str
Return the lowercase OpenCC config string.
parse(value: str) -> OpenccConfig
Parse a config string into an enum value.

🛠 Development

Python bindings: opencc_purepy/__init__.py, opencc_purepy/core.py
CLI: opencc_purepy/__main__.py

⚡ Benchmark

Measured on GitHub Actions ubuntu-latest using the default s2t configuration.
Each test averaged over 20 runs with the shared dictionary cache reused across runs.

Runner Platform

Field	Value
Runner	Linux X64
Image	ubuntu24 20260413.86.1
Kernel	`Linux runnervmeorf1 6.17.0-1010-azure #10~24.04.1-Ubuntu SMP Fri Mar 6 22:00:57 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux`
CPU	Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
CPU Cores	4
Memory	Not reported
Python	Python 3.10.20

Results

Input Size	Avg. Time (ms)
100 chars	0.221 ms
1,000 chars	1.769 ms
10,000 chars	17.584 ms
100,000 chars	173.838 ms

Timings reuse the shared dictionary cache, but still include per-run OpenCC instance setup; results depend on runner hardware and background system load.
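
The table can be turned into throughput figures to check how the conversion time scales. The short script below uses only the numbers reported above; the helper name `chars_per_ms` is an assumption.

```python
# Average times from the Results table, keyed by input size in characters.
RESULTS_MS = {100: 0.221, 1_000: 1.769, 10_000: 17.584, 100_000: 173.838}


def chars_per_ms(size: int, avg_ms: float) -> float:
    """Throughput in characters per millisecond."""
    return size / avg_ms


for size, avg_ms in RESULTS_MS.items():
    rate = chars_per_ms(size, avg_ms)
    per_char_us = avg_ms * 1000 / size
    print(f"{size:>7,} chars: {rate:6.1f} chars/ms, {per_char_us:.2f} µs per char")
```

Throughput stays within a narrow band from 1,000 characters upward, which suggests roughly linear scaling; the smallest input is slower per character, consistent with the fixed per-run setup cost mentioned above.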

Projects That Use `opencc-purepy`

OpenccPurepyGui

📄 License

This project is licensed under the MIT License.

Maintainers

laisuk


Release history

Version	Release date
1.3.1	May 23, 2026
1.3.0	May 23, 2026
1.2.4 (this version)	May 15, 2026
1.2.3	May 14, 2026
1.2.2	May 14, 2026
1.2.1	May 7, 2026
1.2.0	Apr 8, 2026
1.1.0	Aug 13, 2025
1.0.3	Jul 6, 2025
1.0.2	Jun 26, 2025
1.0.1	Jun 26, 2025
1.0.0	Jun 18, 2025

Download files

Source Distribution

opencc_purepy-1.2.4.tar.gz (1.0 MB)

Uploaded May 15, 2026 Source

Built Distribution

opencc_purepy-1.2.4-py3-none-any.whl (1.0 MB)

Uploaded May 15, 2026 Python 3

File details

Details for the file opencc_purepy-1.2.4.tar.gz.

File metadata

Download URL: opencc_purepy-1.2.4.tar.gz
Upload date: May 15, 2026
Size: 1.0 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for opencc_purepy-1.2.4.tar.gz
Algorithm	Hash digest
SHA256	`f1c112bbf694cfbdd76587625c94548640a117f00b037e06b3741500ab81bf02`
MD5	`6a5b1bfe0ae8ab756a1c704f84cb4733`
BLAKE2b-256	`1033b00ab2d460deb02654b4c7adf9940e10ac60f66486ce8200d9494217df6f`
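
A downloaded file can be checked against the SHA256 digest listed above with the standard hashlib module. This is a generic verification sketch; the helper names `sha256_of_file` and `verify_sha256` are assumptions, and the path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published value, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, calling verify_sha256 on the downloaded opencc_purepy-1.2.4.tar.gz with the SHA256 value from the table should return True for an intact download.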


Provenance

The following attestation bundles were made for opencc_purepy-1.2.4.tar.gz:

Publisher: release.yml on laisuk/opencc_purepy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: opencc_purepy-1.2.4.tar.gz
- Subject digest: f1c112bbf694cfbdd76587625c94548640a117f00b037e06b3741500ab81bf02
- Sigstore transparency entry: 1546506879
- Sigstore integration time: May 15, 2026
Source repository:
- Permalink: laisuk/opencc_purepy@316a504defca52cdd883cfc3fbc67f0a5ce536b7
- Branch / Tag: refs/tags/v1.2.4
- Owner: https://github.com/laisuk
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@316a504defca52cdd883cfc3fbc67f0a5ce536b7
- Trigger Event: push

File details

Details for the file opencc_purepy-1.2.4-py3-none-any.whl.

File metadata

Download URL: opencc_purepy-1.2.4-py3-none-any.whl
Upload date: May 15, 2026
Size: 1.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for opencc_purepy-1.2.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`10ee7bda16b01ca6cdd94c39f7d5450b2c1402594c46f1511c84738558913add`
MD5	`665ab8cdf5004c07bac0a8e62d1aaf86`
BLAKE2b-256	`560a4ccbe307fcbe2341960458a67ae6cd07b0d480ce636a3c229faba62960be`


Provenance

The following attestation bundles were made for opencc_purepy-1.2.4-py3-none-any.whl:

Publisher: release.yml on laisuk/opencc_purepy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: opencc_purepy-1.2.4-py3-none-any.whl
- Subject digest: 10ee7bda16b01ca6cdd94c39f7d5450b2c1402594c46f1511c84738558913add
- Sigstore transparency entry: 1546506886
- Sigstore integration time: May 15, 2026
Source repository:
- Permalink: laisuk/opencc_purepy@316a504defca52cdd883cfc3fbc67f0a5ce536b7
- Branch / Tag: refs/tags/v1.2.4
- Owner: https://github.com/laisuk
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@316a504defca52cdd883cfc3fbc67f0a5ce536b7
- Trigger Event: push
