
Pure Python implementation of OpenCC for Chinese text conversion


opencc_purepy


opencc_purepy is a pure Python implementation of OpenCC (Open Chinese Convert), supporting conversion between Simplified, Traditional, Hong Kong, Taiwan, and Japanese Kanji.
It uses dictionary-based segmentation and mapping logic inspired by the original OpenCC.
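
To give a feel for what dictionary-based segmentation and mapping can look like, here is a minimal sketch of greedy forward longest-match conversion. It is not the library's actual implementation: the function name `convert_longest_match` and the three-entry `TOY_DICT` are hypothetical and exist only for illustration.

```python
# Toy phrase/character table (hypothetical); real OpenCC lexicons are far larger.
TOY_DICT = {
    "头发": "頭髮",  # a phrase entry takes priority over single characters
    "发展": "發展",
    "发": "發",
}


def convert_longest_match(text: str, mapping: dict) -> str:
    """Greedy forward longest-match replacement (illustrative only)."""
    max_len = max((len(key) for key in mapping), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in mapping:
                out.append(mapping[chunk])
                i += size
                break
        else:
            # No entry matched: copy the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(convert_longest_match("头发发展", TOY_DICT))  # 頭髮發展
```

Matching phrases before single characters is what lets a context-dependent character such as 发 map to 髮 in one word and 發 in another.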


🚩 Features

  • Pure Python – no native dependencies
  • Multiple Chinese locale conversions (Simplified, Traditional, HK, TW, JP)
  • Punctuation style conversion (optional)
  • Automatic script detection (Simplified/Traditional)
  • CLI with Office document support (.docx, .xlsx, .pptx, .odt, .ods, .odp, .epub)

🐍 opencc_purepy requires Python 3.7 or later.


🔁 Supported Conversion Configs

Code Description
s2t Simplified → Traditional
t2s Traditional → Simplified
s2tw Simplified → Traditional (Taiwan)
tw2s Traditional (Taiwan) → Simplified
s2twp Simplified → Traditional (Taiwan) with idioms
tw2sp Traditional (Taiwan) → Simplified with idioms
s2hk Simplified → Traditional (Hong Kong)
hk2s Traditional (Hong Kong) → Simplified
t2tw Traditional → Traditional (Taiwan)
tw2t Traditional (Taiwan) → Traditional
t2twp Traditional → Traditional (Taiwan) with idioms
tw2tp Traditional (Taiwan) → Traditional with idioms
t2hk Traditional → Traditional (Hong Kong)
hk2t Traditional (Hong Kong) → Traditional
t2jp Japanese Kyujitai → Shinjitai
jp2t Japanese Shinjitai → Kyujitai
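
The codes in the table follow a `<source>2<target>` pattern with an optional `p` suffix for the idiom variants. As a small sketch of how a caller might build and validate a code before handing it to the converter, the helper below checks against a set copied from the table. `pick_config` and `SUPPORTED` are hypothetical names, not part of the library.

```python
# Config codes copied from the table above.
SUPPORTED = {
    "s2t", "t2s", "s2tw", "tw2s", "s2twp", "tw2sp", "s2hk", "hk2s",
    "t2tw", "tw2t", "t2twp", "tw2tp", "t2hk", "hk2t", "t2jp", "jp2t",
}


def pick_config(source: str, target: str, phrases: bool = False) -> str:
    """Build a '<source>2<target>[p]' code and check it against the table."""
    code = f"{source.lower()}2{target.lower()}" + ("p" if phrases else "")
    if code not in SUPPORTED:
        raise ValueError(f"unsupported conversion: {code}")
    return code


print(pick_config("s", "tw", phrases=True))  # s2twp
print(pick_config("hk", "s"))                # hk2s
```

A combination that is not in the table, such as `pick_config("s", "jp")`, raises `ValueError` instead of producing an unknown code.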

📦 Installation

pip install opencc-purepy

🚀 Usage

Python

from opencc_purepy import OpenCC

text = "“春眠不觉晓,处处闻啼鸟。”"
opencc = OpenCC("s2t")
converted = opencc.convert(text, punctuation=True)
print(converted)  # 「春眠不覺曉,處處聞啼鳥。」

CLI

Text File Conversion

python -m opencc_purepy convert -i input.txt -o output.txt -c s2t -p
# or, if installed as a script:
opencc-purepy convert -i input.txt -o output.txt -c s2t -p
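
The same file conversion can be scripted from Python. The sketch below is one possible approach, not what the CLI does internally. The helper name `convert_file` is hypothetical, and reading and writing as UTF-8 is an assumption about the input files.

```python
from pathlib import Path
from typing import Callable


def convert_file(src: str, dst: str, convert: Callable[[str], str]) -> int:
    """Read src as UTF-8 (assumed), apply convert(), write dst.

    Returns the number of characters written.
    """
    text = Path(src).read_text(encoding="utf-8")
    result = convert(text)
    Path(dst).write_text(result, encoding="utf-8")
    return len(result)


# With the library installed, a call might look like this:
#   from opencc_purepy import OpenCC
#   cc = OpenCC("s2t")
#   convert_file("input.txt", "output.txt",
#                lambda t: cc.convert(t, punctuation=True))
```

Passing the conversion in as a callable keeps the file handling independent of which config is active.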

Office Document Conversion (office subcommand)

Supports: .docx, .xlsx, .pptx, .odt, .ods, .odp, .epub

# Convert Word document with font preservation
opencc-purepy office -i example.docx -c t2s --keep-font

# Convert EPUB and auto-detect output name
opencc-purepy office -i book.epub -c s2t --auto-ext

# Convert Excel and specify output path and format
opencc-purepy office -i sheet.xlsx -o result.xlsx -c s2tw --format xlsx

ℹ️ With the office subcommand, the input is processed as an Office or EPUB document and OpenCC conversion is applied internally.
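
For batch jobs, the office subcommand can be driven from a script. The sketch below only assembles an argument list from flags that appear in the examples above (`-i`, `-o`, `-c`, `--keep-font`). `build_office_cmd` is a hypothetical helper, and it assumes the `opencc-purepy` script is on the PATH.

```python
from typing import List, Optional


def build_office_cmd(input_path: str, config: str,
                     output_path: Optional[str] = None,
                     keep_font: bool = False) -> List[str]:
    """Assemble an argv list using only flags shown in the examples above."""
    cmd = ["opencc-purepy", "office", "-i", input_path, "-c", config]
    if output_path is not None:
        cmd += ["-o", output_path]
    if keep_font:
        cmd.append("--keep-font")
    return cmd


cmd = build_office_cmd("example.docx", "t2s", keep_font=True)
print(" ".join(cmd))  # opencc-purepy office -i example.docx -c t2s --keep-font
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

Building a list rather than a shell string avoids quoting problems with file names that contain spaces.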


🧩 API Reference

Exports

  • OpenCC
  • OpenccConfig

OpenCC class

  • OpenCC(config: str | OpenccConfig = "s2t")
    Create a converter with a supported config string or OpenccConfig enum value.
  • set_config(config: str | OpenccConfig) -> None
    Update the active conversion config.
  • get_config() -> str
    Return the current canonical config name.
  • supported_configs() -> list[str]
    Return all supported config names.
  • get_last_error() -> str | None
    Return the last validation or conversion error, if any.
  • convert(input: str, punctuation: bool = False) -> str
    Convert text using the active config, with optional punctuation conversion where supported.
  • s2t(input: str, punctuation: bool = False) -> str
    Simplified Chinese to Traditional Chinese.
  • t2s(input: str, punctuation: bool = False) -> str
    Traditional Chinese to Simplified Chinese.
  • s2tw(input: str, punctuation: bool = False) -> str
    Simplified Chinese to Taiwan Traditional.
  • tw2s(input: str, punctuation: bool = False) -> str
    Taiwan Traditional to Simplified Chinese.
  • s2twp(input: str, punctuation: bool = False) -> str
    Simplified Chinese to Taiwan Traditional with idiom and phrase conversion.
  • tw2sp(input: str, punctuation: bool = False) -> str
    Taiwan Traditional with idioms to Simplified Chinese.
  • s2hk(input: str, punctuation: bool = False) -> str
    Simplified Chinese to Hong Kong Traditional.
  • hk2s(input: str, punctuation: bool = False) -> str
    Hong Kong Traditional to Simplified Chinese.
  • t2tw(input: str) -> str
    Traditional Chinese to Taiwan Traditional.
  • t2twp(input: str) -> str
    Traditional Chinese to Taiwan Traditional with phrase mappings.
  • tw2t(input: str) -> str
    Taiwan Traditional to standard Traditional Chinese.
  • tw2tp(input: str) -> str
    Taiwan Traditional to standard Traditional Chinese with phrase reversal.
  • t2hk(input: str) -> str
    Traditional Chinese to Hong Kong variant.
  • hk2t(input: str) -> str
    Hong Kong Traditional to standard Traditional Chinese.
  • t2jp(input: str) -> str
    Traditional Chinese to Japanese variants.
  • jp2t(input: str) -> str
    Japanese Shinjitai to Traditional Chinese.
  • st(input: str) -> str
    Character-only Simplified to Traditional conversion.
  • ts(input: str) -> str
    Character-only Traditional to Simplified conversion.
  • zho_check(input: str) -> int
    Detect the input text type:
      1 - Traditional, 2 - Simplified, 0 - Others
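
As an illustration of combining `zho_check` with the direct conversion methods, the sketch below flips text to the other script based on the detected type. `to_other_script` is a hypothetical helper. It assumes only the documented return codes and that `t2s` and `s2t` can be called on any converter instance regardless of its active config.

```python
def to_other_script(converter, text: str) -> str:
    """Flip Simplified <-> Traditional using zho_check(); leave other text alone."""
    kind = converter.zho_check(text)  # 1 = Traditional, 2 = Simplified, 0 = others
    if kind == 1:
        return converter.t2s(text)
    if kind == 2:
        return converter.s2t(text)
    return text


# With the library installed:
#   from opencc_purepy import OpenCC
#   print(to_other_script(OpenCC(), "春眠不觉晓"))
```

Text that is neither Simplified nor Traditional (code 0) is returned unchanged rather than forced through a conversion.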

OpenccConfig enum

  • Members include: S2T, T2S, S2TW, TW2S, S2TWP, TW2SP, S2HK, HK2S, T2TW, TW2T, T2TWP, TW2TP, T2HK, HK2T, T2JP, JP2T
  • to_canonical_name() -> str
    Return the lowercase OpenCC config string.
  • parse(value: str) -> OpenccConfig
    Parse a config string into an enum value.

🛠 Development


⚡ Benchmark

Measured on GitHub Actions ubuntu-latest using the default s2t configuration.
Each test was averaged over 20 runs, with the shared dictionary cache reused across runs.

Runner Platform

Field Value
Runner Linux X64
Image ubuntu24 20260413.86.1
Kernel Linux runnervmeorf1 6.17.0-1010-azure #10~24.04.1-Ubuntu SMP Fri Mar 6 22:00:57 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
CPU Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
CPU Cores 4
Memory Not reported
Python Python 3.10.20

Results

Input Size Avg. Time (ms)
100 chars 0.221
1,000 chars 1.769
10,000 chars 17.584
100,000 chars 173.838

Timings reuse the shared dictionary cache, but still include per-run OpenCC instance setup; results depend on runner hardware and background system load.
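
For readers who want comparable numbers on their own hardware, a timing loop along these lines is one option. It is a sketch and not the project's benchmark script: `average_ms` is a hypothetical helper, and the default of 20 runs simply mirrors the description above.

```python
import time
from typing import Callable


def average_ms(fn: Callable[[str], str], text: str, runs: int = 20) -> float:
    """Mean wall-clock time of fn(text) in milliseconds over `runs` calls."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn(text)
        total += time.perf_counter() - start
    return total / runs * 1000.0


# With the library installed, per-run instance setup can be included by
# constructing the converter inside the timed callable:
#   from opencc_purepy import OpenCC
#   average_ms(lambda t: OpenCC("s2t").convert(t), sample_text)
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock suited to measuring short intervals.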


Projects That Use opencc-purepy

OpenccPurepyGui


📄 License

This project is licensed under the MIT License.


Powered by Pure Python and OpenCC Lexicons.
