A Python extension module powered by Rust and PyO3, providing fast and accurate Chinese text conversion.

opencc_pyo3

opencc_pyo3 is a Python extension module powered by Rust and PyO3, providing fast and accurate conversion between different Chinese text variants using OpenCC algorithms.

Features

  • Convert between Simplified Chinese, Traditional Chinese (standard, Taiwan, and Hong Kong variants), and Japanese Kanji.
  • Fast and memory-efficient, leveraging Rust's performance.
  • Easy-to-use Python API.
  • Supports punctuation conversion and automatic detection of whether text is Traditional or Simplified Chinese.

Supported Conversion Configurations

  • s2t, t2s, s2tw, tw2s, s2twp, tw2sp, s2hk, hk2s, t2tw, tw2t, t2twp, tw2tp, t2hk, hk2t, t2jp, jp2t
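When a configuration name comes from user input, it can help to check it against the list above before handing it to the converter. The following is a minimal sketch: `SUPPORTED_CONFIGS` and `check_config` are hypothetical helper names, not part of opencc_pyo3, and the library may perform its own validation.

```python
# Configuration names copied from the list above.
SUPPORTED_CONFIGS = frozenset({
    "s2t", "t2s", "s2tw", "tw2s", "s2twp", "tw2sp", "s2hk", "hk2s",
    "t2tw", "tw2t", "t2twp", "tw2tp", "t2hk", "hk2t", "t2jp", "jp2t",
})


def check_config(name: str) -> str:
    """Return the normalized config name, or raise ValueError if it is not listed.

    Hypothetical client-side helper; it does not call opencc_pyo3.
    """
    key = name.strip().lower()
    if key not in SUPPORTED_CONFIGS:
        raise ValueError(
            f"unsupported config {name!r}; expected one of {sorted(SUPPORTED_CONFIGS)}"
        )
    return key
```

The returned name can then be passed on, for example as OpenCC(check_config(user_value)).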

Installation

1. Install from PyPI

pip install opencc-pyo3

2. Build and install the Python wheel using maturin:

# In project root
maturin build --release
pip install ./target/wheels/opencc_pyo3-<version>-cp<pyver>-abi3-<platform>.whl

Or, for development (may require a virtual environment):

maturin develop -r

See build.txt for detailed build and install instructions.

Usage

Python

from opencc_pyo3 import OpenCC

text = "“春眠不觉晓,处处闻啼鸟。”"
opencc = OpenCC("s2t")
converted = opencc.convert(text, punctuation=True)
print(converted)  # 「春眠不覺曉,處處聞啼鳥。」
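For more than one string, a single converter instance can be reused across calls. The sketch below assumes only the convert(text, punctuation=...) call shown above; `convert_lines` is a hypothetical helper, and it accepts any object with that method so it can be tried without the package installed.

```python
from typing import Iterable, Iterator


def convert_lines(cc, lines: Iterable[str], punctuation: bool = False) -> Iterator[str]:
    """Convert each line with one shared converter, keeping line endings intact.

    `cc` is any object exposing convert(text, punctuation=...), such as the
    OpenCC instance from the example above. Hypothetical helper, not part of
    opencc_pyo3.
    """
    for line in lines:
        body = line.rstrip("\r\n")      # text without the line terminator
        ending = line[len(body):]       # "\n", "\r\n", or "" for the last line
        yield cc.convert(body, punctuation=punctuation) + ending
```

With the package installed, this could be used as "".join(convert_lines(OpenCC("s2t"), open("input.txt", encoding="utf-8"))).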

CLI

The CLI can be run as a Python module (python -m opencc_pyo3) or through the opencc-pyo3 script.
The sub-commands are:

  • convert: Convert Chinese text using OpenCC
  • office: Convert Office document Chinese text using OpenCC
  • pdf: Convert extracted PDF document text using OpenCC

convert

python -m opencc_pyo3 convert --help
usage: opencc-pyo3 convert [-h] [-i <file>] [-o <file>] [-c <conversion>] [-p] [--in-enc <encoding>] [--out-enc <encoding>]

options:
  -h, --help            show this help message and exit
  -i, --input <file>    Read original text from <file>.
  -o, --output <file>   Write converted text to <file>.
  -c, --config <conversion>
                        Conversion configuration: s2t|s2tw|s2twp|s2hk|t2s|tw2s|tw2sp|hk2s|jp2t|t2jp
  -p, --punct           Enable punctuation conversion. (Default: False)
  --in-enc <encoding>   Encoding for input. (Default: UTF-8)
  --out-enc <encoding>  Encoding for output. (Default: UTF-8)
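The convert sub-command can also be driven from a Python script. The sketch below only assembles the argument list from the flags documented in the help text above; `build_convert_cmd` is a hypothetical helper, and which encoding names the CLI accepts is not stated here, so any non-default value is an assumption.

```python
import sys
from typing import List, Optional


def build_convert_cmd(
    input_path: str,
    output_path: str,
    config: str = "s2t",
    punct: bool = False,
    in_enc: Optional[str] = None,
    out_enc: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `python -m opencc_pyo3 convert` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "opencc_pyo3", "convert",
           "-i", input_path, "-o", output_path, "-c", config]
    if punct:
        cmd.append("-p")                  # enable punctuation conversion
    if in_enc:
        cmd += ["--in-enc", in_enc]       # omitted means the CLI default (UTF-8)
    if out_enc:
        cmd += ["--out-enc", out_enc]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems with file names that contain spaces.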

office

Supports Microsoft Office (.docx, .xlsx, .pptx), OpenDocument (.odt, .ods, .odp), and EPUB (.epub) files.

python -m opencc_pyo3 office --help
usage: opencc-pyo3 office [-h] [-i <file>] [-o <file>] [-c <conversion>] [-p] [-f <format>] [--auto-ext] [--keep-font]

options:
  -h, --help            show this help message and exit
  -i, --input <file>    Input Office document from <file>.
  -o, --output <file>   Output Office document to <file>.
  -c, --config <conversion>
                        conversion: s2t|s2tw|s2twp|s2hk|t2s|tw2s|tw2sp|hk2s|jp2t|t2jp
  -p, --punct           Enable punctuation conversion. (Default: False)
  -f, --format <format>
                        Target Office format (e.g., docx, xlsx, pptx, odt, ods, odp, epub)
  --auto-ext            Auto-append extension to output file
  --keep-font           Preserve font-family information in Office content

pdf

Supports PDF files as input, with built-in text extraction and OpenCC-based conversion powered by opencc-fmmseg (available since v0.8.4).

This command allows you to extract Chinese text from PDF documents, optionally apply CJK-aware paragraph reflow, and convert the result using OpenCC configurations.

Note
Only text-embedded (searchable) PDF documents are supported.
Scanned or image-only PDFs without an embedded text layer are not currently supported.

python -m opencc_pyo3 pdf --help

usage: __main__.py pdf [-h] -i <file> [-o <file>] [-c <conversion>] [-p] [-H] [-r] [--compact] [--timing] [-e]

options:
  -h, --help            show this help message and exit
  -i, --input <file>    Input PDF file.
  -o, --output <file>   Output text file (UTF-8). If omitted, defaults to "<input>_converted.txt".
  -c, --config <conversion>
                        Conversion configuration: s2t|s2tw|s2twp|s2hk|t2s|tw2s|tw2sp|hk2s|jp2t|t2jp
  -p, --punct           Enable punctuation conversion. (Default: False)
  -H, --header          Preserve page-break-like gaps when reflowing CJK paragraphs (passed as add_pdf_page_header to reflow_cjk_paragraphs).
  -r, --reflow          Enable CJK-aware paragraph reflow before conversion.
  --compact             Use compact paragraph mode (single newline between paragraphs).
  --timing              Show time use for each process workflow.
  -e, --extract         Extract PDF text only (skip OpenCC conversion).

Example invocations:

python -m opencc_pyo3 convert -i input.txt -o output.txt -c s2t --punct

python -m opencc_pyo3 office -c s2t --punct -i input.docx -o output.docx --keep-font

opencc-pyo3 office -c s2tw -p -i input.epub -o output.epub

opencc-pyo3 pdf -i input.pdf -o output.txt -c s2t --punct --reflow

API

Class: OpenCC

  • OpenCC(config: str = "s2t")
    • config: Conversion configuration (see above).
  • set_config(config: str)
    • Set conversion config dynamically
  • convert(input: str, punctuation: bool = False) -> str
    • Convert text with optional punctuation conversion.
  • zho_check(input: str) -> int
    • Detects whether the input text is Traditional or Simplified Chinese.
    • 1 - Traditional, 2 - Simplified, 0 - others
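The methods above can be combined so that conversion only runs when it is needed. The following is a minimal sketch, assuming the return codes listed for zho_check; `to_traditional` is a hypothetical helper, and how zho_check classifies mixed-script text is not specified here.

```python
# Return codes documented for zho_check above.
TRADITIONAL = 1
SIMPLIFIED = 2


def to_traditional(cc, text: str, punctuation: bool = False) -> str:
    """Convert `text` to Traditional Chinese only if it is detected as Simplified.

    `cc` is any object exposing zho_check, set_config and convert with the
    signatures listed above (for example an OpenCC instance). Text detected
    as Traditional (1) or other (0) is returned unchanged.
    """
    if cc.zho_check(text) == SIMPLIFIED:
        cc.set_config("s2t")              # switch the shared instance to s2t
        return cc.convert(text, punctuation=punctuation)
    return text
```

Note that set_config changes the state of the shared instance, so code that relies on a different configuration afterwards would need to set it again.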

Development

Benchmarks

Package: opencc_pyo3
Python 3.13.5 (tags/v3.13.5:6cb20a2, Jun 11 2025, 16:15:46) [MSC v.1943 64 bit (AMD64)]
Platform: Windows-11-10.0.26100-SP0
Processor: Intel64 Family 6 Model 191 Stepping 2, GenuineIntel

BENCHMARK RESULTS


Method          Config  TextSize  Mean      StdDev    Min       Max        Ops/sec  Chars/sec
Convert_Small   s2t     100       0.118 ms  0.097 ms  0.049 ms  0.811 ms   8,499    849,910
Convert_Medium  s2t     1,000     0.250 ms  0.036 ms  0.211 ms  0.509 ms   4,004    4,003,531
Convert_Large   s2t     10,000    0.845 ms  0.060 ms  0.775 ms  1.420 ms   1,184    11,835,419
Convert_XLarge  s2t     100,000   4.755 ms  0.152 ms  4.515 ms  5.680 ms   210      21,030,543
Convert_Small   s2tw    100       0.141 ms  0.027 ms  0.096 ms  0.321 ms   7,111    711,093
Convert_Medium  s2tw    1,000     0.392 ms  0.030 ms  0.355 ms  0.623 ms   2,552    2,552,127
Convert_Large   s2tw    10,000    1.271 ms  0.044 ms  1.191 ms  1.474 ms   787      7,869,452
Convert_XLarge  s2tw    100,000   6.317 ms  0.139 ms  6.004 ms  7.250 ms   158      15,831,322
Convert_Small   s2twp   100       0.204 ms  0.028 ms  0.132 ms  0.380 ms   4,911    491,118
Convert_Medium  s2twp   1,000     0.598 ms  0.039 ms  0.527 ms  0.747 ms   1,671    1,671,296
Convert_Large   s2twp   10,000    1.942 ms  0.061 ms  1.823 ms  2.223 ms   515      5,149,357
Convert_XLarge  s2twp   100,000   9.937 ms  0.173 ms  9.542 ms  10.707 ms  101      10,063,174
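In the table, Ops/sec is roughly 1 / Mean and Chars/sec is roughly TextSize × Ops/sec (for example, 100 × 8,499 ≈ 849,910). The sketch below shows one way such columns could be computed for any callable; `bench` is a hypothetical helper and is not the script that produced the results above.

```python
import statistics
import time
from typing import Callable, Dict


def bench(func: Callable[[str], object], text: str, rounds: int = 50) -> Dict[str, float]:
    """Time func(text) `rounds` times and derive the table's columns.

    Hypothetical harness for illustration; results depend on the machine.
    """
    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        func(text)
        samples.append(time.perf_counter() - start)   # elapsed seconds
    mean = statistics.mean(samples)
    rate = 1.0 / mean if mean > 0 else float("inf")   # guard against a zero reading
    return {
        "mean_ms": mean * 1000,
        "stdev_ms": statistics.stdev(samples) * 1000 if len(samples) > 1 else 0.0,
        "min_ms": min(samples) * 1000,
        "max_ms": max(samples) * 1000,
        "ops_per_sec": rate,
        "chars_per_sec": len(text) * rate,
    }
```

With the package installed, a call such as bench(OpenCC("s2t").convert, text) would time conversion of a given input string.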

Throughput vs Size

[Figure: throughput vs. text size chart]


Projects That Use opencc-pyo3

OpenccPyo3Gui


License

MIT


Powered by Rust, PyO3, OpenCC, Pdfium and opencc-fmmseg.

