Conversion between Traditional and Simplified Chinese (pure Python)

opencc-py (OpenCC Pure Python Implementation)

This directory contains a pure Python implementation of the OpenCC Chinese conversion algorithm. It exposes the same import surface as the official opencc Python package:

import opencc

converter = opencc.OpenCC("s2t")
print(converter.convert("汉字"))  # 漢字

Data Dependency

The package does not bundle OpenCC configs or dictionaries directly. Built-in conversion data is loaded from the opencc-data PyPI package at runtime.

This keeps the pure Python package small and avoids depending on generated files under the OpenCC source tree. The converter reads:

  • config JSON files from opencc_data.config_path()
  • dictionary text files from opencc_data.data_path()
  • test cases from opencc_data.test_data_path()
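As a rough illustration of what reading a dictionary text file involves, the sketch below parses lines in the shape commonly used by OpenCC text dictionaries: a key, a tab, then one or more space-separated candidate values. The function name parse_text_dictionary and the sample data are assumptions made for this example; this is not the package's own loader.

```python
import io


def parse_text_dictionary(lines):
    """Parse 'key<TAB>value1 value2 ...' lines into {key: [values]}.

    Hypothetical helper for illustration; the line format is an assumption.
    """
    entries = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        key, sep, values = line.partition("\t")
        if not sep:
            continue  # skip lines without a tab separator
        candidates = values.split()
        if candidates:
            entries[key] = candidates
    return entries


# Toy data standing in for a real dictionary file.
sample = io.StringIO("汉\t漢\n发\t發 髮\n")
table = parse_text_dictionary(sample)
print(table["发"])  # ['發', '髮']
```

A real loader would open the files found under opencc_data.data_path() instead of an in-memory string.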

Custom config files are still supported. When a custom config references a local dictionary path such as CustomPhrases.ocd2, the pure Python implementation looks for the corresponding CustomPhrases.txt next to the config file.
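The lookup rule above can be sketched with pathlib. The helper names iter_dict_files and resolve_text_dictionary are hypothetical, and the config layout (nested objects carrying a "file" key) is an assumption based on the usual OpenCC JSON shape. Only the plain file name case described above is covered.

```python
from pathlib import Path


def iter_dict_files(node):
    """Yield every string stored under a 'file' key in a nested config."""
    if isinstance(node, dict):
        if isinstance(node.get("file"), str):
            yield node["file"]
        for value in node.values():
            yield from iter_dict_files(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_dict_files(item)


def resolve_text_dictionary(config_file, dict_ref):
    """Map e.g. 'CustomPhrases.ocd2' to 'CustomPhrases.txt' beside the config."""
    text_name = Path(dict_ref).with_suffix(".txt").name
    return Path(config_file).parent / text_name


# Hypothetical custom config referencing a local binary dictionary name.
config = {
    "conversion_chain": [
        {"dict": {"type": "ocd2", "file": "CustomPhrases.ocd2"}}
    ]
}

for ref in iter_dict_files(config):
    print(resolve_text_dictionary("conf/my_config.json", ref))
```

In practice this means shipping CustomPhrases.txt in the same directory as the custom config file, even when the config still names the .ocd2 file.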

Installation

The PyPI package name is opencc-py. Users can install it with pip:

python -m pip install opencc-py

For local development from this directory:

python -m pip install .

The package version matches its opencc-data version and declares the matching data package as an exact install dependency, so pip installs the compatible data package automatically.

Or use editable development mode:

python -m pip install -e .

Supported Configs

opencc.CONFIGS is populated from the configs exposed by opencc-data.

import opencc

print(opencc.CONFIGS)

The standard mmseg configs and configs that do not require segmentation are supported. Jieba plugin configs are not included in opencc-data, so they are not exposed as built-in configs by this package.
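One way to picture this distinction is a check on a config's segmentation block. The sketch assumes the usual OpenCC JSON layout, where segmentation.type names the segmenter. The helper is_supported and the "jieba" type string are illustrative assumptions, not part of this package's API.

```python
# Assumption: mmseg is the only segmenter the pure Python code handles.
SUPPORTED_SEGMENTERS = {"mmseg"}


def is_supported(config):
    """Return True if the config needs no segmenter or uses a supported one."""
    segmentation = config.get("segmentation")
    if segmentation is None:
        return True  # nothing to segment with, so nothing to reject
    return segmentation.get("type") in SUPPORTED_SEGMENTERS


mmseg_config = {"segmentation": {"type": "mmseg"}, "conversion_chain": []}
no_seg_config = {"conversion_chain": []}
# Hypothetical plugin-style config; the type name is a placeholder.
plugin_config = {"segmentation": {"type": "jieba"}, "conversion_chain": []}

results = [is_supported(c) for c in (mmseg_config, no_seg_config, plugin_config)]
print(results)  # [True, True, False]
```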

Testing

Install test dependencies, then run pytest from the repository root:

python -m pip install -r python-pure/tests/requirements_lock.txt
PYTHONPATH=python-pure python -m pytest python-pure/tests

The tests verify:

  • importing and initializing every built-in config
  • conversion against opencc-data test cases
  • custom config and local dictionary resolution
  • golden output compatibility for supported configs

Differences from the Official Implementation

This package intentionally implements only the pieces needed for pure Python text conversion. The official Python implementation is the opencc PyPI package. Compared with the official C++ library and command-line tools, this package omits the following lower-level pieces:

  • binary dictionary loading for .ocd2/.ocd; built-in dictionaries are read from .txt data supplied by opencc-data
  • dictionary compilation and extraction tools such as opencc_dict and opencc_phrase_extract
  • the C API, shared-library loading behavior, and ABI/plugin compatibility guarantees
  • native CLI behavior, including streaming I/O, command-line option parity, and platform-specific path handling
  • package, runfiles, and source-tree data discovery fallbacks; built-in data comes from opencc-data
  • automatic loading of optional plugin configs or plugin resources, including the Jieba plugin package layout
  • performance optimizations from marisa-trie, Darts, and the C++ segmentation implementation

The conversion semantics still mirror OpenCC's config-driven pipeline: mmseg segmentation, ordered dictionary groups, longest-prefix matching within a dictionary, conversion chains, normalization, and optional suppression of tofu-risk dictionaries.
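A simplified sketch of part of that pipeline follows: longest-prefix matching within one dictionary, ordered groups where the first dictionary with a match wins, and a chain of conversion steps. It leaves out mmseg segmentation, normalization, and tofu-risk handling. All function names and the toy dictionaries are assumptions for illustration, not the package's internals.

```python
def match_longest_prefix(table, text, start):
    """Return (key, value) for the longest key of `table` starting at `start`."""
    max_len = max((len(key) for key in table), default=0)
    for length in range(min(max_len, len(text) - start), 0, -1):
        key = text[start:start + length]
        if key in table:
            return key, table[key]
    return None


def match_group(group, text, start):
    """Try dictionaries in order; the first one with any match wins."""
    for table in group:
        hit = match_longest_prefix(table, text, start)
        if hit is not None:
            return hit
    return None


def convert_with_group(group, text):
    """Scan left to right, replacing matches and copying unmatched characters."""
    out = []
    i = 0
    while i < len(text):
        hit = match_group(group, text, i)
        if hit is None:
            out.append(text[i])
            i += 1
        else:
            key, value = hit
            out.append(value)
            i += len(key)
    return "".join(out)


def convert_chain(chain, text):
    """Apply each step of the chain to the output of the previous step."""
    for group in chain:
        text = convert_with_group(group, text)
    return text


# Toy dictionaries: phrases are consulted before single characters.
phrases = {"头发": "頭髮"}
characters = {"头": "頭", "发": "發", "汉": "漢"}
chain = [[phrases, characters]]

print(convert_chain(chain, "汉字头发"))  # 漢字頭髮
```

Placing the phrase dictionary ahead of the character dictionary is what lets 头发 become 頭髮 rather than the character-by-character result 頭發.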

License and Compliance

This package is distributed under the Apache License 2.0.

This project is a derivative work of OpenCC. Runtime conversion data is provided by the opencc-data PyPI package.


Download files

Source Distribution

opencc_py-1.3.2.dev20260628.tar.gz (19.4 kB)

Uploaded Source

Built Distribution

opencc_py-1.3.2.dev20260628-py3-none-any.whl (14.6 kB)

Uploaded Python 3

File details

Details for the file opencc_py-1.3.2.dev20260628.tar.gz.

File metadata

  • Download URL: opencc_py-1.3.2.dev20260628.tar.gz
  • Upload date:
  • Size: 19.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for opencc_py-1.3.2.dev20260628.tar.gz

Algorithm    Hash digest
SHA256       ba89a7b85f6e7b0d27efcab059929af3a6f25c752cad7ec0b846ee348926072c
MD5          b6bc28b1168850185281daa03a888c74
BLAKE2b-256  35f87d45620608f9f1bf7ce38e55e9797538f25126244da35697e661728b8d33

Provenance

The following attestation bundles were made for opencc_py-1.3.2.dev20260628.tar.gz:

Publisher: release-pypi-pure.yml on frankslin/OpenCC

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file opencc_py-1.3.2.dev20260628-py3-none-any.whl.

File hashes

Hashes for opencc_py-1.3.2.dev20260628-py3-none-any.whl

Algorithm    Hash digest
SHA256       f9db53aa5b0c02b5d2b56f706a8ed946d8948246b5d955a9329022b20eafd020
MD5          d208a120d0e768a05b02ed9ef09c3124
BLAKE2b-256  50221514a963d4ff71f779f243dbf50d92a270f3bafe5271af332624b51a3851

Provenance

The following attestation bundles were made for opencc_py-1.3.2.dev20260628-py3-none-any.whl:

Publisher: release-pypi-pure.yml on frankslin/OpenCC
