Conversion between Traditional and Simplified Chinese (pure Python)
opencc-py (OpenCC Pure Python Implementation)
This directory contains a pure Python implementation of the OpenCC Chinese conversion algorithm. It exposes the same import surface as the official opencc Python package:
import opencc
converter = opencc.OpenCC("s2t")
print(converter.convert("汉字")) # 漢字
Data Dependency
The package does not bundle OpenCC configs or dictionaries directly. Built-in
conversion data is loaded from the
opencc-data PyPI package at runtime.
This keeps the pure Python package small and avoids depending on generated files under the OpenCC source tree. The converter reads:
- config JSON files from opencc_data.config_path()
- dictionary text files from opencc_data.data_path()
- test cases from opencc_data.test_data_path()
Custom config files are still supported. When a custom config references a local
dictionary path such as CustomPhrases.ocd2, the pure Python implementation
looks for the corresponding CustomPhrases.txt next to the config file.
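The lookup described above can be sketched as follows. This is a minimal illustration, not the package's actual code: the helper names `resolve_text_dictionary` and `load_text_dictionary` are hypothetical, and the parser assumes the text layout used by OpenCC source dictionaries (one entry per line, a key, a tab, then space-separated candidates).

```python
from pathlib import Path


def resolve_text_dictionary(config_file, dict_ref):
    """Map a reference such as 'CustomPhrases.ocd2' to the sibling
    'CustomPhrases.txt' next to the config file (hypothetical helper)."""
    config_dir = Path(config_file).resolve().parent
    # Only the file name is kept, so the .txt file is always looked up
    # in the same directory as the config file.
    return config_dir / Path(dict_ref).with_suffix(".txt").name


def load_text_dictionary(path):
    """Parse 'key<TAB>value1 value2' lines into {key: [values]}.

    The tab/space layout is an assumption about the text format.
    """
    entries = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            key, _, values = line.partition("\t")
            if values:
                entries[key] = values.split(" ")
    return entries
```

With a config at /some/dir/custom.json that references CustomPhrases.ocd2, the first helper returns /some/dir/CustomPhrases.txt, which the second helper can then read.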
Installation
The PyPI package name is opencc-py. Users can install it with pip:
python -m pip install opencc-py
For local development from this directory:
python -m pip install .
Or use editable development mode:
python -m pip install -e .
The package version matches its opencc-data version, and the package declares the
matching data package as an exact install dependency, so pip installs the
compatible data package automatically.
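To confirm that an environment honours the version pairing between opencc-py and opencc-data, the installed distribution versions can be compared with the standard library. This is only a sketch: the helper names `installed_version` and `versions_match` are hypothetical and not part of either package.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def versions_match(package="opencc-py", data_package="opencc-data"):
    """Return True or False when both distributions are installed,
    or None when either one is missing."""
    package_version = installed_version(package)
    data_version = installed_version(data_package)
    if package_version is None or data_version is None:
        return None
    return package_version == data_version


if __name__ == "__main__":
    print("opencc-py:", installed_version("opencc-py"))
    print("opencc-data:", installed_version("opencc-data"))
    print("match:", versions_match())
```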
Supported Configs
opencc.CONFIGS is populated from the configs exposed by opencc-data.
import opencc
print(opencc.CONFIGS)
The standard mmseg configs and configs that do not require segmentation are
supported. Jieba plugin configs are not included in opencc-data, so they are
not exposed as built-in configs by this package.
Testing
Install test dependencies, then run pytest from the repository root:
python -m pip install -r python-pure/tests/requirements_lock.txt
PYTHONPATH=python-pure python -m pytest python-pure/tests
The tests verify:
- importing and initializing every built-in config
- conversion against opencc-data test cases
- custom config and local dictionary resolution
- golden output compatibility for supported configs
Differences from the Official Implementation
This package intentionally implements only the pieces needed for pure Python
text conversion. The official Python implementation is the opencc PyPI package.
Compared with the official C++ library and command-line tools, this package
omits the following:
- binary dictionary loading for .ocd2/.ocd; built-in dictionaries are read from .txt data supplied by opencc-data
- dictionary compilation and extraction tools such as opencc_dict and opencc_phrase_extract
- the C API, shared-library loading behavior, and ABI/plugin compatibility guarantees
- native CLI behavior, including streaming I/O, command-line option parity, and platform-specific path handling
- package, runfiles, and source-tree data discovery fallbacks; built-in data comes from opencc-data
- automatic loading of optional plugin configs or plugin resources, including the Jieba plugin package layout
- performance optimizations from marisa-trie, Darts, and the C++ segmentation implementation
The conversion semantics still mirror OpenCC's config-driven pipeline: mmseg segmentation, ordered dictionary groups, longest-prefix matching within a dictionary, conversion chains, normalization, and optional suppression of tofu-risk dictionaries.
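Two of the ideas named above, longest-prefix matching within a dictionary and conversion chains, can be illustrated with a small sketch. It is a simplification under stated assumptions: it skips mmseg segmentation, dictionary groups, and normalization, it uses toy dictionaries invented for the example, and the function names are hypothetical rather than taken from the package.

```python
def convert_with_dictionary(text, dictionary):
    """Greedy left-to-right conversion that always prefers the longest
    dictionary key starting at the current position."""
    max_len = max(map(len, dictionary), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                out.append(dictionary[candidate])
                i += length
                break
        else:
            # No key matches here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def convert_chain(text, chain):
    """Apply each dictionary in order, feeding every stage the output
    of the previous one."""
    for dictionary in chain:
        text = convert_with_dictionary(text, dictionary)
    return text


# Toy data for illustration only, not real OpenCC dictionaries.
characters = {"头发": "頭髮", "头": "頭", "发": "發"}
print(convert_with_dictionary("头发", characters))  # 頭髮

to_traditional = {"鼠标": "鼠標"}
to_taiwan_phrases = {"鼠標": "滑鼠"}
print(convert_chain("鼠标", [to_traditional, to_taiwan_phrases]))  # 滑鼠
```

The first call shows why the longest match matters: converting character by character would give 頭發, while matching the two-character key first gives 頭髮.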
License and Compliance
This package is distributed under the Apache License 2.0.
This project is a derivative work of
OpenCC. Runtime conversion data is provided
by the opencc-data PyPI package.
File details
Details for the file opencc_py-1.3.2.dev20260628.tar.gz.
File metadata
- Download URL: opencc_py-1.3.2.dev20260628.tar.gz
- Upload date:
- Size: 19.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba89a7b85f6e7b0d27efcab059929af3a6f25c752cad7ec0b846ee348926072c |
| MD5 | b6bc28b1168850185281daa03a888c74 |
| BLAKE2b-256 | 35f87d45620608f9f1bf7ce38e55e9797538f25126244da35697e661728b8d33 |
Provenance
The following attestation bundles were made for opencc_py-1.3.2.dev20260628.tar.gz:
Publisher: release-pypi-pure.yml on frankslin/OpenCC
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: opencc_py-1.3.2.dev20260628.tar.gz
- Subject digest: ba89a7b85f6e7b0d27efcab059929af3a6f25c752cad7ec0b846ee348926072c
- Sigstore transparency entry: 1996816949
- Sigstore integration time:
- Permalink: frankslin/OpenCC@66f8656ecfd25a7be643e4619ffa4f10c219c682
- Branch / Tag: refs/heads/opencc-wasm-develop
- Owner: https://github.com/frankslin
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-pypi-pure.yml@66f8656ecfd25a7be643e4619ffa4f10c219c682
- Trigger Event: workflow_dispatch
File details
Details for the file opencc_py-1.3.2.dev20260628-py3-none-any.whl.
File metadata
- Download URL: opencc_py-1.3.2.dev20260628-py3-none-any.whl
- Upload date:
- Size: 14.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f9db53aa5b0c02b5d2b56f706a8ed946d8948246b5d955a9329022b20eafd020 |
| MD5 | d208a120d0e768a05b02ed9ef09c3124 |
| BLAKE2b-256 | 50221514a963d4ff71f779f243dbf50d92a270f3bafe5271af332624b51a3851 |
Provenance
The following attestation bundles were made for opencc_py-1.3.2.dev20260628-py3-none-any.whl:
Publisher: release-pypi-pure.yml on frankslin/OpenCC
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: opencc_py-1.3.2.dev20260628-py3-none-any.whl
- Subject digest: f9db53aa5b0c02b5d2b56f706a8ed946d8948246b5d955a9329022b20eafd020
- Sigstore transparency entry: 1996817055
- Sigstore integration time:
- Permalink: frankslin/OpenCC@66f8656ecfd25a7be643e4619ffa4f10c219c682
- Branch / Tag: refs/heads/opencc-wasm-develop
- Owner: https://github.com/frankslin
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-pypi-pure.yml@66f8656ecfd25a7be643e4619ffa4f10c219c682
- Trigger Event: workflow_dispatch