Conversion between Traditional and Simplified Chinese (pure Python)

Project description

opencc-py (OpenCC Pure Python Implementation)

This directory contains a pure Python implementation of the OpenCC Chinese conversion algorithm. It exposes the same import surface as the official opencc Python package:

import opencc

converter = opencc.OpenCC("s2t")
print(converter.convert("汉字"))  # 漢字

Data Dependency

The package does not bundle OpenCC configs or dictionaries directly. Built-in conversion data is loaded from the opencc-data PyPI package at runtime.

This keeps the pure Python package small and avoids depending on generated files under the OpenCC source tree. The converter reads:

config JSON files from opencc_data.config_path()
dictionary text files from opencc_data.data_path()
test cases from opencc_data.test_data_path()

Custom config files are still supported. When a custom config references a local dictionary path such as CustomPhrases.ocd2, the pure Python implementation looks for the corresponding CustomPhrases.txt next to the config file.
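The custom dictionary lookup described above can be sketched as follows. This is a minimal illustration, not the package's actual code: the helper name resolve_text_dictionary is hypothetical, and it assumes the rule is simply "replace the .ocd2 or .ocd suffix with .txt and resolve the result relative to the config file's directory".

```python
from pathlib import Path


def resolve_text_dictionary(config_file: str, dict_ref: str) -> Path:
    """Map a dictionary reference from a config to a sibling .txt file.

    Hypothetical helper: assumes binary references (.ocd2/.ocd) are
    replaced by a .txt file with the same stem, next to the config file.
    """
    ref = Path(dict_ref)
    if ref.suffix in {".ocd2", ".ocd"}:
        ref = ref.with_suffix(".txt")
    # Relative references are resolved against the config's directory.
    return Path(config_file).parent / ref


print(resolve_text_dictionary("conf/custom.json", "CustomPhrases.ocd2"))
```

References that already point at a .txt file pass through unchanged apart from being anchored to the config directory.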

Installation

The PyPI package name is opencc-py. Users can install it with pip:

python -m pip install opencc-py

The package version matches its opencc-data version and pins that data package as an exact install dependency, so pip installs the compatible data package automatically.

For local development from this directory:

python -m pip install .

Or install in editable mode:

python -m pip install -e .

Supported Configs

opencc.CONFIGS is populated from the configs exposed by opencc-data.

import opencc

print(opencc.CONFIGS)

The standard mmseg configs and configs that do not require segmentation are supported. Jieba plugin configs are not included in opencc-data, so they are not exposed as built-in configs by this package.

Testing

Install test dependencies, then run pytest from the repository root:

python -m pip install -r python-pure/tests/requirements_lock.txt
PYTHONPATH=python-pure python -m pytest python-pure/tests

The tests verify:

importing and initializing every built-in config
conversion against opencc-data test cases
custom config and local dictionary resolution
golden output compatibility for supported configs

OpenCC 1.3.2 Feature Coverage

The following OpenCC 1.3.2 features are fully supported:

CJK Compatibility Ideographs normalization — all built-in configs include a pre-processing normalization step that maps U+F900–U+FAFF characters to their canonical code points before conversion.
match_policy: union — dictionary groups with "match_policy": "union" return the globally longest match across all sub-dictionaries.
normalization config field — custom configs may add a normalization array to apply conversion steps before segmentation.
New configs — s2hkp and hk2sp (Simplified ↔ Hong Kong, with phrase conversion) are available through opencc-data.
Tofu-risk dictionary suppression — pass include_tofu_risk_dictionaries=False to OpenCC() to exclude dictionaries that may produce characters absent from modern CJK fonts.
JSONC — config files may use // line comments and /* */ block comments; the pure Python backend strips them before JSON parsing.
Inline dictionaries — {"type": "inline", "entries": {"key": "value", ...}} dict nodes are supported in custom configs.
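To illustrate the JSONC item in the list above, here is one way comments could be stripped before JSON parsing. It is a sketch under stated assumptions, not the backend's real code: the function name strip_jsonc_comments is hypothetical, and the sample document is illustrative rather than a complete OpenCC config (the "note" key is made up to show that "//" inside a string is preserved).

```python
import json


def strip_jsonc_comments(text: str) -> str:
    """Remove // and /* */ comments while leaving string literals intact."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip to the end of the line, keep the newline.
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            # Block comment: skip past the closing delimiter.
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


source = """
{
  // an inline dictionary node
  "type": "inline",
  "entries": {"汉": "漢"}  /* block comment */
  , "note": "http://example.invalid/a//b"
}
"""
config = json.loads(strip_jsonc_comments(source))
print(config["entries"], config["note"])
```

Tracking whether the scanner is inside a string literal is the important design choice here; a plain regular expression over the whole file would also delete the "//" in URL-like values.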

Differences from the Official Implementation

This package intentionally implements only the pieces needed for pure Python text conversion. The official Python implementation is the opencc PyPI package, which provides Python bindings for the C++ library. Compared with the official C++ library and command-line tools, this package omits the following:

binary dictionary loading for .ocd2/.ocd; built-in dictionaries are read from .txt data supplied by opencc-data
dictionary compilation and extraction tools such as opencc_dict and opencc_phrase_extract
the C API, shared-library loading behavior, and ABI/plugin compatibility guarantees
native CLI behavior, including streaming I/O, command-line option parity, and platform-specific path handling
package, runfiles, and source-tree data discovery fallbacks; built-in data comes from opencc-data
automatic loading of optional plugin configs or plugin resources, including the Jieba plugin package layout
performance optimizations from marisa-trie, Darts, and the C++ segmentation implementation

The conversion semantics still mirror OpenCC's config-driven pipeline: mmseg segmentation, ordered dictionary groups, longest-prefix matching within a dictionary, conversion chains, normalization, and optional suppression of tofu-risk dictionaries.
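The matching rules named above can be sketched on toy data. This is a simplified illustration, not the package's implementation: all function names are hypothetical, dictionaries are plain Python dicts, segmentation and normalization are omitted, and the sample entries are ASCII placeholders rather than real OpenCC data. It assumes an ordered group returns the match from the first dictionary that has one, while a union group (the match_policy: union feature listed earlier) returns the longest match across all dictionaries.

```python
from typing import Dict, List, Optional, Tuple

Dictionary = Dict[str, str]


def longest_prefix(d: Dictionary, text: str, pos: int) -> Optional[Tuple[str, str]]:
    """Return the longest key of d that starts at text[pos], with its value."""
    max_len = max(map(len, d), default=0)
    for length in range(min(max_len, len(text) - pos), 0, -1):
        key = text[pos:pos + length]
        if key in d:
            return key, d[key]
    return None


def match_group(group: List[Dictionary], text: str, pos: int,
                union: bool = False) -> Optional[Tuple[str, str]]:
    """Ordered: first dictionary with a match wins. Union: longest match wins."""
    matches = (m for m in (longest_prefix(d, text, pos) for d in group) if m)
    if union:
        # On equal length, max() keeps the match from the earlier dictionary.
        return max(matches, key=lambda m: len(m[0]), default=None)
    return next(matches, None)


def convert(group: List[Dictionary], text: str, union: bool = False) -> str:
    """Convert left to right; characters with no match pass through."""
    out, pos = [], 0
    while pos < len(text):
        m = match_group(group, text, pos, union)
        if m is None:
            out.append(text[pos])
            pos += 1
        else:
            out.append(m[1])
            pos += len(m[0])
    return "".join(out)


def convert_chain(chain: List[List[Dictionary]], text: str) -> str:
    """Apply each group in turn, feeding the output of one into the next."""
    for group in chain:
        text = convert(group, text)
    return text


group = [{"a": "1"}, {"ab": "2", "b": "3"}]
print(convert(group, "abc"))              # ordered group
print(convert(group, "abc", union=True))  # union group
print(convert_chain([group, [{"13": "X"}]], "abc"))
```

With these toy dictionaries the ordered group converts "a" from the first dictionary and then "b" from the second, whereas the union group prefers the longer key "ab" from the second dictionary.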

License and Compliance

This package is distributed under the Apache License 2.0.

This project is a derivative work of OpenCC. Runtime conversion data is provided by the opencc-data PyPI package.



Release history

1.4.0 (this version): Jul 2, 2026
1.3.2.dev20260628 (pre-release): Jun 28, 2026
1.1.0 (yanked): May 11, 2020
1.0.6 (yanked): May 10, 2020

Both yanked releases give the same reason: Use `opencc` package for the Python bindings for OpenCC (C++ library)

Download files


Source Distribution

opencc_py-1.4.0.tar.gz (22.0 kB), uploaded Jul 2, 2026, source

Built Distribution


opencc_py-1.4.0-py3-none-any.whl (15.9 kB), uploaded Jul 2, 2026, Python 3

File details

Details for the file opencc_py-1.4.0.tar.gz.

File metadata

Download URL: opencc_py-1.4.0.tar.gz
Upload date: Jul 2, 2026
Size: 22.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for opencc_py-1.4.0.tar.gz
Algorithm	Hash digest
SHA256	`651dc80a44fe4d4857590b82f751ad39b41a4ed0ba57e78159592ddbeb9f2d9a`
MD5	`6df6059ddb419c596a049ca064a2c003`
BLAKE2b-256	`0eb4f2942390ad6f8d3a14f70c4bd28584909cab9445071dc34b3fc8833a53fc`
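A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: the helper name sha256_of is hypothetical, and the file path assumes the archive was saved to the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for opencc_py-1.4.0.tar.gz.
EXPECTED_SHA256 = "651dc80a44fe4d4857590b82f751ad39b41a4ed0ba57e78159592ddbeb9f2d9a"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


archive = Path("opencc_py-1.4.0.tar.gz")  # assumed download location
if archive.exists():
    print("match" if sha256_of(archive) == EXPECTED_SHA256 else "MISMATCH")
```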


Provenance

The following attestation bundles were made for opencc_py-1.4.0.tar.gz:

Publisher: release-pypi-pure.yml on frankslin/OpenCC

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: opencc_py-1.4.0.tar.gz
- Subject digest: 651dc80a44fe4d4857590b82f751ad39b41a4ed0ba57e78159592ddbeb9f2d9a
- Sigstore transparency entry: 2044487466
- Sigstore integration time: Jul 2, 2026
Source repository:
- Permalink: frankslin/OpenCC@54854a7d20f35b2aacf0db4ba54a85df1d81deed
- Branch / Tag: refs/heads/opencc-wasm-develop
- Owner: https://github.com/frankslin
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-pypi-pure.yml@54854a7d20f35b2aacf0db4ba54a85df1d81deed
- Trigger Event: workflow_dispatch

File details

Details for the file opencc_py-1.4.0-py3-none-any.whl.

File metadata

Download URL: opencc_py-1.4.0-py3-none-any.whl
Upload date: Jul 2, 2026
Size: 15.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for opencc_py-1.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7d5a7df41ad043a8faccffda331f3f1ddc3d38bee89f82f0d42d0bfdd7d48228`
MD5	`e81601339983c4ae33b10990ec5a4df4`
BLAKE2b-256	`8f7d6ffa705b8195e0b8adbdd72bc26a9a4a8cb20acc8b49290483fdf71abe09`


Provenance

The following attestation bundles were made for opencc_py-1.4.0-py3-none-any.whl:

Publisher: release-pypi-pure.yml on frankslin/OpenCC

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: opencc_py-1.4.0-py3-none-any.whl
- Subject digest: 7d5a7df41ad043a8faccffda331f3f1ddc3d38bee89f82f0d42d0bfdd7d48228
- Sigstore transparency entry: 2044487509
- Sigstore integration time: Jul 2, 2026
Source repository:
- Permalink: frankslin/OpenCC@54854a7d20f35b2aacf0db4ba54a85df1d81deed
- Branch / Tag: refs/heads/opencc-wasm-develop
- Owner: https://github.com/frankslin
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-pypi-pure.yml@54854a7d20f35b2aacf0db4ba54a85df1d81deed
- Trigger Event: workflow_dispatch
