A Python extension module powered by Rust Jieba and PyO3, providing fast and accurate Chinese text conversion.

opencc_jieba_pyo3

opencc_jieba_pyo3 is a Python extension module powered by Rust Jieba and PyO3, providing fast and accurate conversion between different Chinese text variants using opencc-jieba-rs and OpenCC algorithms.

Features

Convert between Simplified, Traditional, Hong Kong, Taiwan, and Japanese Kanji variants of Chinese text.
Fast and memory-efficient, leveraging Rust's performance.
Easy-to-use Python API.
Supports punctuation conversion and automatic detection of a text's Chinese variant (see zho_check).
Chinese word segmentation (Jieba).
Keyword extraction (TF-IDF, TextRank).
Utility functions for punctuation handling and language detection.

Supported Conversion Configurations

s2t, t2s, s2tw, tw2s, s2twp, tw2sp, s2hk, hk2s, t2tw, tw2t, t2twp, tw2tp, t2hk, hk2t, t2jp, jp2t
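
Each configuration name above reads as a source abbreviation, the digit `2`, and a target abbreviation. As a minimal sketch (not part of opencc_jieba_pyo3), the snippet below checks a name against that list before it is passed to `OpenCC(...)` and splits it at the `2`. `SUPPORTED_CONFIGS` and `split_config` are hypothetical names introduced here for illustration.

```python
from __future__ import annotations

# Configuration names copied from the list above.
SUPPORTED_CONFIGS = frozenset({
    "s2t", "t2s", "s2tw", "tw2s", "s2twp", "tw2sp", "s2hk", "hk2s",
    "t2tw", "tw2t", "t2twp", "tw2tp", "t2hk", "hk2t", "t2jp", "jp2t",
})


def split_config(name: str) -> tuple[str, str]:
    """Validate a config name and split it at the first "2" (hypothetical helper)."""
    if name not in SUPPORTED_CONFIGS:
        raise ValueError(f"unsupported conversion configuration: {name!r}")
    source, _, target = name.partition("2")
    return source, target


print(split_config("s2twp"))  # ('s', 'twp')
```

Validating up front like this gives a clear error message for a mistyped name, independent of how the extension itself reports unknown configurations.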

Installation

Build and install the Python wheel using maturin:

# In project root
maturin build --release
pip install ./target/wheels/opencc_jieba_pyo3-<version>-cp<pyver>-abi3-<platform>.whl

Or for development:

maturin develop -r

See BUILD.md for detailed build and install instructions.

Usage

Python

from opencc_jieba_pyo3 import OpenCC

text = "“春眠不觉晓，处处闻啼鸟。”"
opencc = OpenCC("s2t")
converted = opencc.convert(text, punctuation=True)
print(converted)  # 「春眠不覺曉，處處聞啼鳥。」

# Segmentation
words = opencc.jieba_cut(text, hmm=True)
print(words)  # ['春眠', '不觉', '晓', '，', '处处', '闻', '啼鸟', '。']

# Segmentation and join
joined = opencc.jieba_cut_and_join(text, delimiter="/")
print(joined)  # 春眠/不觉/晓/，/处处/闻/啼鸟/。

# Keyword extraction (TextRank)
keywords = opencc.jieba_keyword_extract_textrank(text, top_k=3)
print(keywords)  # ['春眠', '啼鸟', '处处']

# Keyword extraction (TF-IDF)
keywords_tfidf = opencc.jieba_keyword_extract_tfidf(text, top_k=3)
print(keywords_tfidf)  # ['春眠', '啼鸟', '处处']

# Keyword weights (TextRank)
kw_weights = opencc.jieba_keyword_weight_textrank(text, top_k=3)
print(kw_weights)  # [('春眠', 1.23), ('啼鸟', 0.98), ('处处', 0.75)]

# Keyword weights (TF-IDF)
kw_weights_tfidf = opencc.jieba_keyword_weight_tfidf(text, top_k=3)
print(kw_weights_tfidf)  # [('春眠', 2.34), ('啼鸟', 1.56), ('处处', 1.12)]
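
The weight methods return `(keyword, weight)` tuples, which are easy to post-process with plain Python. The sketch below is not part of the library: it rescales the weights into shares of their total. `weight_shares` is a hypothetical helper, and the sample data is the illustrative output from the TextRank comment above, not a measured result.

```python
from __future__ import annotations


def weight_shares(pairs: list[tuple[str, float]]) -> dict[str, float]:
    """Rescale (keyword, weight) tuples so the values sum to 1.0."""
    total = sum(weight for _, weight in pairs)
    if total == 0:
        # Covers an empty list or all-zero weights without dividing by zero.
        return {keyword: 0.0 for keyword, _ in pairs}
    return {keyword: weight / total for keyword, weight in pairs}


# Illustrative values copied from the TextRank comment above.
sample = [("春眠", 1.23), ("啼鸟", 0.98), ("处处", 0.75)]
shares = weight_shares(sample)
for keyword, share in shares.items():
    print(f"{keyword}: {share:.1%}")
```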

CLI

You can also use the CLI interface:

convert

python -m opencc_jieba_pyo3 convert --help
usage: opencc_jieba_pyo3 convert [-h] [-i <file>] [-o <file>] [-c <conversion>] [-p] [--in-enc <encoding>] [--out-enc <encoding>]

options:
  -h, --help            show this help message and exit
  -i, --input <file>    Read original text from <file>.
  -o, --output <file>   Write converted text to <file>.
  -c, --config <conversion>
                        Conversion configuration: [s2t|s2tw|s2twp|s2hk|t2s|tw2s|tw2sp|hk2s|jp2t|t2jp]
  -p, --punct           Punctuation conversion
  --in-enc <encoding>   Encoding for input
  --out-enc <encoding>  Encoding for output

segment

python -m opencc_jieba_pyo3 segment --help
usage: opencc_jieba_pyo3 segment [-h] [-i <file>] [-o <file>] [-d <char>] [--in-enc <encoding>] [--out-enc <encoding>]

options:
  -h, --help            show this help message and exit
  -i, --input <file>    Read input text from <file>.
  -o, --output <file>   Write segmented text to <file>.
  -d, --delim <char>    Delimiter to join segments
  --in-enc <encoding>   Encoding for input
  --out-enc <encoding>  Encoding for output

Example invocations:

python -m opencc_jieba_pyo3 convert -i input.txt -o output.txt -c s2t --punct
python -m opencc_jieba_pyo3 segment -i input.txt -o output.txt --delim "/"
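
When the CLI is driven from another Python script, it can help to assemble the argument list programmatically. The sketch below builds the `convert` invocation from the flags documented in the help text above (`-i`, `-o`, `-c`, `--punct`). `build_convert_command` is a hypothetical helper, not something the package provides, and the file names are placeholders.

```python
from __future__ import annotations

import sys


def build_convert_command(input_path: str, output_path: str,
                          config: str = "s2t", punct: bool = False) -> list[str]:
    """Assemble the documented `convert` invocation as an argument list."""
    cmd = [sys.executable, "-m", "opencc_jieba_pyo3", "convert",
           "-i", input_path, "-o", output_path, "-c", config]
    if punct:
        cmd.append("--punct")
    return cmd


cmd = build_convert_command("input.txt", "output.txt", config="s2t", punct=True)
print(cmd[1:])

# To actually run it (requires the package to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string avoids quoting problems with file names that contain spaces.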

API

Class: `OpenCC`

Unified Python interface for OpenCC and Jieba functionalities.

Constructor

OpenCC(config: str = "s2t")
- config: Conversion configuration (see above). Defaults to "s2t".

Attributes

config: str
- Current OpenCC conversion configuration.

Methods

convert(input: str, punctuation: bool = False) -> str
- Convert Chinese text using the current OpenCC config.
- input: Input text.
- punctuation: Whether to convert Chinese/Japanese punctuation to the target variant.
- Returns: Converted text as a string.
zho_check(input: str) -> int
- Detect the type of Chinese in the input text.
- Returns: Integer code (1: Traditional, 2: Simplified, 0: Others).
jieba_cut(input: str, hmm: bool = True) -> list[str]
- Segment Chinese text using Jieba.
- input: Input text.
- hmm: Whether to use HMM for new words.
- Returns: List of segmented words.
jieba_cut_and_join(input: str, delimiter: str = "/") -> str
- Segment and join Chinese text using Jieba.
- input: Input text.
- delimiter: Delimiter for joining words.
- Returns: Joined segmented string.
jieba_keyword_extract_textrank(input: str, top_k: int) -> list[str]
- Extract keywords using the TextRank algorithm.
- input: Input text.
- top_k: Number of keywords to extract.
- Returns: List of keywords.
jieba_keyword_extract_tfidf(input: str, top_k: int) -> list[str]
- Extract keywords using the TF-IDF algorithm.
- input: Input text.
- top_k: Number of keywords to extract.
- Returns: List of keywords.
jieba_keyword_weight_textrank(input: str, top_k: int) -> list[tuple[str, float]]
- Extract keywords and their weights using TextRank.
- input: Input text.
- top_k: Number of keywords to extract.
- Returns: List of (keyword, weight) tuples.
jieba_keyword_weight_tfidf(input: str, top_k: int) -> list[tuple[str, float]]
- Extract keywords and their weights using TF-IDF.
- input: Input text.
- top_k: Number of keywords to extract.
- Returns: List of (keyword, weight) tuples.
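
The integer codes returned by `zho_check` can be turned into readable labels and used to decide whether a conversion is needed at all. The following is a sketch under stated assumptions: `describe_variant` and `normalize_to_simplified` are hypothetical helpers, and the `fake_check` and `fake_convert` stand-ins exist only so the example runs without the extension installed. With the real module, one might pass the `zho_check` and `convert` methods of an `OpenCC("t2s")` instance instead.

```python
from typing import Callable

# Integer codes as documented for zho_check above.
ZHO_LABELS = {1: "Traditional", 2: "Simplified", 0: "Others"}


def describe_variant(code: int) -> str:
    """Map a zho_check result to a readable label."""
    return ZHO_LABELS.get(code, f"Unknown ({code})")


def normalize_to_simplified(text: str,
                            check: Callable[[str], int],
                            convert: Callable[[str], str]) -> str:
    """Convert only when `check` reports Traditional (code 1)."""
    return convert(text) if check(text) == 1 else text


# Stand-ins for the real methods, covering a single character for illustration.
def fake_check(text: str) -> int:
    return 1 if "覺" in text else 2


def fake_convert(text: str) -> str:
    return text.replace("覺", "觉")


print(describe_variant(1))  # Traditional
print(normalize_to_simplified("不覺", fake_check, fake_convert))  # 不觉
```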

Development

Rust source: src/lib.rs
Python bindings: opencc_jieba_pyo3/__init__.py, opencc_jieba_pyo3/opencc_jieba_pyo3.pyi
CLI: opencc_jieba_pyo3/__main__.py

Rust Module Required

opencc-jieba-rs: A Rust implementation of Jieba + OpenCC

License

MIT

Release history

0.7.5 (May 8, 2026)
0.7.4 (Apr 10, 2026)
0.7.3 (Mar 17, 2026)
0.7.2 (Nov 7, 2025)
0.7.1 (Oct 7, 2025)
0.7.0 (Aug 22, 2025)
0.6.0 (Jul 12, 2025)
0.5.2 (Jun 19, 2025)
0.5.1 (Jun 12, 2025)
0.5.0 (Jun 12, 2025), this version

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

opencc_jieba_pyo3-0.5.0-cp38-abi3-win_amd64.whl (7.0 MB)

Uploaded Jun 12, 2025. CPython 3.8+, Windows x86-64

opencc_jieba_pyo3-0.5.0-cp38-abi3-manylinux_2_34_x86_64.whl (7.5 MB)

Uploaded Jun 12, 2025. CPython 3.8+, manylinux: glibc 2.34+ x86-64

opencc_jieba_pyo3-0.5.0-cp38-abi3-macosx_11_0_arm64.whl (7.1 MB)

Uploaded Jun 12, 2025. CPython 3.8+, macOS 11.0+ ARM64

File details

Details for the file opencc_jieba_pyo3-0.5.0-cp38-abi3-win_amd64.whl.

File metadata

Download URL: opencc_jieba_pyo3-0.5.0-cp38-abi3-win_amd64.whl
Upload date: Jun 12, 2025
Size: 7.0 MB
Tags: CPython 3.8+, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.8.7

File hashes

Hashes for opencc_jieba_pyo3-0.5.0-cp38-abi3-win_amd64.whl
Algorithm	Hash digest
SHA256	`b883614711428f08b0a740d35b7ad08dd1334ce255e885f933ed33348edd814d`
MD5	`5deb51ce9745d253cc8f4ba5f5195293`
BLAKE2b-256	`d1ef37bfda7eaf297e1b316d9045a135913aabfe33da1c3cfa66f81a6ad9ffe2`

File details

Details for the file opencc_jieba_pyo3-0.5.0-cp38-abi3-manylinux_2_34_x86_64.whl.

File metadata

Download URL: opencc_jieba_pyo3-0.5.0-cp38-abi3-manylinux_2_34_x86_64.whl
Upload date: Jun 12, 2025
Size: 7.5 MB
Tags: CPython 3.8+, manylinux: glibc 2.34+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.8.7

File hashes

Hashes for opencc_jieba_pyo3-0.5.0-cp38-abi3-manylinux_2_34_x86_64.whl
Algorithm	Hash digest
SHA256	`bfeb1f8ef3e04119c8192e8e9e5d8031395a5955a03a463a949670f568501068`
MD5	`121b11ae9e49c31fc685663e4c05de96`
BLAKE2b-256	`13530f58163dfca62ed2b2be2eedd0a3d03da4e3f3844e00899761db76918948`

File details

Details for the file opencc_jieba_pyo3-0.5.0-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

Download URL: opencc_jieba_pyo3-0.5.0-cp38-abi3-macosx_11_0_arm64.whl
Upload date: Jun 12, 2025
Size: 7.1 MB
Tags: CPython 3.8+, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.8.7

File hashes

Hashes for opencc_jieba_pyo3-0.5.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`c34195174e4c95d45505e3714bc7193bccf196d0eec915aedaa15a77dad67cee`
MD5	`498998adb2fb363706ed4cfd53ac5c73`
BLAKE2b-256	`9c5f222d9d89624366d98b6acffc5cb7c0cdf4d5e87e48b94973ea94273ecc29`
