中文文本可读性指标 Chinese Readability Score

Evaluate the readability of Chinese text using word segmentation, part-of-speech tagging, and syntactic dependency parsing. Supports multiple NLP providers, including LTP, Jieba, and PKU.

The code implements several published papers; each scoring metric is named after the first author of the corresponding paper.

Installation

Install with pip:

$ pip install readability_cn

Optional NLP provider extras:

# Install with Jieba support
$ pip install readability_cn[jieba]

# Install with PKU support
$ pip install readability_cn[pkuseg]

# Install with all optional providers
$ pip install readability_cn[all]

Usage

    from readability_cn import ChineseReadability
    from readability_cn.nlp import JiebaNLP, PkuNLP, LtpNLP

    # Use LTP as the default NLP provider
    readability = ChineseReadability()
    # Or use another NLP provider:
    # readability = ChineseReadability(nlp_provider=JiebaNLP())  # use Jieba
    # readability = ChineseReadability(nlp_provider=PkuNLP())    # use PKU
    # readability = ChineseReadability(nlp_provider=LtpNLP())    # explicitly use LTP

    # Add new custom words
    readability.add_custom_words(['日志易', '优特捷'])

    # Compare readability metrics before and after file changes
    readability.analyze('old.adoc', 'new.adoc')

    # Use your own preprocessing functions
    import markdown
    import re
    file_name = 'doc.md'  # path to your Markdown file
    with open(file_name, 'r', encoding='utf-8') as file:
        markdown_content = file.read()
    text = markdown.markdown(markdown_content)
    text = re.sub(r'\n+', '\n', text)
    ... # do other removals and replacements here
    sentences = [sentence.strip() for sentence in readability.stnsplit.split(text) if sentence.strip()]
    readability.chengyong_gf0025_readability(sentences)

Use Custom Vocab

You can use the sentencepiece tool to extract a vocabulary from domain-specific documents; see the custom_vocab.py implementation in the examples directory. Then merge it into the Grade-A (甲级) word list for use:

    # Load the top 16% of a custom vocabulary as domain-specific common words.
    # Defaults to the vocabulary built from Fudan University's
    # computer-science corpus.
    readability._load_custom_vocab()
    readability._load_custom_vocab("rizhiyi.vocab")
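As a rough illustration of the "top 16%" idea above, the following sketch reads a sentencepiece `.vocab` file (assumed to be tab-separated `piece<TAB>score` lines, sorted by score) and keeps the leading fraction of pieces as a domain word list. The function name `top_vocab_words` and the file-format handling are assumptions for illustration, not the library's actual `_load_custom_vocab()` implementation:

```python
def top_vocab_words(vocab_path, ratio=0.16):
    """Return the top `ratio` fraction of words from a sentencepiece .vocab file."""
    with open(vocab_path, encoding='utf-8') as f:
        # Each line is "piece<TAB>score"; keep only the piece.
        pieces = [line.split('\t')[0] for line in f if line.strip()]
    # Drop meta symbols like <unk>/<s> and strip the word-boundary marker U+2581.
    words = [p.lstrip('\u2581') for p in pieces if not p.startswith('<')]
    words = [w for w in words if w]
    return words[:int(len(words) * ratio)]
```

A list produced this way could then be merged into the base word list, which is what the `rizhiyi.vocab` example above does.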

Note

  1. Chinese research in this field is concentrated in teaching Chinese as a foreign language, and the research data consists mainly of a small number of textbook passages and Chinese proficiency test outlines. Coefficients obtained from polynomial linear regression fitting on such data may not be effective for native speakers or technical documents.

  2. Some formulas are sensitive to the number of clauses. This implementation simply splits clauses on Chinese commas, semicolons, and colons, and does not handle mixed use of Chinese and English punctuation.
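The clause splitting described above amounts to splitting on three full-width delimiters; a minimal sketch (the function name is illustrative, not the library's API):

```python
import re

# Split on the Chinese full-width comma, semicolon, and colon only;
# English punctuation (, ; :) is deliberately not handled, per note 2.
CLAUSE_DELIMS = re.compile(r'[，；：]')

def split_clauses(sentence):
    """Split a sentence into clauses, dropping empty fragments."""
    return [c for c in CLAUSE_DELIMS.split(sentence) if c]
```

For example, `split_clauses('你好，世界；再见')` yields three clauses, while the same text with ASCII punctuation would count as one.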

  3. This implementation currently only provides preprocessing for AsciiDoc text. For other formats, refer to the preprocess_asciidoc() method as a template for stripping the various markup.
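For another format, a preprocessor in the spirit of preprocess_asciidoc() just strips markup before sentence splitting. This is a minimal sketch for AsciiDoc itself, not the library's actual implementation; the regexes cover only a few common constructs:

```python
import re

def strip_asciidoc(text):
    """Crude AsciiDoc markup stripper: keep prose, drop structural markup."""
    # Remove delimited listing blocks (---- ... ----) wholesale.
    text = re.sub(r'^----$.*?^----$\n?', '', text, flags=re.MULTILINE | re.DOTALL)
    # Remove section-title markers ("= Title", "== Title", ...).
    text = re.sub(r'^=+ ', '', text, flags=re.MULTILINE)
    # Remove attribute-list / block-anchor lines like [source,python].
    text = re.sub(r'^\[[^\]]*\]$\n?', '', text, flags=re.MULTILINE)
    # Unwrap inline bold and monospace markup.
    text = re.sub(r'\*([^*\n]+)\*', r'\1', text)
    text = re.sub(r'`([^`\n]+)`', r'\1', text)
    # Collapse blank lines.
    return re.sub(r'\n{2,}', '\n', text).strip()
```

The output can then be fed to the sentence splitter, as in the preprocessing example in the Usage section.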

Thanks

  1. LTP
  2. Lexi
  3. Cursor IDE and Claude AI
