Calculate readability by using variable replacement model
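Overview
A readability assessment tool, applicable to both English and Japanese text, based on the variable replacement model.
Character-type segmentation uses divide-char-type, and syllable counting uses count-syllable.
The return value provides readability scores for the whole text, for each paragraph, and for each sentence.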
Indices of the variable replacement model
jFRE = 206.835-(1.015×ASL)-(84.6×ASW)
jFKG = (0.39×ASL)+(11.8×ASW)-15.59
jARI = (4.71×ACW)+(0.5×ASL)-21.43
jCLI = (5.88×ACW)-(29.6/ASL)-15.8
jSMOG = 1.031√(30×PS)+3.1291
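*ASL = number of character-type-segmented words / number of sentences
*ASW = number of syllables (for kanji, the number of consecutive kanji characters) / number of character-type-segmented words
*ACW = weight based on Shannon information / number of character-type-segmented words
*PS = number of character-type-segmented words with 3 or more English syllables or 3 or more kanji / number of sentences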
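As a minimal sketch, the five formulas above can be written as one Python function that takes ASL, ASW, ACW, and PS as already-computed numbers. The function name `variable_replacement_scores` and the dictionary keys other than `jfre` are assumptions made for illustration; this is not the package's own implementation.

```python
import math


def variable_replacement_scores(asl, asw, acw, ps):
    """Evaluate the five indices from precomputed ASL, ASW, ACW, and PS.

    Hypothetical helper: it only transcribes the formulas listed in this
    README and does not segment text or count syllables.
    """
    if asl == 0:
        # jCLI divides by ASL, so an empty text cannot be scored.
        raise ValueError("ASL must be non-zero")
    return {
        "jfre": 206.835 - (1.015 * asl) - (84.6 * asw),
        "jfkg": (0.39 * asl) + (11.8 * asw) - 15.59,
        "jari": (4.71 * acw) + (0.5 * asl) - 21.43,
        "jcli": (5.88 * acw) - (29.6 / asl) - 15.8,
        "jsmog": 1.031 * math.sqrt(30 * ps) + 3.1291,
    }


# Illustrative inputs only; they are not measurements from a real text.
scores = variable_replacement_scores(asl=10, asw=1.5, acw=1.0, ps=0.3)
print(scores)
```

Keeping the formulas separate from segmentation makes it easy to check each coefficient against the list above.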
The weights based on Shannon information take alphanumeric characters (61 kinds) as 1, and weight hiragana (88 kinds) by log(1/88)/log(1/61), katakana (141 kinds) by log(1/141)/log(1/61), and kanji (20,898 kinds) by log(1/20898)/log(1/61).
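The weighting above can be reproduced with a few lines of Python. This is a sketch under the character counts stated in this README; the names `CHAR_KINDS` and `shannon_weight` are hypothetical and are not part of the package.

```python
import math

# Number of distinct characters per character type, as stated above.
CHAR_KINDS = {
    "alphanumeric": 61,
    "hiragana": 88,
    "katakana": 141,
    "kanji": 20898,
}


def shannon_weight(kinds, base_kinds=61):
    """Weight of a character type relative to alphanumerics (weight 1)."""
    return math.log(1 / kinds) / math.log(1 / base_kinds)


weights = {name: shannon_weight(n) for name, n in CHAR_KINDS.items()}
for name, weight in weights.items():
    print(f"{name}: {weight:.2f}")
```

With these counts the weights come out to roughly 1.09 for hiragana, 1.20 for katakana, and 2.42 for kanji, so a kanji contributes a little under two and a half times as much as an alphanumeric character.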
Evaluation table
jFRE is evaluated against the Reading Ease Score column.
jFKG, jARI, jCLI, and jSMOG are evaluated against the Estimated Reading Grade column.
Reading Ease Score | Style Description | Estimated Reading Grade | Estimated Percent of U.S. Adults (1949) |
---|---|---|---|
0 to 30: | Very Difficult | College graduate | 4.5 |
30 to 50: | Difficult | 13th to 16th grade | 33 |
50 to 60: | Fairly Difficult | 10th to 12th grade | 54 |
60 to 70: | Standard | 8th to 9th grade | 83 |
70 to 80: | Fairly Easy | 7th grade | 88 |
80 to 90: | Easy | 6th grade | 91 |
90 to 100: | Very Easy | 5th grade | 93 |
- William H. DuBay: The Principles of Readability, 2004
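A jFRE value can be mapped to a row of the table with a small lookup. The sketch below assumes that each band includes its lower bound (the table leaves the boundaries ambiguous), that scores above 100 count as "Very Easy", and that scores below 0 count as "Very Difficult". The names `READING_EASE_BANDS` and `describe_jfre` are hypothetical.

```python
# (lower bound, style description, estimated reading grade), highest first.
READING_EASE_BANDS = [
    (90, "Very Easy", "5th grade"),
    (80, "Easy", "6th grade"),
    (70, "Fairly Easy", "7th grade"),
    (60, "Standard", "8th to 9th grade"),
    (50, "Fairly Difficult", "10th to 12th grade"),
    (30, "Difficult", "13th to 16th grade"),
    (0, "Very Difficult", "College graduate"),
]


def describe_jfre(score):
    """Return (style description, reading grade) for a jFRE score."""
    for lower, style, grade in READING_EASE_BANDS:
        if score >= lower:
            return style, grade
    # Assumption: scores below 0 fall into the hardest band.
    _, style, grade = READING_EASE_BANDS[-1]
    return style, grade


print(describe_jfre(65.0))
```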
Setup
pip install calculate-readability
Uninstall
pip uninstall calculate-readability divide-char-type count-syllable nltk
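Usage
from calculate_readability import calculate_readability
# The input holds two paragraphs separated by "\n"; the first has two sentences.
data = calculate_readability("今日の天気は晴れです。明日は曇りです。\n明後日は雨です。")
# Text and jFRE for the whole input
print(data["raw_text"])
print(data["text"])
print(data["jfre"])
# Text and jFRE for the first paragraph
print(data["break"][0]["text"])
print(data["break"][0]["jfre"])
# Text and jFRE for the first sentence of the first paragraph
print(data["break"][0]["sentence"][0]["text"])
print(data["break"][0]["sentence"][0]["jfre"])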
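To visit every sentence instead of indexing one at a time, the nested result can be walked with a small generator. The keys `break`, `sentence`, `text`, and `jfre` are taken from the usage example above; the helper `iter_sentence_scores` and the stand-in dictionary `sample`, including its placeholder scores, are hypothetical and exist only so the sketch runs without the package installed.

```python
def iter_sentence_scores(data, key="jfre"):
    """Yield (paragraph index, sentence index, text, score) per sentence."""
    for p_idx, paragraph in enumerate(data["break"]):
        for s_idx, sentence in enumerate(paragraph["sentence"]):
            yield p_idx, s_idx, sentence["text"], sentence[key]


# Hypothetical stand-in for a calculate_readability() result.
# The scores are placeholders, not real output of the package.
sample = {
    "break": [
        {"sentence": [
            {"text": "今日の天気は晴れです。", "jfre": 0.0},
            {"text": "明日は曇りです。", "jfre": 0.0},
        ]},
        {"sentence": [
            {"text": "明後日は雨です。", "jfre": 0.0},
        ]},
    ],
}

for p_idx, s_idx, text, score in iter_sentence_scores(sample):
    print(p_idx, s_idx, text, score)
```

In real use, `sample` would be replaced by the dictionary returned from `calculate_readability(...)`.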
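Paper
- Shinya Akagi et al.: 変数置き換えモデルを用いた医療関連文書の可読性分析 (Readability analysis of medical documents using the variable replacement model), バイオメディカル・ファジィ・システム学会誌 (journal of the Biomedical Fuzzy Systems Association) 19 (1), 19-27, 2017
- https://cir.nii.ac.jp/crid/1391975276374773248
A further paper or conference presentation is planned.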
License
- calculate-readability
  - Python Software Foundation License
  - Copyright (C) 2024 Shinya Akagi
- divide-char-type
  - Python Software Foundation License
  - Copyright (C) 2023-2024 Shinya Akagi
- count-syllable
  - Python Software Foundation License
  - Copyright (C) 2024 Shinya Akagi
- nltk
  - Apache License 2.0
  - Copyright (C) 2001-2023 NLTK Project
- cmudict
  - BSD License
  - Copyright (C) 1998 Carnegie Mellon University