Calculate readability by using variable replacement model

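Overview

A readability assessment tool, based on the variable replacement model, that can be applied to both English and Japanese text.
It uses divide-char-type for character-type segmentation and count-syllable for syllable counting.
The return value exposes readability scores for the whole text, for each paragraph, and for each sentence.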

Indices of the variable replacement model

jFRE = 206.835-(1.015×ASL)-(84.6×ASW)
jFKG = (0.39×ASL)+(11.8×ASW)-15.59
jARI = (4.71×ACW)+(0.5×ASL)-21.43
jCLI = (5.88×ACW)-(29.6/ASL)-15.8
jSMOG = 1.031×√(30×PS)+3.1291

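*ASL = number of character-type-segmented words / number of sentences
*ASW = number of syllables (English) or of consecutive kanji / number of character-type-segmented words
*ACW = Shannon-information-based weight / number of character-type-segmented words
*PS = number of character-type-segmented words with three or more English syllables or three or more kanji / number of sentences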
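
As a minimal sketch, the five formulas above can be evaluated directly once ASL, ASW, ACW, and PS are known. The function name `readability_indices` and the lowercase dictionary keys are assumptions made for illustration. They are not taken from the package, which computes these averages internally from the segmented text.

```python
import math


def readability_indices(asl, asw, acw, ps):
    """Evaluate the five formulas for precomputed averages.

    asl, asw, acw, ps follow the definitions listed above;
    asl must be non-zero because jCLI divides by it.
    """
    return {
        "jfre": 206.835 - (1.015 * asl) - (84.6 * asw),
        "jfkg": (0.39 * asl) + (11.8 * asw) - 15.59,
        "jari": (4.71 * acw) + (0.5 * asl) - 21.43,
        "jcli": (5.88 * acw) - (29.6 / asl) - 15.8,
        "jsmog": 1.031 * math.sqrt(30 * ps) + 3.1291,
    }


if __name__ == "__main__":
    # Illustrative input values only, not measurements of any real text.
    for name, value in readability_indices(10, 1.5, 4, 0.3).items():
        print(name, round(value, 3))
```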

The Shannon-information-based weight assigns 1 to alphanumeric characters (61 kinds), log(1/88)/log(1/61) to hiragana (88 kinds), log(1/141)/log(1/61) to katakana (141 kinds), and log(1/20898)/log(1/61) to kanji (20898 kinds).
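
The weights described above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the character-set sizes are the ones given in the text, while the names `CHARSET_SIZES` and `shannon_weight` are hypothetical and not part of the package.

```python
import math

# Character-set sizes as stated in the text above.
CHARSET_SIZES = {
    "alnum": 61,
    "hiragana": 88,
    "katakana": 141,
    "kanji": 20898,
}


def shannon_weight(kind, base_kind="alnum"):
    """Return log(1/N_kind) / log(1/N_base), so the base kind has weight 1."""
    n = CHARSET_SIZES[kind]
    base = CHARSET_SIZES[base_kind]
    return math.log(1 / n) / math.log(1 / base)


if __name__ == "__main__":
    for kind in CHARSET_SIZES:
        print(kind, round(shannon_weight(kind), 4))
```

Because the ratio of logarithms grows with the size of the character set, kanji receive the largest weight and alphanumeric characters the smallest.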

Evaluation table

jFRE is evaluated against the Reading Ease Score.
jFKG, jARI, jCLI, and jSMOG are evaluated against the Estimated Reading Grade.

Reading Ease Score | Style Description | Estimated Reading Grade | Estimated Percent of U.S. Adults (1949)
0 to 30 | Very Difficult | College graduate | 4.5
30 to 50 | Difficult | 13th to 16th grade | 33
50 to 60 | Fairly Difficult | 10th to 12th grade | 54
60 to 70 | Standard | 8th to 9th grade | 83
70 to 80 | Fairly Easy | 7th grade | 88
80 to 90 | Easy | 6th grade | 91
90 to 100 | Very Easy | 5th grade | 93
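
A jFRE value can be mapped to a row of the table above with a simple lookup. The sketch below is illustrative and not part of the package. It makes two assumptions the table does not settle: a score on a shared boundary (for example exactly 30) is placed in the easier band, and scores below 0 or above 100 are assigned to the first and last bands respectively. The name `reading_ease_band` is hypothetical.

```python
# (exclusive upper bound, style description, estimated reading grade)
BANDS = [
    (30, "Very Difficult", "College graduate"),
    (50, "Difficult", "13th to 16th grade"),
    (60, "Fairly Difficult", "10th to 12th grade"),
    (70, "Standard", "8th to 9th grade"),
    (80, "Fairly Easy", "7th grade"),
    (90, "Easy", "6th grade"),
]


def reading_ease_band(score):
    """Return (style, grade) for a Reading Ease style score."""
    for upper, style, grade in BANDS:
        if score < upper:
            return style, grade
    # Assumption: anything at or above 90 counts as the easiest band.
    return "Very Easy", "5th grade"


if __name__ == "__main__":
    print(reading_ease_band(65.0))
```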

Setup

pip install calculate-readability

Uninstall

pip uninstall calculate-readability divide-char-type count-syllable nltk

Usage

from calculate_readability import calculate_readability

data = calculate_readability("今日の天気は晴れです。明日は曇りです。\n明後日は雨です。")

# Results for the whole text
print(data["raw_text"])
print(data["text"])
print(data["jfre"])

# Results for the first paragraph
print(data["break"][0]["text"])
print(data["break"][0]["jfre"])

# Results for the first sentence of the first paragraph
print(data["break"][0]["sentence"][0]["text"])
print(data["break"][0]["sentence"][0]["jfre"])
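
To inspect every paragraph and sentence rather than only the first, the nested result can be flattened into rows. This sketch assumes only the keys that appear in the usage example above ("text", "jfre", "break", "sentence"). The helper name `flatten_jfre` is hypothetical, and the function operates on any dictionary with that shape, so it does not import the package itself.

```python
def flatten_jfre(data):
    """Return (level, text, jfre) rows for the document, each paragraph, and each sentence."""
    rows = [("document", data["text"], data["jfre"])]
    for paragraph in data["break"]:
        rows.append(("paragraph", paragraph["text"], paragraph["jfre"]))
        for sentence in paragraph["sentence"]:
            rows.append(("sentence", sentence["text"], sentence["jfre"]))
    return rows
```

Passing the `data` value returned by calculate_readability yields one row for the whole text, followed by each paragraph and that paragraph's sentences in document order.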

Paper

A paper or conference presentation on this work is planned separately.

License

  • calculate-readability
    • Python Software Foundation License
    • Copyright (C) 2024 Shinya Akagi
  • divide-char-type
    • Python Software Foundation License
    • Copyright (C) 2023-2024 Shinya Akagi
  • count-syllable
    • Python Software Foundation License
    • Copyright (C) 2024 Shinya Akagi
  • nltk
    • Apache License 2.0
    • Copyright (C) 2001-2023 NLTK Project
  • cmudict
    • BSD License
    • Copyright (C) 1998 Carnegie Mellon University
