Calculate readability scores for Japanese texts.
jReadability allows Python developers to calculate the readability of Japanese text using the model developed by Jae-ho Lee and Yoichiro Hasebe in "Introducing a readability evaluation system for Japanese language education" and "Readability measurement of Japanese texts based on levelled corpora". Note that this is not an official implementation.
Demo
You can play with an interactive demo here.
Installation
pip install jreadability
Quickstart
from jreadability import compute_readability
# "Good morning! The weather is nice today."
text = 'おはようございます!今日は天気がいいですね。'
score = compute_readability(text)
print(score) # 6.438000000000001
Readability scores
Level | Readability score range |
---|---|
Upper-advanced | [0.5, 1.5) |
Lower-advanced | [1.5, 2.5) |
Upper-intermediate | [2.5, 3.5) |
Lower-intermediate | [3.5, 4.5) |
Upper-elementary | [4.5, 5.5) |
Lower-elementary | [5.5, 6.5) |
Note that this readability calculator is specifically for non-native speakers learning to read Japanese. This is not to be confused with something like grade level or other readability scores meant for native speakers.
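The table above can be turned into a small lookup. The following is a minimal sketch, not part of the jreadability package: the names `LEVEL_BANDS` and `level_for_score` are hypothetical, and returning `None` for scores outside the listed ranges is an assumption, since the table does not say how such scores should be labelled.

```python
from typing import Optional

# Bands copied from the table above:
# (lower bound inclusive, upper bound exclusive, level name).
LEVEL_BANDS = [
    (0.5, 1.5, "Upper-advanced"),
    (1.5, 2.5, "Lower-advanced"),
    (2.5, 3.5, "Upper-intermediate"),
    (3.5, 4.5, "Lower-intermediate"),
    (4.5, 5.5, "Upper-elementary"),
    (5.5, 6.5, "Lower-elementary"),
]


def level_for_score(score: float) -> Optional[str]:
    """Return the level name for a readability score, or None if out of range."""
    for low, high, level in LEVEL_BANDS:
        if low <= score < high:
            return level
    # Assumption: scores outside [0.5, 6.5) have no label in the table.
    return None


# The Quickstart score falls in the [5.5, 6.5) band.
print(level_for_score(6.438000000000001))  # Lower-elementary
```

Higher scores correspond to easier text, so the bands run from advanced at the low end to elementary at the high end.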
Model
readability = {mean number of words per sentence} * -0.056
+ {percentage of kango} * -0.126
+ {percentage of wago} * -0.042
+ {percentage of verbs} * -0.145
+ {percentage of particles} * -0.044
+ 11.724
* "kango" (漢語) means a Japanese word of Chinese origin, while "wago" (和語) means a native Japanese word.
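To make the arithmetic concrete, here is a minimal sketch of the formula as a plain Python function. It is not the package's internal code: the function name and parameters are hypothetical, the feature values in the usage example are made up for illustration, and it assumes the percentages are expressed on a 0 to 100 scale.

```python
def readability_from_features(
    words_per_sentence: float,
    pct_kango: float,
    pct_wago: float,
    pct_verbs: float,
    pct_particles: float,
) -> float:
    """Apply the linear model above to precomputed text features.

    Assumption: each percentage is on a 0-100 scale (e.g. 20.0 for 20%).
    """
    return (
        words_per_sentence * -0.056
        + pct_kango * -0.126
        + pct_wago * -0.042
        + pct_verbs * -0.145
        + pct_particles * -0.044
        + 11.724
    )


# Illustrative (made-up) feature values, not taken from a real text.
score = readability_from_features(
    words_per_sentence=10.0,
    pct_kango=20.0,
    pct_wago=50.0,
    pct_verbs=15.0,
    pct_particles=20.0,
)
print(round(score, 3))  # 3.489
```

Every coefficient is negative, so longer sentences and a higher share of any of the counted word classes lower the score, which corresponds to harder text.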
Note on model consistency
The readability scores produced by this Python package tend to differ slightly from those produced on the official jReadability website. This is likely due to a difference in UniDic versions: this package uses UniDic 2.1.2, while the official site uses UniDic 2.2.0. This issue may be resolved in the future.
Batch processing
jreadability uses fugashi's tagger under the hood and initializes a new tagger every time compute_readability is invoked. If you are processing a large number of texts, it is recommended to initialize the tagger once yourself and pass it as an argument to each subsequent compute_readability call.
from fugashi import Tagger
from jreadability import compute_readability
texts = [...]
tagger = Tagger()  # initialize the tagger once and reuse it
for text in texts:
    score = compute_readability(text, tagger) # fast :D
    #score = compute_readability(text) # slow :'(
    ...
Other implementations
The official jReadability implementation can be found on jreadability.net.
A Node.js implementation can also be found here.