Calculate readability scores for Japanese texts.


Text readability calculator for Japanese learners 🇯🇵


jReadability allows Python developers to calculate the readability of Japanese text using the model developed by Jae-ho Lee and Yoichiro Hasebe in "Introducing a readability evaluation system for Japanese language education" and "Readability measurement of Japanese texts based on levelled corpora". Note that this is not an official implementation.

Demo

You can play with an interactive demo here.

Installation

pip install jreadability

Quickstart

from jreadability import compute_readability

# "Good morning! The weather is nice today."
text = 'おはようございます!今日は天気がいいですね。' 

score = compute_readability(text)

print(score) # 6.438000000000001

Readability scores

Level                 Readability score range
Upper-advanced        [0.5, 1.5)
Lower-advanced        [1.5, 2.5)
Upper-intermediate    [2.5, 3.5)
Lower-intermediate    [3.5, 4.5)
Upper-elementary      [4.5, 5.5)
Lower-elementary      [5.5, 6.5)

Note that this readability calculator is specifically for non-native speakers learning to read Japanese. This is not to be confused with something like grade level or other readability scores meant for native speakers.
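
The score ranges in the table above can be turned into a small lookup. The following is a minimal sketch, not part of the jreadability package: the names LEVELS and level_for_score are hypothetical, and returning None for scores outside the listed ranges is an assumption, since the table does not say how such scores should be labelled.

```python
from typing import Optional

# Boundaries copied from the table above:
# (lower bound inclusive, upper bound exclusive, level label).
LEVELS = [
    (0.5, 1.5, "Upper-advanced"),
    (1.5, 2.5, "Lower-advanced"),
    (2.5, 3.5, "Upper-intermediate"),
    (3.5, 4.5, "Lower-intermediate"),
    (4.5, 5.5, "Upper-elementary"),
    (5.5, 6.5, "Lower-elementary"),
]


def level_for_score(score: float) -> Optional[str]:
    """Return the level label for a score, or None if it is outside the table.

    Hypothetical helper for illustration; it is not provided by jreadability.
    """
    for low, high, label in LEVELS:
        if low <= score < high:
            return label
    return None


# The Quickstart score falls in the [5.5, 6.5) range.
print(level_for_score(6.438000000000001))  # Lower-elementary
```

Lower scores correspond to harder text, so a score just under 1.5 is still labelled Upper-advanced, while 1.5 itself is Lower-advanced.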

Model

readability = {mean number of words per sentence} * -0.056
            + {percentage of kango} * -0.126
            + {percentage of wago} * -0.042
            + {percentage of verbs} * -0.145
            + {percentage of particles} * -0.044
            + 11.724

* "Kango" (漢語) are Japanese words of Chinese origin, while "wago" (和語) are native Japanese words.
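
To make the arithmetic concrete, the formula above can be written as a plain function. This is only a sketch: the function name readability_from_features is hypothetical, the input values in the example are made up, and it assumes the percentages are on a 0 to 100 scale, which the formula above does not state explicitly. It does not reproduce how the package extracts these features from text.

```python
def readability_from_features(
    mean_words_per_sentence: float,
    pct_kango: float,
    pct_wago: float,
    pct_verbs: float,
    pct_particles: float,
) -> float:
    """Apply the linear model above to already-computed features.

    Hypothetical helper; percentages are assumed to be on a 0-100 scale.
    """
    return (
        mean_words_per_sentence * -0.056
        + pct_kango * -0.126
        + pct_wago * -0.042
        + pct_verbs * -0.145
        + pct_particles * -0.044
        + 11.724
    )


# Made-up feature values, for illustration only.
score = readability_from_features(
    mean_words_per_sentence=10,
    pct_kango=10,
    pct_wago=50,
    pct_verbs=10,
    pct_particles=20,
)
print(round(score, 3))  # 5.474
```

Every coefficient is negative, so longer sentences and higher proportions of any of the four word categories all pull the score down from the 11.724 intercept, toward the advanced end of the scale.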

Note on model consistency

The readability scores produced by this Python package tend to differ slightly from the scores produced on the official jReadability website. This is likely due to a UniDic version difference between the two implementations: this package uses UniDic 2.1.2, while the official site uses UniDic 2.2.0. This issue may be resolved in the future.

Batch processing

jreadability uses fugashi's tagger under the hood and initializes a new tagger every time compute_readability is invoked. If you are processing a large number of texts, initialize the tagger yourself first, then pass it as an argument to each subsequent compute_readability call.

from fugashi import Tagger
from jreadability import compute_readability

texts = [...]

# Create the tagger once and reuse it for every text.
tagger = Tagger()

for text in texts:
    score = compute_readability(text, tagger) # fast :D
    #score = compute_readability(text) # slow :'(
    ...

Documentation

You can find this repo's (very minimal) documentation here.

Other implementations

The official jReadability implementation can be found at jreadability.net.

A node.js implementation can also be found here.
