Measure text comprehensibility/readability using the Mistrik formula
📚 Mistrík's measure of readability 📚
What is it?
Mistrík is a pure Python library that scores the readability of Slovak text using Mistrík's measure of readability and comprehension. Mistrík's readability formula relies on the word repetition index (I), alongside average word length and average sentence length. This implies that the more words a text repeats, the easier it is to read. The metric can be used to compute the readability index (R) of Slovak texts such as textbooks, research papers, and many more. The original research by Jozef Mistrík can be found here (pp. 171-178). 📑
Why did we make this? 🤔
Readability measures are used in Slovakia, but they are not as widespread as they are abroad. Our goal was to support the use of readability measures, especially Mistrík's, by creating an open-source Python library, since there is still no freely usable public library or tool that focuses on Slovak texts. 🙃 At the same time, we wanted to make this metric more accessible, because better reading comprehension supports lifelong learning by enabling individuals to absorb information effectively in a variety of areas. 📈
Description of measure 🖊️
- S = average word length in syllables,
- V = average sentence length in words,
- N = number of words,
- L = number of unique words,
- I = word repetition index (I = N / L),
- R = readability score (R = 50 - ((S * V) / I))
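The definitions above translate directly into a few lines of arithmetic. The following is a minimal sketch, not the library's own code: the function name `mistrik_score` and its parameters are hypothetical, and it assumes the sentence, syllable, word, and unique-word counts have already been obtained by some other means.

```python
def mistrik_score(sentences: int, syllables: int, words: int, unique_words: int) -> float:
    """Return R = 50 - ((S * V) / I) from raw counts (hypothetical helper)."""
    if sentences <= 0 or words <= 0 or unique_words <= 0:
        raise ValueError("sentences, words and unique_words must be positive")
    s = syllables / words      # S: average word length in syllables
    v = words / sentences      # V: average sentence length in words
    i = words / unique_words   # I: word repetition index, N / L
    return 50 - (s * v) / i


# Made-up counts chosen so the arithmetic is easy to follow:
# S = 2.0, V = 10.0, I = 2.0, so R = 50 - (2.0 * 10.0) / 2.0
print(mistrik_score(sentences=10, syllables=200, words=100, unique_words=50))  # 40.0
```

Because I sits in the denominator, a text that repeats more of its words (larger I) ends up with a higher, that is easier, score.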
Score | Difficulty |
---|---|
50-40 | Very Easy |
40-30 | Standard |
30-20 | Fairly Difficult |
20-10 | Difficult |
10-0 | Very Confusing |
In practice, a score between 40 and 50 is typical of fairy tales. In contrast, a text that scores 20 or lower is suitable for experts in the relevant field or for university students.
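The score table can be turned into a small lookup. This is an illustrative sketch only: `difficulty_label` is a hypothetical helper, not part of the library. Since the bands in the table share their boundary values, the sketch assumes that a boundary score belongs to the easier band (for example, 40 counts as "Very Easy") and that any score below 10, including negative ones, counts as "Very Confusing".

```python
def difficulty_label(score: float) -> str:
    """Map a readability score R to a difficulty band (hypothetical helper)."""
    # (lower bound, label) pairs, checked from the easiest band downwards.
    # Assumption: the lower bound of each band is inclusive.
    bands = [
        (40, "Very Easy"),
        (30, "Standard"),
        (20, "Fairly Difficult"),
        (10, "Difficult"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "Very Confusing"


print(difficulty_label(39))  # Standard
```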
💿 Getting started - installation: 💿
pip install mistrik
📦 Import module: 📦
from mistrik import Mistrik
👩🏻‍💻 Examples of use: 🧑🏻‍💻
text = """
Danka a Janka sú sestričky dvojčence a sú navlas
rovnaké. Danka má oči celkom ako Janka, hnedé a veselé
ani gaštančeky. A Janka má vlasy celkom ako Danka,
plavé a ostrihané na ofinu. Ešte aj nosy majú rovnaké:
trošku vyhrnuté a veľmi všetečné.
Danka a Janka sa rovnako aj obliekajú. Danka má
vždy taký istý kabát ako Janka a Janka také isté šaty ako
Danka. Aj čiapky a topánky majú vždy celkom rovnaké.
"""
M = Mistrik(text)
R = M.readability()
print(R)
Output:
MISTRIK MEASURE OF READABILITY:
SENTENCES: 7
SYLLABLES: 143
V: 10 (10.429)
S: 2.0 (1.959)
N: 73
L: 41
I: 1.78
R: 39 (38.523)
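The numbers in this output can be cross-checked by hand. The sketch below is not how the library works internally. It assumes a deliberately naive tokenization (lowercased `\w+` runs as words, every `.`, `!` or `?` as a sentence end) and takes the syllable count of 143 straight from the output above, because counting Slovak syllables is beyond a short example.

```python
import re

text = """
Danka a Janka sú sestričky dvojčence a sú navlas
rovnaké. Danka má oči celkom ako Janka, hnedé a veselé
ani gaštančeky. A Janka má vlasy celkom ako Danka,
plavé a ostrihané na ofinu. Ešte aj nosy majú rovnaké:
trošku vyhrnuté a veľmi všetečné.
Danka a Janka sa rovnako aj obliekajú. Danka má
vždy taký istý kabát ako Janka a Janka také isté šaty ako
Danka. Aj čiapky a topánky majú vždy celkom rovnaké.
"""

# Naive counting (assumption): terminal punctuation marks a sentence end,
# and words are compared case-insensitively.
sentences = len(re.findall(r"[.!?]", text))
words = re.findall(r"\w+", text.lower())
n = len(words)            # N: number of words
unique = len(set(words))  # L: number of unique words
syllables = 143           # copied from the output above, not computed here

s = syllables / n         # S: average word length in syllables
v = n / sentences         # V: average sentence length in words
i = n / unique            # I: word repetition index
r = 50 - (s * v) / i      # R: readability score

print(sentences, n, unique)                   # 7 73 41
print(round(s, 3), round(v, 3), round(i, 2))  # 1.959 10.429 1.78
print(round(r, 3))                            # 38.526

# Using I rounded to two decimals gives the value printed in the output above.
r_with_rounded_i = 50 - (s * v) / round(i, 2)
print(round(r_with_rounded_i, 3))             # 38.523
```

The unrounded calculation gives about 38.526, while the output above shows 38.523. The small gap is consistent with I being rounded to 1.78 before the final step, though that is only a guess based on the numbers.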
You can also access all variables like this:
M = Mistrik(text)
R = M.readability()
print("Sentences:", R.SEN, " Syllables:", R.SYL)
print("The readability of the text is:", R.R)
Output:
Sentences: 7 Syllables: 143
The readability of the text is: 39