A lightweight Python tool for Persian/Farsi readability analysis using the Flesch-Dayani formula.
Persian Readability (Flesch–Dayani)
A lightweight Python package and command-line tool to calculate the Flesch–Dayani readability score for Persian (Farsi) text — with an optional POS-enhanced syllable counter for higher accuracy.
Features
- Persian text normalization and tokenization via `hazm`
- Punctuation-aware tokenization: punctuation marks are excluded from word and syllable counts
- Two-tier syllable counting:
  - POS-enhanced (Better Accuracy): if `parsivar` is installed, uses part-of-speech tags to correctly count syllables in verbs with attached prefixes (میرود، نمیدانم) and comparative adjectives (بهتر، بزرگترین)
  - Morphological heuristic (Good Accuracy): used automatically if `parsivar` is not installed
- Context-aware خواه classifier: three-layer disambiguation prevents confusing خواهش, خواهر, آزادیخواه, and خواه ... خواه ... with the future auxiliary (خواهم رفت)
- Computes:
  - Number of sentences, words, letters, and syllables
  - ASL: Average Sentence Length (words per sentence)
  - WL: Average Word Length (letters per word)
  - ASYL: Average Syllables per Word (used in the original Dayani formula)
  - Flesch–Dayani readability score
  - Human-readable level (e.g. متوسط — مناسب دانشآموزان دبیرستان)
- Accepts input from a file, a command-line argument, or stdin (pipe-friendly)
- `--plain` flag for scripting and pipeline use
- `--verbose` flag for debug logging
- Warns when text is too short for a reliable score (< 50 words)
Readability Levels
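| Score | Level (as printed) | English |
|---|---|---|
| ≥ 90 | بسیار آسان — مناسب کودکان دبستانی | Very easy, suitable for primary school children |
| ≥ 80 | آسان — مناسب نوجوانان | Easy, suitable for teenagers |
| ≥ 70 | نسبتاً آسان — مناسب عموم مردم | Fairly easy, suitable for the general public |
| ≥ 60 | متوسط — مناسب دانشآموزان دبیرستان | Medium, suitable for high school students |
| ≥ 50 | نسبتاً دشوار — مناسب دانشجویان | Fairly difficult, suitable for university students |
| ≥ 30 | دشوار — مناسب متخصصان | Difficult, suitable for specialists |
| < 30 | بسیار دشوار — متون علمی/تخصصی | Very difficult, scientific/specialized texts |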
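The thresholds above amount to a simple top-down lookup. The sketch below is illustrative only: `readability_level` and `LEVELS` are hypothetical names rather than part of the package API, and the labels are short English glosses of the Persian level names the tool prints.

```python
# Thresholds taken from the table above, checked from highest to lowest.
# Labels are English glosses; the tool itself prints Persian level names.
LEVELS = [
    (90, "very easy"),
    (80, "easy"),
    (70, "fairly easy"),
    (60, "medium"),
    (50, "fairly difficult"),
    (30, "difficult"),
]


def readability_level(score: float) -> str:
    """Return the label of the first threshold the score reaches."""
    for threshold, label in LEVELS:
        if score >= threshold:
            return label
    return "very difficult"  # anything below 30


print(readability_level(72.5))
```

Because each threshold is inclusive, a score of exactly 60 falls in the "medium" band rather than the one below it.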
Installation
Install from PyPI:
pip install persian-readability
For local development:
pip install -e ".[dev]"
For optional POS-enhanced syllable counting:
pip install "persian-readability[pos]"
Requirements
Required
- Python 3.10 or newer
- `hazm`: Persian NLP library
pip install hazm
Optional (for higher syllable accuracy)
- `parsivar`: Persian preprocessing toolkit with POS tagger
pip install parsivar
If `parsivar` is not installed, the script falls back to the morphological heuristic automatically; no configuration is needed.
Usage
Direct text:
persian-readability -t "متن فارسی شما"
From a file:
persian-readability -f sample.txt
From stdin (pipe):
echo "متن فارسی شما" | persian-readability
cat article.txt | persian-readability
Raw score only (for scripting):
persian-readability -f sample.txt --plain
With debug logging:
persian-readability -f sample.txt --verbose
Python API Usage
from persian_readability import calculate_readability
result = calculate_readability("برای پیشگیری از پوسیدگی دندان، روزی دو بار مسواک بزنید.")
print(result)
Real-World Examples
Example 1 — Public health text
Input:
persian-readability -t "برای پیشگیری از پوسیدگی دندان، بهتر است روزی دو بار مسواک بزنید و مصرف مواد قندی را کاهش دهید."
Possible use case:
This can help public health educators check whether patient-facing Persian health messages are simple enough for the general public.
Example 2 — Academic text
Input:
persian-readability -t "شاخصهای زیستی بزاقی میتوانند در تشخیص زودهنگام برخی بیماریهای دهان و فک و صورت نقش مهمی داشته باشند."
Possible use case:
Researchers can compare the readability of Persian academic summaries, abstracts, or educational materials.
Example 3 — Pipeline use
Input:
cat article.txt | persian-readability --plain
Possible use case:
Developers can integrate the readability score into larger Persian NLP or content-quality workflows.
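One way such an integration could look from Python is sketched below. It calls the CLI through `subprocess` and assumes that `--plain` prints a single number on stdout. The helper names `score_file` and `parse_plain_score` are hypothetical, and `score_file` only works where the `persian-readability` command is installed.

```python
import subprocess


def parse_plain_score(stdout: str) -> float:
    """Parse --plain output, assumed here to be one number plus a newline."""
    return float(stdout.strip())


def score_file(path: str) -> float:
    """Run the CLI on a file and return its score as a float."""
    completed = subprocess.run(
        ["persian-readability", "-f", path, "--plain"],
        capture_output=True,  # warnings go to stderr, the score to stdout
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return parse_plain_score(completed.stdout)
```

Keeping the parsing step separate makes it easy to adapt if the plain output turns out to carry more than the bare score.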
Sample Output
══════════════════════════════════════════════════════
Persian Readability — Flesch–Dayani
══════════════════════════════════════════════════════
جملات : 5
کلمات : 87
حروف : 412
هجاها : 201
روش : POS-enhanced — Parsivar
────────────────────────────────────────────────────
ASL (کلمه/جمله) : 17.40
WL (حرف/کلمه) : 4.74
ASYL (هجا/کلمه) : 2.31
────────────────────────────────────────────────────
امتیاز Flesch–Dayani : 58.34
سطح خوانایی : نسبتاً دشوار — مناسب دانشجویان
══════════════════════════════════════════════════════
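The three averages in the sample follow directly from the raw counts printed above them. A minimal sketch of that arithmetic, using a hypothetical helper name that is not part of the package API:

```python
def derived_metrics(sentences: int, words: int, letters: int, syllables: int) -> dict:
    """Compute ASL, WL, and ASYL from raw counts (illustrative helper)."""
    if sentences <= 0 or words <= 0:
        raise ValueError("need at least one sentence and one word")
    return {
        "ASL": words / sentences,   # words per sentence
        "WL": letters / words,      # letters per word
        "ASYL": syllables / words,  # syllables per word
    }


metrics = derived_metrics(sentences=5, words=87, letters=412, syllables=201)
print({name: round(value, 2) for name, value in metrics.items()})
# {'ASL': 17.4, 'WL': 4.74, 'ASYL': 2.31}
```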
Formula
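FDR = 262.835 − 0.846 × (100 × ASYL) − 1.015 × ASL
Where ASYL = average syllables per word and ASL = average words per sentence. The 0.846 coefficient applies to syllables per 100 words, as in Flesch's original formula, hence the factor of 100. Higher scores indicate easier text.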
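As a worked example, the sketch below evaluates the formula for a text averaging 2.0 syllables per word and 15 words per sentence. It assumes the 0.846 coefficient is applied to syllables per 100 words, as in Flesch's original formulation, so the per-word average is multiplied by 100. The function name is hypothetical and the package's internal implementation may differ.

```python
def flesch_dayani(asyl: float, asl: float) -> float:
    """Flesch-Dayani score from syllables per word and words per sentence."""
    # Assumption: 0.846 weights syllables per 100 words, hence 100 * asyl.
    return 262.835 - 0.846 * (100.0 * asyl) - 1.015 * asl


score = flesch_dayani(asyl=2.0, asl=15.0)
print(f"{score:.2f}")  # 78.41
```

Both terms are subtracted, so longer words and longer sentences each lower the score.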
How Syllable Accuracy Tiers Work
| Mode | Accuracy | How |
|---|---|---|
| POS-enhanced | ~85% | Parsivar POSTagger (wapiti CRF, Bijankhan corpus) detects verb/adjective tags; prefix/suffix rules applied per POS |
| Morphological heuristic | ~75% | Counts written long vowels (ا و ی), diacritics, and word-final ه; no POS context |
Main cases where POS tagging improves accuracy:
- Verbs with attached `می`/`نمی` prefix (no half-space): `میرود` → +1 syllable
- Comparative/superlative adjectives: `بهترین` → suffix `ترین` = 2 syllables
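To make the difference between the two tiers concrete, here is a deliberately simplified sketch. It counts written long vowels, short-vowel diacritics, and a word-final ه. It then adds one syllable when the caller marks the token as a verb with an attached می/نمی prefix. This approximates the idea only and is not the package's implementation. All names are hypothetical, and the `is_verb` flag stands in for a real POS tag.

```python
LONG_VOWELS = {"\u0627", "\u0622", "\u0648", "\u06cc", "\u064a"}  # ا آ و ی (Persian and Arabic yeh)
SHORT_VOWEL_MARKS = {"\u064e", "\u064f", "\u0650"}  # fatha, damma, kasra
FINAL_HEH = "\u0647"  # ه
VERB_PREFIXES = ("\u0646\u0645\u06cc", "\u0645\u06cc")  # نمی and می


def heuristic_syllables(word: str) -> int:
    """Count written vowel signs; every word gets at least one syllable."""
    count = sum(1 for ch in word if ch in LONG_VOWELS or ch in SHORT_VOWEL_MARKS)
    if word.endswith(FINAL_HEH):
        count += 1
    return max(count, 1)


def pos_adjusted_syllables(word: str, is_verb: bool) -> int:
    """Add one syllable for a verb whose prefix is attached to the stem."""
    base = heuristic_syllables(word)
    if is_verb and len(word) > 3 and word.startswith(VERB_PREFIXES):
        return base + 1  # the stem's unwritten short vowel
    return base


word = "\u0645\u06cc\u0631\u0648\u062f"  # میرود
print(heuristic_syllables(word), pos_adjusted_syllables(word, is_verb=True))
```

For میرود the heuristic sees only ی and و, while the adjusted count adds the unwritten vowel of the stem.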
خواه Classifier
The word خواه has multiple roles in Persian. A three-layer classifier resolves ambiguity before syllable counting:
| Label | Examples | Treatment |
|---|---|---|
| FUTURE_AUX | خواهم رفت، نخواهند پذیرفت | syllable count unchanged (the base syllable count is already correct) |
| LEXICAL_KHASTAN | خواهد که برود، این را خواهد | original tag is preserved |
| PARTICLE_KHAH | خواه بیاید خواه نیاید | treated as non-verb |
| NOMINAL_DERIVATIVE | خواهش، خواهان، خواهنده | treated as non-verb |
| INDEPENDENT_WORD | خواهر، خواهران | treated as non-verb |
| SUFFIX_COMPOUND | آزادیخواه، خیرخواه، دادخواه | treated as non-verb |
The classifier uses exact lexical sets (layer 1), suffix-compound detection (layer 2), and a 2-token context window (layer 3) — never a simple prefix regex.
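The three layers can be sketched roughly as follows. This is an illustration of the layering, not the package's code: the function name, the word sets, and especially the tiny `PAST_STEMS` list are assumptions made for the example. A real implementation would need far broader lexical coverage or POS information to recognise the verb that follows the auxiliary.

```python
from typing import Optional, Sequence

KHAH = "خواه"
NOMINAL_DERIVATIVES = {"خواهش", "خواهان", "خواهنده"}
INDEPENDENT_WORDS = {"خواهر", "خواهران"}
FUTURE_FORMS = {"خواهم", "خواهی", "خواهد", "خواهیم", "خواهید", "خواهند"}
PAST_STEMS = {"رفت", "آمد", "گفت"}  # hypothetical demo list, far from complete


def classify_khah(tokens: Sequence[str], i: int) -> Optional[str]:
    """Label tokens[i] if it is a خواه-like form, otherwise return None."""
    token = tokens[i]
    # Layer 1: exact lexical sets.
    if token in NOMINAL_DERIVATIVES:
        return "NOMINAL_DERIVATIVE"
    if token in INDEPENDENT_WORDS:
        return "INDEPENDENT_WORD"
    # Layer 2: compounds that end in خواه.
    if token != KHAH and token.endswith(KHAH):
        return "SUFFIX_COMPOUND"
    # Layer 3: look at up to two tokens on either side.
    before, after = tokens[max(0, i - 2):i], tokens[i + 1:i + 3]
    if token == KHAH:
        paired = KHAH in before or KHAH in after
        return "PARTICLE_KHAH" if paired else None
    stem = token[1:] if token.startswith("ن") else token  # drop negation
    if stem in FUTURE_FORMS:
        follows_past_stem = bool(after) and after[0] in PAST_STEMS
        return "FUTURE_AUX" if follows_past_stem else "LEXICAL_KHASTAN"
    return None


print(classify_khah(["خواهم", "رفت"], 0))
```

Checking the exact sets first is what keeps words such as خواهر from ever reaching the verb logic.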
Notes
- Minimum text length: The Flesch–Dayani formula is designed for running prose. Texts shorter than ~50 words produce unstable scores. A warning is emitted in this case (visible with `--verbose`).
- Punctuation filtering: Persian and Latin punctuation marks (quotation marks, periods, commas, ...) are stripped from the edges of each token, and tokens consisting entirely of punctuation are excluded from the counts.
- stdin: When running interactively without `-t` or `-f`, the script waits for input and prints a prompt. Press `Ctrl+D` to signal end of input.
- Log messages: All warnings go to stderr and do not affect `--plain` output.
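The punctuation rule in the notes above can be illustrated with a short sketch. It treats any character in a Unicode punctuation category as punctuation, which is an assumption; the package may use an explicit character list instead. Both function names are hypothetical.

```python
import unicodedata


def strip_edge_punctuation(token: str) -> str:
    """Remove punctuation from both ends of a token, leaving the middle intact."""
    start, end = 0, len(token)
    while start < end and unicodedata.category(token[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(token[end - 1]).startswith("P"):
        end -= 1
    return token[start:end]


def content_tokens(tokens):
    """Strip each token and drop those that were punctuation only."""
    stripped = (strip_edge_punctuation(token) for token in tokens)
    return [token for token in stripped if token]


# «سلام» followed by a Persian comma and a Latin word with a period
sample = ["\u00ab\u0633\u0644\u0627\u0645\u00bb", "\u060c", "ok."]
print(content_tokens(sample))
```

Only the edges are stripped, so a hyphen or other mark inside a token survives.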
Running Tests
pip install pytest hazm
python -m pytest tests/test_core.py -v
76 tests covering: خواه classifier (all 9 document cases), punctuation filtering, syllable counting, heuristic limitations, formula verification, and edge cases.
References
- Dayani, M. (1374/1995). سنجش خوانایی متون فارسی. Persian adaptation of the Flesch Reading Ease formula.
- Mohtaj et al. (2018). Parsivar: A Language Processing Toolkit for Persian. LREC 2018.
- Mohammadi & Khasteh (2020). A Machine Learning Approach to Persian Text Readability.
- Sobhe. hazm — Persian NLP library.
Author
Dr. Mohammad Pirouzan — @Drpirouzan
License
MIT License — see LICENSE for details.