The first Python package for measuring readability of Hindi text using Devanagari-aware formulas
hindi-readability 📖🇮🇳
The first Python package for measuring the readability of Hindi text.
Zero external dependencies. Pure Python 3.9+.
The Problem
English has Flesch-Kincaid, Gunning Fog, and ARI, and MS Word has reported Flesch readability scores since 1992. Hindi has no comparable off-the-shelf tool.
India has 24.8 crore school students, 886 million internet users consuming Hindi content, and 14.7 lakh schools — all producing and consuming Hindi text with no way to automatically measure whether it is easy or hard to read.
This package fills that gap with three original formulas designed specifically for Devanagari script.
Installation
pip install hindi-readability
Quick Start
from hindi_readability import ReadabilityScorer
rs = ReadabilityScorer()
# Simple sentence
result = rs.score("यह एक सरल वाक्य है।")
print(result["hrs"]) # Hindi Readability Score (0-100)
print(result["label"]) # "Easy"
print(result["grade_label"]) # "Class 3–5"
print(result["cbse_level"]) # "Prathmik Uttara"
# Constitutional text — hard
result = rs.score("संविधान की प्रस्तावना में भारत को एक संप्रभु, समाजवादी, धर्मनिरपेक्ष, लोकतांत्रिक गणराज्य घोषित किया गया है।")
print(result["hrs"]) # 0.0
print(result["label"]) # "Expert"
print(result["grade_label"]) # "College+"
# Compare multiple texts — sorted easiest first
texts = [
    "बच्चे खेलते हैं।",
    "भारत की शिक्षा नीति बदल रही है।",
    "संवैधानिक प्रावधानों के अनुसार नागरिकों के मूल अधिकार सुरक्षित हैं।",
]
ranked = rs.compare(texts)
for r in ranked:
    print(f"{r['hrs']:5.1f} {r['label']:12} {r['text'][:40]}")
# Get simplification suggestions
suggestions = rs.simplify_suggestions("संवैधानिक प्रावधानों के अनुसार...")
for s in suggestions:
    print(s)
# Check if appropriate for a school grade
rs.is_appropriate_for_grade("यह सरल पाठ है।", grade=5) # True/False
The Three Formulas
1. Hindi Readability Score (HRS)
An ease score from 0 to 100 — higher means easier. Inspired by Flesch Reading Ease but redesigned for Devanagari.
| Score | Label | Suitable for |
|---|---|---|
| 90–100 | Very easy | Class 1–2 |
| 70–89 | Easy | Class 3–5 |
| 50–69 | Standard | Class 6–8 |
| 30–49 | Difficult | Class 9–10 |
| 10–29 | Very hard | Class 11–12 |
| 0–9 | Expert | College+ |
Formula:
HRS = 206.0
- (60.0 × avg_syllables_per_word)
- (1.8 × avg_words_per_sentence)
- (70.0 × conjunct_density)
- (8.0 × matra_complexity)
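To make the arithmetic concrete, the sketch below evaluates the formula above for two sets of feature values. The function name `hrs_from_features`, the input numbers, and the clamp to 0–100 are assumptions for illustration. It does not reproduce the package's own feature extraction, and it treats `conjunct_density` as a plain per-word ratio, which may not match the scale the package uses internally.

```python
def hrs_from_features(
    avg_syllables_per_word: float,
    avg_words_per_sentence: float,
    conjunct_density: float,
    matra_complexity: float,
) -> float:
    """Evaluate the HRS formula as printed, then clamp to the 0-100 range."""
    raw = (
        206.0
        - 60.0 * avg_syllables_per_word
        - 1.8 * avg_words_per_sentence
        - 70.0 * conjunct_density
        - 8.0 * matra_complexity
    )
    # Assumption: the formula itself is unbounded, but the score is
    # documented as 0-100, so out-of-range values are clamped here.
    return max(0.0, min(100.0, raw))


# Hypothetical feature values for a short sentence with no conjuncts.
easy = hrs_from_features(1.6, 5.0, 0.0, 0.5)

# Hypothetical feature values for a long, conjunct-heavy sentence.
hard = hrs_from_features(3.2, 16.0, 0.4, 0.8)

print(f"easy={easy:.1f} hard={hard:.1f}")
```

With these made-up inputs the first text lands in the top band, while the second goes negative before clamping and is reported as 0. That would be consistent with the `0.0` shown for the constitutional sentence in Quick Start, if the package clamps in the same way.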
2. Hindi Grade Level (HGL)
Maps HRS to Indian school grades (CBSE Class 1 to College+).
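The mapping can be pictured as a lookup over the HRS bands in the table above. The sketch below is one possible reading, not the package's implementation: `band_for_hrs` is a hypothetical helper, and treating each band's lower bound as an inclusive threshold (so that 89.5 counts as "Easy") is an assumption, since the table only lists whole-number ranges.

```python
# (inclusive lower bound, label, suitable for), copied from the HRS table.
HRS_BANDS = [
    (90.0, "Very easy", "Class 1–2"),
    (70.0, "Easy", "Class 3–5"),
    (50.0, "Standard", "Class 6–8"),
    (30.0, "Difficult", "Class 9–10"),
    (10.0, "Very hard", "Class 11–12"),
    (0.0, "Expert", "College+"),
]


def band_for_hrs(hrs: float) -> tuple:
    """Return (label, grade_label) for an HRS value between 0 and 100."""
    if not 0.0 <= hrs <= 100.0:
        raise ValueError(f"HRS must be within 0-100, got {hrs}")
    # Bands are ordered from easiest to hardest, so the first lower
    # bound that the score reaches decides the band.
    for lower, label, grade_label in HRS_BANDS:
        if hrs >= lower:
            return label, grade_label
    raise AssertionError("unreachable: the last band starts at 0")


for score in (95.0, 72.5, 0.0):
    print(score, band_for_hrs(score))
```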
3. Hindi Complexity Index (HCI)
A normalized 0–1 score. Lower = easier. Useful for ML pipelines.
Why These Formulas Are Different
| Feature | English (Flesch-Kincaid) | Hindi (this package) |
|---|---|---|
| Syllable counting | English phoneme rules | Devanagari matra-based |
| Conjunct detection | Not applicable | ✓ Virama-based detection |
| Script-aware | No | ✓ Full Unicode U+0900–U+097F |
| Long vowel complexity | No | ✓ Guru/laghu distinction |
| CBSE grade mapping | No | ✓ Class 1–12 + College |
Conjunct consonants (संयुक्त अक्षर) — formed when a virama (्) joins two consonants — are the primary marker of Sanskrit-origin vocabulary. They appear in tatsam words (तत्सम) which are significantly harder for younger readers. This package detects them automatically using Unicode analysis.
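A minimal sketch of virama-based detection, assuming a conjunct is any virama (U+094D) that sits between two Devanagari consonants. The helper names and the regular expression are illustrative and are not taken from the package. A three-consonant cluster such as स्त्र is counted as two joins, and ZWJ/ZWNJ sequences are ignored.

```python
import re

# Consonant (U+0915-U+0939 or U+0958-U+095F), optional nukta (U+093C),
# then a virama (U+094D) that is immediately followed by another consonant.
_CONJUNCT_JOIN = re.compile(
    r"[\u0915-\u0939\u0958-\u095F]\u093C?\u094D(?=[\u0915-\u0939\u0958-\u095F])"
)

# Any Devanagari independent vowel or consonant; used to decide whether
# a whitespace-separated token is a word rather than bare punctuation.
_DEVANAGARI_LETTER = re.compile(r"[\u0904-\u0939\u0958-\u0961]")


def count_conjunct_joins(text: str) -> int:
    """Count viramas that join two consonants."""
    return sum(1 for _ in _CONJUNCT_JOIN.finditer(text))


def conjuncts_per_100_words(text: str) -> float:
    """Conjunct joins per 100 whitespace-separated Devanagari words."""
    words = [w for w in text.split() if _DEVANAGARI_LETTER.search(w)]
    if not words:
        return 0.0
    return 100.0 * count_conjunct_joins(text) / len(words)


sample = "बच्चे खेलते हैं।"
print(count_conjunct_joins(sample), round(conjuncts_per_100_words(sample), 1))
```

The lookahead leaves the second consonant unconsumed, so it can start the next match. This is what lets stacked clusters count once per virama.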
What Is Solved vs. What This Package Solves
Already solved (for English)
- Flesch Reading Ease (1948)
- Flesch-Kincaid Grade Level (1975)
- Gunning Fog Index (1952)
What this package solves (first ever for Hindi)
- Matra-aware syllable counting
- Conjunct consonant density as a difficulty signal
- CBSE-aligned grade level output
- Actionable simplification suggestions in Hindi
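To illustrate the first item, here is a rough sketch of script-based syllable counting. It counts orthographic syllables (aksharas): every independent vowel, plus every consonant that is not silenced by a following virama. This is an assumed approximation and not necessarily the package's algorithm. In particular it does not model schwa deletion, so a word like सरल counts as 3 even though it is pronounced with 2 syllables.

```python
import re

# An akshara nucleus is either an independent vowel (U+0904-U+0914) or a
# consonant that is NOT followed by (optional nukta +) virama, i.e. a
# consonant that keeps its inherent vowel or carries a matra.
_AKSHARA = re.compile(
    r"[\u0904-\u0914]"
    r"|[\u0915-\u0939\u0958-\u095F](?!\u093C?\u094D)"
)


def count_aksharas(text: str) -> int:
    """Count orthographic syllables in Devanagari text."""
    return sum(1 for _ in _AKSHARA.finditer(text))


def aksharas_per_word(text: str) -> float:
    """Average aksharas per whitespace-separated Devanagari word."""
    # Tokens with no akshara at all (bare punctuation, digits) are skipped.
    counts = [count_aksharas(token) for token in text.split()]
    counts = [c for c in counts if c > 0]
    if not counts:
        return 0.0
    return sum(counts) / len(counts)


sentence = "यह एक सरल वाक्य है।"
print(count_aksharas(sentence), aksharas_per_word(sentence))
```

Matras themselves are not counted separately: a vowel sign always attaches to a consonant that has already been counted, so it changes the vowel quality of a syllable without adding one.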
Still open (future research / dissertation topics)
- Validation against human-graded Hindi texts (labeled corpus needed)
- Domain-specific calibration (news vs. textbooks vs. legal)
- Extension to Marathi (which also uses Devanagari) and to Bengali and Gujarati (which use related Brahmic scripts with their own Unicode blocks)
- Hinglish (code-mixed Hindi-English) readability
API Reference
ReadabilityScorer.score(text) # Full report dict
ReadabilityScorer.compare(texts) # Rank list easiest→hardest
ReadabilityScorer.batch_score(texts) # Score list in order
ReadabilityScorer.is_appropriate_for_grade(text, grade) # bool
ReadabilityScorer.simplify_suggestions(text) # list of Hindi suggestions
# Low-level functions
hindi_readability_score(text) # float 0-100
hindi_grade_level(text) # dict {grade, grade_label, cbse_level}
hindi_complexity_index(text) # float 0-1
analyse(text) # dict of raw script counts
syllables_per_word(text) # float
conjunct_density(text) # conjuncts per 100 words
Citation
If you use this package in academic work:
@software{hindi_readability,
author = {Prabhat Chaudhary},
title = {hindi-readability: The First Python Package for Hindi Text Readability},
year = {2025},
publisher = {PyPI},
url = {https://pypi.org/project/hindi-readability/}
}
License
MIT — free for academic and commercial use.