The first Python package for measuring readability of Hindi text using Devanagari-aware formulas

hindi-readability 📖

The first Python package for measuring readability of Hindi text.

Zero external dependencies. Pure Python 3.8+. Works out of the box.


Why does this exist?

English has had readability formulas such as Flesch-Kincaid, Gunning Fog, and ARI for decades, and Microsoft Word has long included Flesch-based readability statistics.

Hindi has nothing.

India has 24.8 crore school students, 886 million internet users consuming Hindi content, and 14.7 lakh schools — all producing Hindi text with no way to automatically measure whether it is easy or hard to read.

This package fills that gap with three original formulas designed from scratch for Devanagari script.


Installation

pip install hindi-readability

Quick Start

from hindi_readability import ReadabilityScorer

rs = ReadabilityScorer()

result = rs.score("यह एक सरल वाक्य है।")
print(result["hrs"])           # 75.2  — Hindi Readability Score (0–100)
print(result["label"])         # Easy
print(result["grade_label"])   # Class 6–8
print(result["cbse_level"])    # Madhyamik
print(result["hci"])           # 0.35  — Complexity Index (0–1)

Real Examples

from hindi_readability import ReadabilityScorer
rs = ReadabilityScorer()

# Easy — children's level
easy = "यह एक बच्चा है। वह खेलता है। घर अच्छा है। माँ पानी लाई।"
r = rs.score(easy)
# hrs = 75.2  label = 'Easy'  grade_label = 'Class 6–8'

# Standard newspaper Hindi
medium = "भारत में शिक्षा का स्तर तेजी से बदल रहा है। सरकार नई नीतियां बना रही है।"
r = rs.score(medium)
# hrs = 55.8  label = 'Standard'  grade_label = 'Class 9–10'

# Expert constitutional Hindi
hard = "संविधान की प्रस्तावना में भारत को एक संप्रभु, समाजवादी, धर्मनिरपेक्ष, लोकतांत्रिक गणराज्य घोषित किया गया है।"
r = rs.score(hard)
# hrs = 0.0  label = 'Expert'  grade_label = 'College+'

Full API Reference

rs.score(text) → dict

result = rs.score("यह एक बच्चा है। वह खेलता है। घर अच्छा है। माँ पानी लाई।")
# {
#   "hrs": 75.2, "label": "Easy", "description": "Suitable for Class 3–5 students",
#   "grade": 7, "grade_label": "Class 6–8", "cbse_level": "Madhyamik",
#   "hci": 0.35, "syllables_per_word": 1.77, "conjunct_density": 15.4,
#   "raw": { "words": 13, "sentences": 4, "syllables": 23, "matras": 11,
#            "conjuncts": 2, "viramas": 2, "consonants": 18, ... }
# }

rs.compare(texts) → list sorted easiest first

ranked = rs.compare(["कठिन संवैधानिक पाठ।", "बच्चे खेलते हैं।", "भारत की नीति।"])
for r in ranked:
    print(f"HRS={r['hrs']:5.1f}  {r['label']:10}  {r['text'][:40]}")
# HRS= 91.2  Very easy   बच्चे खेलते हैं।
# HRS= 55.8  Standard    भारत की नीति।
# HRS=  0.0  Expert      कठिन संवैधानिक पाठ।

rs.batch_score(texts) → list in original order

texts = ["बच्चे खेलते हैं।", "भारत की नीति।", "कठिन संवैधानिक पाठ।"]
results = rs.batch_score(texts)   # results[i] is the score dict for texts[i]

rs.is_appropriate_for_grade(text, grade) → bool

rs.is_appropriate_for_grade("यह सरल पाठ है।", grade=5)       # True
rs.is_appropriate_for_grade("संवैधानिक प्रावधान।", grade=5)  # False

rs.simplify_suggestions(text) → list of Hindi suggestions

suggestions = rs.simplify_suggestions("संवैधानिक प्रावधानों के अनुसार...")
# ["संयुक्त अक्षरों वाले शब्द कम करें — तत्सम शब्दों की जगह तद्भव शब्द लिखें।",
#  "वाक्य छोटे करें — एक वाक्य में 10–12 से अधिक शब्द न रखें।"]

Low-level functions

from hindi_readability import (
    hindi_readability_score,  # float 0–100
    hindi_grade_level,        # dict {grade, grade_label, cbse_level}
    hindi_complexity_index,   # float 0–1
    analyse,                  # dict of raw Devanagari script counts
    syllables_per_word,       # float
    conjunct_density,         # conjuncts per 100 words
)

HRS Score Interpretation

| Score  | Label     | Suitable for                    |
|--------|-----------|---------------------------------|
| 90–100 | Very easy | Class 1–2 (Prathmik)            |
| 70–89  | Easy      | Class 3–5 (Prathmik Uttara)     |
| 50–69  | Standard  | Class 6–8 (Madhyamik)           |
| 30–49  | Difficult | Class 9–10 (Uccha Madhyamik)    |
| 10–29  | Very hard | Class 11–12 (Uccha Vidyalay)    |
| 0–9    | Expert    | College+ (Snatak)               |
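The score bands above can be sketched as a simple lookup. This is only an illustration: `hrs_label` is a hypothetical helper, not part of the package's API, and the treatment of fractional scores between bands (for example 89.5) is an assumption.

```python
def hrs_label(score: float) -> str:
    """Map an HRS value (0-100) to the label from the table above.

    Hypothetical helper for illustration; band edges for fractional
    scores are an assumption (each band starts at its lower bound).
    """
    bands = [
        (90, "Very easy"),
        (70, "Easy"),
        (50, "Standard"),
        (30, "Difficult"),
        (10, "Very hard"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "Expert"


for value in (91.2, 75.2, 55.8, 0.0):
    print(value, hrs_label(value))
```

The four sample values are the scores quoted in the examples earlier in this README, and the labels printed agree with the ones shown there.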

How the Formulas Work

Why English formulas fail on Hindi

English readability tools count syllables and word length. Hindi requires three features English simply does not have:

Matras (मात्राएँ): Vowel signs attached to consonants (ि ी ु ू ा े ै ो ौ). Long matras indicate heavier syllables. English formulas cannot detect these.

Conjunct consonants (संयुक्त अक्षर): Two or more consonants fused by a virama (्), for example क्ष, त्र, ज्ञ, प्र. These appear mainly in Sanskrit-origin vocabulary, which is significantly harder for younger readers. This is the single biggest marker of Hindi difficulty and has no equivalent in English.

Devanagari syllable rules: Every Hindi consonant carries an inherent /a/ vowel unless a matra replaces it or a virama suppresses it. Standard English syllable-counting rules cannot see this.
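The three features above can be read directly off Unicode code points. The following is a minimal sketch and not the package's `script.py`: `script_counts` and its helpers are hypothetical names, and the counting rules (a syllable is an independent vowel or a consonant not followed by a virama; nukta forms and rare vowel signs are ignored) are simplifying assumptions.

```python
VIRAMA = "\u094D"  # halant: suppresses the inherent /a/ of a consonant


def is_consonant(ch: str) -> bool:
    return "\u0915" <= ch <= "\u0939"   # ka .. ha


def is_matra(ch: str) -> bool:
    return "\u093E" <= ch <= "\u094C"   # dependent vowel signs aa .. au


def is_independent_vowel(ch: str) -> bool:
    return "\u0904" <= ch <= "\u0914"   # standalone vowels


def script_counts(text: str) -> dict:
    """Count matras, viramas, conjuncts and syllables (assumed rules)."""
    conjuncts = 0
    syllables = 0
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # A virama sitting between two consonants marks one conjunct.
        if ch == VIRAMA and is_consonant(prev) and is_consonant(nxt):
            conjuncts += 1
        # A consonant followed by a virama has lost its vowel, so it
        # does not start a syllable of its own.
        if is_independent_vowel(ch) or (is_consonant(ch) and nxt != VIRAMA):
            syllables += 1
    return {
        "matras": sum(is_matra(ch) for ch in text),
        "viramas": text.count(VIRAMA),
        "conjuncts": conjuncts,
        "syllables": syllables,
    }


easy = "यह एक बच्चा है। वह खेलता है। घर अच्छा है। माँ पानी लाई।"
print(script_counts(easy))
# {'matras': 11, 'viramas': 2, 'conjuncts': 2, 'syllables': 23}
```

For this sample text the sketch reproduces the matra, virama, conjunct, and syllable counts shown under `raw` in the API reference. Other fields, such as the consonant count, may follow rules that this sketch does not attempt to guess.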

HRS Formula

HRS = 206.0
      − (60.0 × avg syllables per word)
      − (1.8  × avg words per sentence)
      − (70.0 × conjunct density)
      − (8.0  × matra complexity)
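As a worked illustration, the formula can be written as a small function. Several details here are assumptions rather than the package's confirmed behaviour: `hrs_sketch` is a hypothetical name, conjunct density is taken per word (a fraction, not the per-100-words figure reported by `conjunct_density`), the result is clamped to the documented 0–100 range, and the matra complexity input is a placeholder because this README does not define how it is computed.

```python
def hrs_sketch(syllables_per_word: float,
               words_per_sentence: float,
               conjuncts_per_word: float,
               matra_complexity: float) -> float:
    """Evaluate the HRS formula above and clamp to 0-100 (assumed)."""
    raw = (206.0
           - 60.0 * syllables_per_word
           - 1.8 * words_per_sentence
           - 70.0 * conjuncts_per_word
           - 8.0 * matra_complexity)
    return max(0.0, min(100.0, raw))


# Counts taken from the `raw` dict in the API reference:
# 13 words, 4 sentences, 23 syllables, 2 conjuncts.
# matra_complexity=1.0 is an assumed placeholder value.
score = hrs_sketch(23 / 13, 13 / 4, 2 / 13, 1.0)
print(round(score, 1))  # 75.2
```

Reading conjunct density as a per-word fraction keeps the penalty for this text near 11 points. Using the per-100-words value of 15.4 directly would subtract over 1,000 points, which suggests the formula works on the fraction.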

HGL Formula

HGL = 17.2 − (HRS × 0.14)   →   CBSE Class 1 to College+
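Plugging the example scores from this README into the formula shows how the grade fields arise. The sketch below is illustrative only: `hgl_sketch` and `grade_label_sketch` are hypothetical names, and both the rounding to the nearest whole grade and the grade-to-label bands are assumptions chosen to agree with the sample outputs shown earlier.

```python
def hgl_sketch(hrs: float) -> float:
    """Hindi Grade Level from an HRS value, per the formula above."""
    return 17.2 - hrs * 0.14


def grade_label_sketch(hrs: float) -> str:
    """Map the rounded grade to a class band (assumed banding)."""
    grade = round(hgl_sketch(hrs))
    if grade <= 2:
        return "Class 1–2"
    if grade <= 5:
        return "Class 3–5"
    if grade <= 8:
        return "Class 6–8"
    if grade <= 10:
        return "Class 9–10"
    if grade <= 12:
        return "Class 11–12"
    return "College+"


for hrs in (75.2, 55.8, 0.0):
    print(hrs, round(hgl_sketch(hrs), 2), grade_label_sketch(hrs))
```

For HRS = 75.2 this gives HGL ≈ 6.67, which rounds to grade 7 and falls in Class 6–8, in line with the `grade` and `grade_label` values in the sample output. The grade fields therefore appear to follow HGL rather than the "Suitable for" column of the HRS table, which is why an "Easy" text can carry a Class 6–8 grade label.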

HCI Formula

HCI = 0.40×syllable_score + 0.20×sentence_score + 0.25×conjunct_score + 0.15×matra_score
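The HCI is a weighted average of four component scores. The sketch below assumes each component has already been normalised to the 0–1 range; how the package performs that normalisation is not described here, and `hci_sketch` is a hypothetical name.

```python
HCI_WEIGHTS = {
    "syllable": 0.40,
    "sentence": 0.20,
    "conjunct": 0.25,
    "matra": 0.15,
}


def hci_sketch(syllable_score: float, sentence_score: float,
               conjunct_score: float, matra_score: float) -> float:
    """Weighted sum from the HCI formula; inputs assumed to be in 0-1."""
    return (HCI_WEIGHTS["syllable"] * syllable_score
            + HCI_WEIGHTS["sentence"] * sentence_score
            + HCI_WEIGHTS["conjunct"] * conjunct_score
            + HCI_WEIGHTS["matra"] * matra_score)


# Made-up component scores, purely for illustration.
print(round(hci_sketch(0.5, 0.2, 0.3, 0.4), 3))  # 0.375
```

Because the weights sum to 1.0, the index stays within 0–1 whenever every component does, which matches the documented range of `hci`.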

Open Research Directions

This package provides a baseline. The following are open problems suitable for an M.Tech dissertation or a research paper:

  • Corpus validation: calibrate formula weights against human-graded Hindi texts (teacher-labeled data)
  • Domain calibration: news, textbooks, legal text, and social media have different norms
  • Hinglish (code-mixed): no readability tool handles Hindi-English mixed text yet
  • Extension: Marathi is also written in Devanagari; Bengali and Gujarati use related Brahmic scripts and would need their own character tables
  • ML-based approach: fine-tune IndicBERT for readability regression and compare against this baseline

Running the Tests

git clone https://github.com/Erprabhat8423/hindi-readability.git
cd hindi-readability
python tests/test_all.py
# Tests: 38/38 passed ✓

Project Structure

hindi-readability/
├── hindi_readability/
│   ├── __init__.py      # Public exports
│   ├── script.py        # Devanagari Unicode analyser
│   ├── formulas.py      # HRS, HGL, HCI implementations
│   └── scorer.py        # ReadabilityScorer public API
├── tests/
│   └── test_all.py      # 38 tests
├── pyproject.toml
└── README.md

Changelog

v0.2.0

  • Improved README with full API docs, real examples, formula explanations
  • Added HRS score table with CBSE level names in English
  • Added open research directions section for dissertation reference

v0.1.0

  • Initial release — HRS, HGL, HCI formulas
  • ReadabilityScorer with 5 public methods
  • 38 tests passing, zero external dependencies

Citation

@software{hindi_readability,
  author    = {Prabhat Chaudhary},
  title     = {hindi-readability: The First Python Package for Hindi Text Readability},
  year      = {2026},
  version   = {0.2.0},
  publisher = {PyPI},
  url       = {https://pypi.org/project/hindi-readability/}
}

License

MIT — free for academic and commercial use.
