Multi-Dimensional Readability (MDR) score and 42 linguistic features for English text

mdr-readability

A Python package for computing the MDR (Multi-Dimensional Readability) score and 42 linguistic features from English text.

MDR is a regression-based readability index that combines lexical, syntactic, and semantic features to predict text difficulty. It achieves R² = 0.9249 on the calibration corpus.

Installation

1. Install the package

pip install mdr-readability

Or, for development (editable install):

git clone https://github.com/jacktanhua/MDR.git
cd MDR
pip install -e .

2. Install the spaCy model

python -m spacy download en_core_web_lg

3. Install NLTK data (first time only)

import nltk
nltk.download("punkt")
nltk.download("wordnet")
nltk.download("averaged_perceptron_tagger")

4. Place the vocabulary data files

Two external data files are required:

File	Description
`The New Dale-Chall Familiar Words List_2950.txt`	Dale-Chall familiar word list
`cefrj-vocabulary-profile-1.5.csv`	CEFR vocabulary profile (must have `headword` and `CEFR` columns)

Optional (for norm scores):

File	Description
`MDR_level_features_norm.csv`	Level-specific norm values (must have `Code` column + `L1`–`L12` columns)

By default, the package looks for these files in <package_root>/data/.
To load them from a different directory, call set_data_dir:

import mdr_readability
mdr_readability.set_data_dir("/path/to/your/data")
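Before running any computation, it can help to confirm that the data directory really contains the required files. The sketch below is not part of the package: `check_data_dir` and `REQUIRED_FILES` are hypothetical names, and the check relies only on the file names and column requirements listed above.

```python
import csv
from pathlib import Path

# Required file names mapped to the CSV columns they must contain
# (None means the file only has to exist). Hypothetical helper data,
# taken from the file tables above.
REQUIRED_FILES = {
    "The New Dale-Chall Familiar Words List_2950.txt": None,
    "cefrj-vocabulary-profile-1.5.csv": {"headword", "CEFR"},
}


def check_data_dir(data_dir):
    """Return a list of problems found in data_dir (empty list means OK)."""
    problems = []
    root = Path(data_dir)
    for name, columns in REQUIRED_FILES.items():
        path = root / name
        if not path.is_file():
            problems.append(f"missing file: {name}")
            continue
        if columns:
            # utf-8-sig tolerates a byte-order mark at the start of the file.
            with path.open(newline="", encoding="utf-8-sig") as fh:
                header = set(next(csv.reader(fh), []))
            missing = columns - header
            if missing:
                problems.append(f"{name}: missing columns {sorted(missing)}")
    return problems
```

An empty result suggests the directory is ready to pass to `mdr_readability.set_data_dir`.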

Quick Start

import mdr_readability

# Point to your data directory (skip if files are in the default location)
mdr_readability.set_data_dir("D:/MDR/data")

text = "The cat sat on the mat. It was a very small cat."

# --- Option 1: MDR score + 12 classic indices ---
df = mdr_readability.compute_mdr(text)
print(df)

# --- Option 2: MDR score + all 42 raw features + norm values ---
df_norm = mdr_readability.compute_mdr_with_norm(text)
print(df_norm.T)  # Transposing makes it easier to read

# --- Option 3: Step by step ---
features_df = mdr_readability.calculate_features(text)
features_df = mdr_readability.calculate_mdr_readability(features_df)
print(features_df["MDR"].iloc[0])

# --- Classic readability only ---
scores = mdr_readability.calculate_classic_readability(text)
labels = [
    "Flesch Reading Ease", "Flesch Kincaid Grade", "Gunning Fog",
    "SMOG Index", "Automated Readability", "Coleman Liau",
    "Linsear Write", "Dale Chall", "Spache", "Rix", "Lix", "Text Standard"
]
for label, score in zip(labels, scores):
    print(f"{label}: {score}")

API Reference

`mdr_readability.set_data_dir(path)`

Set the directory from which vocabulary data files are loaded.
Call it once, before any computation, if your data files are not in <package_root>/data/.

`mdr_readability.calculate_features(text) → pd.DataFrame`

Extract all 42 linguistic features from text.
Returns a single-row DataFrame with columns listed in mdr_readability.COLUMN_NAMES.

Feature categories:

Category	Features (count)
Syllable	7
Word length & characters	4
Lexical difficulty	6
Type-token ratio / frequency	3
Sentence length	2
Dependency / syntax	7
Passive voice	2
Semantic	7
Referencing / conjunction	4

`mdr_readability.calculate_mdr_readability(df) → pd.DataFrame`

Apply the MDR linear regression formula to a features DataFrame.
Returns a copy of df with an extra "MDR" column (rounded to four decimal places).
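Applying a linear regression formula means adding an intercept to a weighted sum of feature values. The sketch below illustrates that idea only: the intercept, the coefficients, and the use of feature codes as dictionary keys are placeholder assumptions, not the values or layout used by the package.

```python
# Placeholder numbers for illustration; NOT the package's fitted coefficients.
INTERCEPT = 1.0
COEFFICIENTS = {"MSL": 0.05, "DWR": 2.0}


def linear_score(features, intercept=INTERCEPT, coefficients=COEFFICIENTS):
    """Return intercept + sum(weight * feature value), rounded to 4 decimal places."""
    total = intercept + sum(
        weight * features[name] for name, weight in coefficients.items()
    )
    return round(total, 4)


# A text with mean sentence length 10 and a difficult-word ratio of 0.25.
score = linear_score({"MSL": 10.0, "DWR": 0.25})
```

In the package itself, the fitted model is applied by `calculate_mdr_readability`; the helper above only mirrors the general shape of such a formula.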

`mdr_readability.calculate_classic_readability(text) → list`

Return a list of 12 classic readability scores in this order:

[Flesch Reading Ease, Flesch Kincaid Grade, Gunning Fog, SMOG Index,
 Automated Readability Index, Coleman Liau Index, Linsear Write Formula,
 Dale Chall Readability Score, Spache Readability, RIX, LIX, Text Standard]

`mdr_readability.compute_mdr(text) → pd.DataFrame`

One-step convenience function.
Returns a single-row DataFrame with MDR plus all 12 classic indices.

`mdr_readability.compute_mdr_with_norm(text) → pd.DataFrame`

One-step convenience function with norm comparison.
Returns a single-row DataFrame with:

MDR_value
All 42 raw feature values (<feature_name>)
Corresponding norm values (<feature_name>_norm), or None if the norm file is absent
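One way to use this output is to see how far each feature sits from its norm. The sketch below assumes the single row has been converted to a plain dictionary (for example with `df_norm.iloc[0].to_dict()`) and that norm columns follow the `<feature_name>_norm` naming described above. `norm_deviations` is a hypothetical helper, not a package function.

```python
import math


def norm_deviations(row):
    """Map each feature name to (value - norm), skipping missing norms."""
    deviations = {}
    for key, norm in row.items():
        if not key.endswith("_norm"):
            continue
        # A missing norm may appear as None or, after a pandas round trip, as NaN.
        if norm is None or (isinstance(norm, float) and math.isnan(norm)):
            continue
        feature = key[: -len("_norm")]
        if feature in row:
            deviations[feature] = row[feature] - norm
    return deviations
```

A positive deviation means the text's value is above the norm for that feature, and a negative one means it is below.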

Feature Codes

Each feature has a short code used in the norm CSV:

Code	Feature name
MWLS	avg_syllables_per_word_spacy
WO2S	words_over_2_syllables
W2SR	words_over_2_syllables_ratio
W2SE	words_over_2_syllables_entropy
W2SS	words_over_2_syllables_per_30_sentences
OSW1	one_syllable_words_per_150
OSW2	one_syllable_words_per_100
MWLL	Mean Word Length Refined
MLW	average_letters_per_100_words
MSW	Mean Sentence per Word
WLE	Word Length Entropy
DWR	Difficult Words Ratio
DWE	difficult_words_entropy
WLEC	word_level_entropy_CERF
MLF	Mean Lexical Frequency
LR	Lexical Richness Entropy
STTR	Standard Type Token Ratio
WZE	Word Zipf Entropy
MSL	Mean Sentence Length
SLE	Sentence Length Entropy
MDD	Mean Dependency Distance
DDE	Dependency Distance Entropy
DTE	Dependency Distribution Entropy
SED	Syntax Entropy Dependency
SEP	Syntax Entropy POS
SEC	syntax_entropy_component
PSR	Passive Sentence Ratio
PDE	Passive Dependency Entropy
TE	Topic Entropy
SE	Semantic Entropy
SR	Semantic Richness
SAN	Semantic Accuracy Noun
SAV	Semantic Accuracy Verb
SANV	Semantic Accuracy Noun_Verb
SACW	Semantic Accuracy Content Words
SC	Semantic Clarity
DSE	Descriptive Style Entropy
POE	POS Entropy
RE_I	Referencing Entropy I
RE_II	Referencing Entropy II
RE_III	Referencing Entropy III
CE	Conjunction Entropy I
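The codes above are the keys of the optional norm file, which has a `Code` column followed by `L1`–`L12`. The sketch below reads such a file with the standard library and indexes it by code and level. `load_norms` is a hypothetical helper, and it assumes every level cell holds a number.

```python
import csv

LEVELS = [f"L{i}" for i in range(1, 13)]  # L1 .. L12


def load_norms(lines):
    """Parse norm CSV lines into {code: {level: float}}."""
    norms = {}
    for record in csv.DictReader(lines):
        norms[record["Code"]] = {level: float(record[level]) for level in LEVELS}
    return norms
```

With the real file, this could be called as `load_norms(open("MDR_level_features_norm.csv", newline="", encoding="utf-8"))`, after which `norms["MSL"]["L3"]` would give the level-3 norm for Mean Sentence Length.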

Dependencies

spacy >= 3.0 + en_core_web_lg model
spacy-syllables
nltk (punkt, wordnet, averaged_perceptron_tagger)
textstat
wordfreq
numpy, pandas

License

MIT

Version 1.0.0, released Apr 19, 2026

Download files

Source Distribution

mdr_readability-1.0.0.tar.gz (16.0 kB view details)

Uploaded Apr 19, 2026 Source

Built Distribution

mdr_readability-1.0.0-py3-none-any.whl (15.1 kB view details)

Uploaded Apr 19, 2026 Python 3

File details

Details for the file mdr_readability-1.0.0.tar.gz.

File metadata

Download URL: mdr_readability-1.0.0.tar.gz
Upload date: Apr 19, 2026
Size: 16.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.7

File hashes

Hashes for mdr_readability-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`e764b843345c40b2ea62f6fba45d08982383bae0153011b8f8e1c1fe4bcd8226`
MD5	`017e666895e652e1d717768ab7e7d3ab`
BLAKE2b-256	`0a2668a5c1ae5374795dcff554f8fabc974ba9e2d2f58fbb1d6a758b2fdd1c23`

File details

Details for the file mdr_readability-1.0.0-py3-none-any.whl.

File metadata

Download URL: mdr_readability-1.0.0-py3-none-any.whl
Upload date: Apr 19, 2026
Size: 15.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.7

File hashes

Hashes for mdr_readability-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7fc209cf08b8749b38e1ae14c396c3fb602b268946db19384885e7d4a06d48c9`
MD5	`8356f7c5c3c967bf47aeafc57f8d9e35`
BLAKE2b-256	`3637ea5251bb064e98998085df92dc5afc20614486d07f96c87dcc961bd6680b`
