Text Analytics - Calculate statistical features from text
textly — Text Analytics for Python
textly is a Python library for computing readability scores, grade levels, and text complexity metrics for English text. It works with any prose but is particularly tuned for scientific and technical writing.
This library builds on the scireadability Python library, but behaves differently.
Installation
pip install textly
Quick start
import textly
text = (
"Within the heterogeneous canopy of the Amazonian rainforest, "
"a fascinating interspecies interaction manifests between "
"Cephalotes atratus, a species of arboreal ant, and Epiphytes "
"dendrobii, a genus of epiphytic orchids."
)
textly.flesch_reading_ease(text)
textly.flesch_kincaid_grade(text)
textly.smog_index(text)
textly.coleman_liau_index(text)
textly.automated_readability_index(text)
textly.dale_chall_readability_score(text)
textly.gunning_fog(text)
textly.text_standard(text) # consensus grade across all formulas
textly.reading_time(text) # in seconds (default 200 WPM)
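As a rough illustration of what a words-per-minute estimate involves, the sketch below converts a word count into seconds. It is not taken from textly's source: the function name `estimate_reading_seconds` is hypothetical, and it assumes a word is any whitespace-delimited token, which may differ from textly's tokenization.

```python
def estimate_reading_seconds(text: str, wpm: float = 200.0) -> float:
    """Estimate reading time in seconds at a given words-per-minute rate."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    # Assumption: a "word" is any whitespace-delimited token.
    word_count = len(text.split())
    # words / (words per minute) gives minutes; multiply by 60 for seconds.
    return word_count / wpm * 60.0
```

At the default rate, a 100-word passage comes out at 30 seconds under this model.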
What makes textly different
- CMUdict-driven syllable counting — uses the Carnegie Mellon Pronouncing Dictionary, taking the minimum syllable count when multiple pronunciations exist.
- Custom syllable dictionary — user-editable overrides for domain-specific terms, jargon, and species names.
- Regex fallback — a refined heuristic counter tuned for scientific suffixes, agreeing with CMUdict ~91% of the time.
- Token-based difficult word rates — formulas like Dale–Chall and SPACHE use per-token analysis as the original research intended.
- Consistent tokenization — letter and character counting that works correctly with Coleman–Liau and similar formulas.
English only. For multilingual support, consider textstat.
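The lookup order described in the list above (custom overrides, then CMUdict with the minimum across pronunciations, then a regex fallback) can be sketched as follows. This is a minimal illustration under stated assumptions, not textly's implementation: `PRONUNCIATIONS` is a tiny hand-written excerpt in CMUdict's ARPAbet style rather than the real dictionary, the fallback simply counts vowel groups (much cruder than the tuned heuristic the library describes), and all function names are hypothetical.

```python
import re

# Hand-written excerpt in CMUdict's ARPAbet style (illustrative only).
# Vowel phonemes carry a stress digit (0, 1, or 2); consonants do not.
PRONUNCIATIONS = {
    "family": [
        ["F", "AE1", "M", "AH0", "L", "IY0"],  # three syllables
        ["F", "AE1", "M", "L", "IY0"],         # two syllables
    ],
    "orchid": [["AO1", "R", "K", "AH0", "D"]],
}


def syllables_in(phonemes):
    """Count syllables as the number of phonemes ending in a stress digit."""
    return sum(1 for p in phonemes if p[-1].isdigit())


def fallback_count(word):
    """Crude heuristic: one syllable per run of vowels, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def count_syllables(word, overrides=None):
    """Resolve a count: overrides first, then dictionary minimum, then regex."""
    key = word.lower()
    if overrides and key in overrides:
        return overrides[key]
    if key in PRONUNCIATIONS:
        return min(syllables_in(p) for p in PRONUNCIATIONS[key])
    return fallback_count(key)
```

With this sketch, `count_syllables("family")` takes the shorter pronunciation, and an entry such as `{"crispr": 2}` passed as `overrides` wins over the fallback, which would otherwise see only one vowel group.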
Readability formulas
| Function | What it returns |
|---|---|
| textly.flesch_reading_ease(text) | Score (higher = easier, up to ~121; negatives possible) |
| textly.flesch_kincaid_grade(text) | U.S. grade level from sentence length and syllables per word |
| textly.gunning_fog(text) | Grade level from sentence length and polysyllabic word percentage |
| textly.smog_index(text) | Grade level (best with ~30 sentences; returns 0.0 if < 3) |
| textly.automated_readability_index(text) | Grade level from characters per word and words per sentence |
| textly.coleman_liau_index(text) | Grade level from letters per word and sentences per word |
| textly.linsear_write_formula(text) | Grade level using the first 100 words |
| textly.dale_chall_readability_score(text) | Score based on difficult word percentage (see grade table below) |
| textly.forcast(text) | Grade level from single-syllable counts in a 150-word sample |
| textly.spache_readability(text) | Grade level for young readers |
| textly.mcalpine_eflaw(text) | Score for EFL (English as a Foreign Language) materials |
| textly.lix(text) | Swedish readability score (not grade-mapped) |
| textly.rix(text) | Grade level from long-word-to-sentence ratio |
| textly.text_standard(text) | Consensus grade from all formulas |
| textly.reading_time(text, wpm=200.0) | Estimated reading time in seconds |
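To make the first two rows concrete, here is a small sketch of the standard published Flesch Reading Ease and Flesch–Kincaid Grade formulas, computed from raw counts. The function names are hypothetical, and whether textly uses exactly these constants internally is an assumption; the sketch only shows how sentence length and syllables per word feed into each score.

```python
def flesch_reading_ease_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula applied to raw counts."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula applied to raw counts."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
```

A one-word, one-syllable sentence gives the ceiling of roughly 121 for reading ease, and long sentences packed with polysyllabic words push the score below zero, which is why negatives are possible.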
Dale–Chall grade bands
| Score | Reader level |
|---|---|
| ≤ 4.9 | 4th grade or below |
| 5.0–5.9 | 5th–6th grade |
| 6.0–6.9 | 7th–8th grade |
| 7.0–7.9 | 9th–10th grade |
| 8.0–8.9 | 11th–12th grade |
| ≥ 9.0 | College level |
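If you want the band as a label rather than looking it up by hand, a lookup along these lines would do. `dale_chall_band` is a hypothetical helper, not part of textly, and it assumes that unrounded scores falling between the printed bounds (for example 4.95) belong to the lower band.

```python
def dale_chall_band(score: float) -> str:
    """Map a Dale-Chall score to the reader level from the table above."""
    # Each entry is (exclusive upper bound, label), checked in ascending order.
    bands = [
        (5.0, "4th grade or below"),
        (6.0, "5th-6th grade"),
        (7.0, "7th-8th grade"),
        (8.0, "9th-10th grade"),
        (9.0, "11th-12th grade"),
    ]
    for upper, label in bands:
        if score < upper:
            return label
    return "College level"
```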
LIX score interpretation
| Score | Difficulty |
|---|---|
| < 30 | Very easy |
| 30–40 | Easy |
| 40–50 | Standard |
| 50–60 | Difficult |
| > 60 | Very difficult |
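For orientation, LIX is conventionally defined as the average sentence length plus the percentage of words longer than six letters. The sketch below applies that definition with a deliberately simple tokenizer and then maps the score to the table above. The tokenization rules, the function names, and the handling of boundary values (30, 40, and 50 go to the higher band, 60 stays in "Difficult") are assumptions, so results may not match textly.lix exactly.

```python
import re


def lix_score(text: str) -> float:
    """LIX = words per sentence + 100 * (long words / words)."""
    # Assumption: words are runs of ASCII letters; sentences end at . ! or ?
    words = re.findall(r"[A-Za-z]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / len(sentences) + 100.0 * long_words / len(words)


def lix_difficulty(score: float) -> str:
    """Map a LIX score to the difficulty labels in the table above."""
    if score < 30:
        return "Very easy"
    if score < 40:
        return "Easy"
    if score < 50:
        return "Standard"
    if score <= 60:
        return "Difficult"
    return "Very difficult"
```

For "The cat sat. The elephant wandered away." this gives 7 words, 2 sentences, and 2 long words, so the score is 3.5 + 28.57, which lands in the "Easy" band.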
Text statistics
textly.syllable_count(text) # total syllables
textly.lexicon_count(text, removepunct=True) # word count
textly.sentence_count(text) # sentence count
textly.char_count(text, ignore_spaces=True) # character count
textly.letter_count(text, ignore_spaces=True) # alphabetic characters only
textly.polysyllabcount(text) # words with ≥3 syllables
textly.monosyllabcount(text) # words with 1 syllable
# Averages
textly.avg_sentence_length(text)
textly.avg_syllables_per_word(text)
textly.avg_character_per_word(text)
textly.avg_letter_per_word(text)
textly.avg_sentence_per_word(text)
# Difficult word analysis
textly.difficult_words(text) # count
textly.difficult_words_list(text) # list of tokens
textly.is_difficult_word("synapse") # True/False
textly.is_easy_word("dog") # True/False
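The idea behind a token-based difficult word rate is that every occurrence counts, so a hard word repeated twice contributes twice. The following sketch shows the distinction with a toy easy-word set. `EASY_WORDS`, the tokenizer, and the function names are all hypothetical; real word lists are far longer, and textly's own matching rules may differ.

```python
import re

# Toy easy-word list for illustration only; real lists are much longer.
EASY_WORDS = frozenset({"the", "dog", "saw", "a", "and", "ran"})


def tokenize(text: str) -> list:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def difficult_tokens(text: str) -> list:
    """Return every token not on the easy list, keeping repeats."""
    return [t for t in tokenize(text) if t not in EASY_WORDS]


def difficult_token_rate(text: str) -> float:
    """Fraction of all tokens (not unique words) that are difficult."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(difficult_tokens(text)) / len(tokens)
```

In "The dog saw a synapse and the synapse ran." the word "synapse" is one unique difficult word but two difficult tokens out of nine, so the per-token rate is 2/9.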
Configuration
Rounding
By default, scores are returned unrounded. You can control this globally or per call.
# Global: round all subsequent calls to 2 decimal places
textly.set_rounding(True, points=2)
# Per-call: overrides the global setting for this call only
textly.flesch_kincaid_grade(text, rounding=True, points=1)
textly.flesch_reading_ease(text, rounding=False) # unrounded regardless of global
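The precedence rule (a per-call argument wins, otherwise the global setting applies) follows a common pattern that can be sketched independently of the library. The names `set_global_rounding` and `finalize` below are hypothetical stand-ins, and nothing here is taken from textly's source.

```python
# Module-level defaults: unrounded output, as in the behavior described above.
_settings = {"rounding": False, "points": 2}


def set_global_rounding(rounding: bool, points: int = 2) -> None:
    """Change the defaults used when a call does not specify its own."""
    _settings["rounding"] = rounding
    _settings["points"] = points


def finalize(value: float, rounding=None, points=None) -> float:
    """Apply rounding, letting per-call arguments override the globals."""
    # None means "not specified for this call", so fall back to the global.
    use_rounding = _settings["rounding"] if rounding is None else rounding
    use_points = _settings["points"] if points is None else points
    return round(value, use_points) if use_rounding else value
```

Using `None` as the "not specified" marker is what lets an explicit `rounding=False` switch rounding off for one call even when the global setting is on.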
Apostrophe handling
# Default: False (preserves apostrophes in contractions like don't, it's)
textly.set_rm_apostrophe(True) # strips all apostrophes with other punctuation
Set this once at the start of your script — it affects all subsequent calls.
Custom syllable dictionary
Override syllable counts for specialized vocabulary.
textly.add_word_to_dictionary("pterodactyl", 4)
textly.add_words_from_file_to_dictionary("my_terms.json")
textly.overwrite_dictionary("full_replacement.json")
textly.revert_dictionary_to_default()
textly.print_dictionary()
Dictionary files use this JSON format:
{
"CUSTOM_SYLLABLE_DICT": {
"pterodactyl": 4,
"crispr": 2
}
}
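A file in this format can be produced with the standard json module. The helpers below are a hypothetical sketch: only the top-level key CUSTOM_SYLLABLE_DICT comes from the format shown above, while the function names and the validation rule (positive integer counts) are assumptions.

```python
import json
from pathlib import Path


def write_syllable_overrides(path, counts: dict) -> None:
    """Write word-to-syllable-count overrides in the JSON layout shown above."""
    for word, count in counts.items():
        # Assumption: counts must be positive integers.
        if not isinstance(count, int) or count < 1:
            raise ValueError(f"invalid syllable count for {word!r}: {count!r}")
    payload = {"CUSTOM_SYLLABLE_DICT": dict(counts)}
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")


def read_syllable_overrides(path) -> dict:
    """Read the overrides back as a plain dict."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["CUSTOM_SYLLABLE_DICT"]
```

A file written this way, for example `write_syllable_overrides("my_terms.json", {"pterodactyl": 4, "crispr": 2})`, should then be loadable with textly.add_words_from_file_to_dictionary.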
Limitations
- SMOG requires ~30 sentences for reliable results; fewer than 3 returns 0.0.
- Short text snippets produce unstable scores across most formulas.
- Novel jargon may need custom dictionary entries for accurate syllable counts.
- The regex fallback is approximate by nature (~91% agreement with CMUdict).
- English only.
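The SMOG limitation is easier to see from the formula itself. The sketch below uses the standard published SMOG formula with the short-text guard described above; `smog_from_counts` is a hypothetical name, and the exact constants textly uses are an assumption.

```python
import math


def smog_from_counts(polysyllables: int, sentences: int) -> float:
    """Standard SMOG grade from raw counts, with a guard for very short text."""
    if sentences < 3:
        # Mirrors the documented behavior: too little text to score.
        return 0.0
    # The 30 / sentences factor scales the sample up to a 30-sentence text.
    return 1.043 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291
```

Because of the scaling factor, 3 polysyllabic words in 3 sentences score the same as 30 in 30 sentences. A single unusual sentence can therefore swing the result for a short text, which is why samples of about 30 sentences are recommended.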
Contributing
Found a bug or have an idea? Open an issue. Want to contribute code? Submit a pull request.
- Fork the repo and create a branch.
- Add tests for your changes.
- Open a PR.
Changelog
See CHANGELOG.md for release history.
License
MIT