Text Analytics - Calculate statistical features from text

textly — Text Analytics for Python

License: MIT · Python 3.10+

textly is a Python library for computing readability scores, grade levels, and text complexity metrics for English text. It works with any prose but is particularly tuned for scientific and technical writing.

textly builds on the scireadability Python library, but it behaves differently, so its results may not match scireadability's.

Installation

pip install textly

Quick start

import textly

text = (
    "Within the heterogeneous canopy of the Amazonian rainforest, "
    "a fascinating interspecies interaction manifests between "
    "Cephalotes atratus, a species of arboreal ant, and Epiphytes "
    "dendrobii, a genus of epiphytic orchids."
)

textly.flesch_reading_ease(text)
textly.flesch_kincaid_grade(text)
textly.smog_index(text)
textly.coleman_liau_index(text)
textly.automated_readability_index(text)
textly.dale_chall_readability_score(text)
textly.gunning_fog(text)
textly.text_standard(text)          # consensus grade across all formulas
textly.reading_time(text)           # in seconds (default 200 WPM)
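The reading_time comment above describes a words-per-minute estimate. As a minimal sketch, assuming the estimate is simply the word count divided by the reading speed and converted to seconds (estimate_reading_time is a hypothetical helper, not part of textly, and textly's own word counting may differ):

```python
def estimate_reading_time(word_count: int, wpm: float = 200.0) -> float:
    """Estimate reading time in seconds, assuming a constant words-per-minute rate.

    Hypothetical helper for illustration; not part of textly's API.
    """
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    return word_count / wpm * 60.0


# A 400-word abstract at the default 200 WPM takes about two minutes.
print(estimate_reading_time(400))  # 120.0
```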

What makes textly different

  • CMUdict-driven syllable counting: uses the Carnegie Mellon Pronouncing Dictionary, taking the minimum syllable count when a word has multiple pronunciations.
  • Custom syllable dictionary: user-editable overrides for domain-specific terms, jargon, and species names.
  • Regex fallback: a refined heuristic counter, tuned for scientific suffixes, that agrees with CMUdict about 91% of the time.
  • Token-based difficult-word rates: formulas such as Dale–Chall and Spache compute difficult-word rates per token, as the original research intended.
  • Consistent tokenization: letters and characters are counted consistently, so formulas such as Coleman–Liau receive correct inputs.
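The syllable-counting approach described above can be sketched as a three-step lookup. This is a minimal illustration, not textly's source: the lookup order (custom override, then CMUdict, then regex), the sample pronunciations, and the deliberately naive vowel-group fallback are all assumptions. Real CMUdict data would be loaded from the dictionary itself, and textly's regex counter is more refined than the one shown.

```python
import re

# Illustrative entries in CMUdict's ARPAbet format (not loaded from the real
# dictionary). Vowel phonemes carry a stress digit 0, 1, or 2, so counting
# digit-bearing phonemes gives the syllables in that pronunciation.
SAMPLE_CMUDICT: dict[str, list[list[str]]] = {
    "fire": [["F", "AY1", "ER0"], ["F", "AY1", "R"]],
    "orchid": [["AO1", "R", "K", "AH0", "D"]],
}

# Hypothetical user overrides for domain terms.
CUSTOM_SYLLABLES: dict[str, int] = {"pterodactyl": 4}


def syllables_in_pronunciation(phones: list[str]) -> int:
    # Vowel phonemes end in a stress digit.
    return sum(1 for p in phones if p[-1].isdigit())


def regex_syllables(word: str) -> int:
    # Naive fallback: count vowel runs, drop a silent final "e"
    # (but not "-le"), and never return fewer than one syllable.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def count_syllables(word: str) -> int:
    key = word.lower()
    if key in CUSTOM_SYLLABLES:  # 1. user override wins
        return CUSTOM_SYLLABLES[key]
    if key in SAMPLE_CMUDICT:  # 2. minimum across pronunciations
        return min(syllables_in_pronunciation(p) for p in SAMPLE_CMUDICT[key])
    return regex_syllables(key)  # 3. heuristic fallback


print(count_syllables("Pterodactyl"))  # 4 (custom override)
print(count_syllables("fire"))  # 1 (minimum of 2 and 1)
print(count_syllables("banana"))  # 3 (regex fallback)
```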

English only. For multilingual support, consider textstat.

Readability formulas

| Function | What it returns |
| --- | --- |
| textly.flesch_reading_ease(text) | Score (higher = easier; maximum about 121; negative values possible) |
| textly.flesch_kincaid_grade(text) | U.S. grade level from sentence length and syllables per word |
| textly.gunning_fog(text) | Grade level from sentence length and the percentage of polysyllabic words |
| textly.smog_index(text) | Grade level (most reliable with about 30 sentences; returns 0.0 for fewer than 3 sentences) |
| textly.automated_readability_index(text) | Grade level from characters per word and words per sentence |
| textly.coleman_liau_index(text) | Grade level from letters per 100 words and sentences per 100 words |
| textly.linsear_write_formula(text) | Grade level computed from the first 100 words |
| textly.dale_chall_readability_score(text) | Score based on the percentage of difficult words (see grade bands below) |
| textly.forcast(text) | Grade level from the number of single-syllable words in a 150-word sample |
| textly.spache_readability(text) | Grade level for texts aimed at young readers |
| textly.mcalpine_eflaw(text) | Score for EFL (English as a Foreign Language) materials |
| textly.lix(text) | LIX (läsbarhetsindex) score, originally developed for Swedish; not mapped to grade levels |
| textly.rix(text) | Grade level from the ratio of long words to sentences |
| textly.text_standard(text) | Consensus grade level across all formulas |
| textly.reading_time(text, wpm=200.0) | Estimated reading time in seconds |
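Several of the grade-level formulas in this table reduce to simple arithmetic on raw counts. The sketch below uses the widely published coefficients for Flesch Reading Ease, Flesch–Kincaid, ARI, Coleman–Liau, Gunning fog, and FORCAST. The Counts container is hypothetical, and textly's exact variants (for example, which words Gunning fog treats as complex, or whether character counts include punctuation) may differ.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Raw counts for a passage (hypothetical container, not a textly type)."""
    words: int
    sentences: int
    syllables: int
    letters: int        # alphabetic characters only
    characters: int     # non-space characters
    complex_words: int  # words with 3 or more syllables


def flesch_reading_ease(c: Counts) -> float:
    return 206.835 - 1.015 * (c.words / c.sentences) - 84.6 * (c.syllables / c.words)


def flesch_kincaid_grade(c: Counts) -> float:
    return 0.39 * (c.words / c.sentences) + 11.8 * (c.syllables / c.words) - 15.59


def automated_readability_index(c: Counts) -> float:
    return 4.71 * (c.characters / c.words) + 0.5 * (c.words / c.sentences) - 21.43


def coleman_liau_index(c: Counts) -> float:
    letters_per_100 = 100 * c.letters / c.words      # L
    sentences_per_100 = 100 * c.sentences / c.words  # S
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def gunning_fog(c: Counts) -> float:
    return 0.4 * (c.words / c.sentences + 100 * c.complex_words / c.words)


def forcast(monosyllables_in_150_words: int) -> float:
    # FORCAST looks only at single-syllable words in a 150-word sample.
    return 20 - monosyllables_in_150_words / 10


sample = Counts(words=100, sentences=5, syllables=150, letters=480,
                characters=500, complex_words=12)
for fn in (flesch_reading_ease, flesch_kincaid_grade,
           automated_readability_index, coleman_liau_index, gunning_fog):
    print(fn.__name__, round(fn(sample), 2))
```

With one word per sentence and one syllable per word, Flesch Reading Ease reaches 206.835 − 1.015 − 84.6 = 121.22, which is where the ceiling of about 121 comes from.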

Dale–Chall grade bands

| Score | Reader level |
| --- | --- |
| ≤ 4.9 | 4th grade or below |
| 5.0–5.9 | 5th–6th grade |
| 6.0–6.9 | 7th–8th grade |
| 7.0–7.9 | 9th–10th grade |
| 8.0–8.9 | 11th–12th grade |
| ≥ 9.0 | College level |
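The Dale–Chall score combines the difficult-word percentage with average sentence length. A minimal sketch, assuming the commonly cited Dale–Chall coefficients and that difficult words have already been counted per token against the familiar-word list; the helper names are hypothetical, and assigning scores that fall between the listed bands (for example 4.95) to the lower band is an assumption.

```python
def dale_chall_score(difficult_words: int, words: int, sentences: int) -> float:
    """Dale–Chall score from per-token counts (commonly cited coefficients)."""
    pct_difficult = 100 * difficult_words / words
    score = 0.1579 * pct_difficult + 0.0496 * (words / sentences)
    if pct_difficult > 5:
        score += 3.6365  # adjustment applied when more than 5% of words are difficult
    return score


def dale_chall_band(score: float) -> str:
    """Map a score to the grade bands in the table above.

    Assumption: scores in the gaps between listed ranges (e.g. 4.95) go to the lower band.
    """
    bands = [
        (5.0, "4th grade or below"),
        (6.0, "5th–6th grade"),
        (7.0, "7th–8th grade"),
        (8.0, "9th–10th grade"),
        (9.0, "11th–12th grade"),
    ]
    for upper, label in bands:
        if score < upper:
            return label
    return "College level"


score = dale_chall_score(difficult_words=10, words=100, sentences=5)
print(round(score, 2), dale_chall_band(score))
```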

LIX score interpretation

| Score | Difficulty |
| --- | --- |
| < 30 | Very easy |
| 30–40 | Easy |
| 40–50 | Standard |
| 50–60 | Difficult |
| > 60 | Very difficult |
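LIX adds average sentence length to the percentage of long words (more than six letters), and RIX divides the long-word count by the sentence count. A minimal sketch with naive regex tokenization (an assumption; textly's tokenizer is more careful). The band mapping follows the table's explicit "< 30" and "> 60" endpoints; assigning the shared interior endpoints (exactly 40 or 50) to the harder band is an assumption.

```python
import re


def long_word_stats(text: str) -> tuple[int, int, int]:
    """Return (words, long_words, sentences) using naive regex tokenization."""
    words = re.findall(r"[A-Za-z]+", text)
    long_words = [w for w in words if len(w) > 6]  # "long" = more than six letters
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return len(words), len(long_words), sentences


def lix(words: int, long_words: int, sentences: int) -> float:
    return words / sentences + 100 * long_words / words


def rix(long_words: int, sentences: int) -> float:
    return long_words / sentences


def lix_band(score: float) -> str:
    # Explicit endpoints follow the table; exactly 40 or 50 goes to the harder band.
    if score < 30:
        return "Very easy"
    if score < 40:
        return "Easy"
    if score < 50:
        return "Standard"
    if score <= 60:
        return "Difficult"
    return "Very difficult"


w, lw, s = long_word_stats("Photosynthesis converts sunlight efficiently. Plants grow.")
print(w, lw, s, round(lix(w, lw, s), 1), rix(lw, s), lix_band(lix(w, lw, s)))
```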

Text statistics

textly.syllable_count(text)                  # total syllables
textly.lexicon_count(text, removepunct=True)  # word count
textly.sentence_count(text)                   # sentence count
textly.char_count(text, ignore_spaces=True)   # character count
textly.letter_count(text, ignore_spaces=True) # alphabetic characters only
textly.polysyllabcount(text)                  # words with ≥3 syllables
textly.monosyllabcount(text)                  # words with 1 syllable

# Averages
textly.avg_sentence_length(text)
textly.avg_syllables_per_word(text)
textly.avg_character_per_word(text)
textly.avg_letter_per_word(text)
textly.avg_sentence_per_word(text)

# Difficult word analysis
textly.difficult_words(text)        # count
textly.difficult_words_list(text)   # list of tokens
textly.is_difficult_word("synapse") # True/False
textly.is_easy_word("dog")          # True/False
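The averages above are ratios of the raw counts. A minimal sketch of those relationships, assuming naive regex tokenization (letters plus in-word apostrophes form words; runs of ., !, or ? end sentences). basic_stats is a hypothetical helper, and textly's tokenizer will produce different counts on real text.

```python
import re


def basic_stats(text: str) -> dict[str, float]:
    """Raw counts and derived averages using naive tokenization (hypothetical helper)."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    letters = sum(ch.isalpha() for ch in text)
    n = len(words)
    return {
        "lexicon_count": n,
        "sentence_count": sentences,
        "letter_count": letters,
        "avg_sentence_length": n / sentences,    # words per sentence
        "avg_letter_per_word": letters / n,      # letters per word
        "avg_sentence_per_word": sentences / n,  # sentences per word
    }


print(basic_stats("The cat sat. The dog ran."))
```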

Configuration

Rounding

By default, scores are returned unrounded. You can control this globally or per call.

# Global: round all subsequent calls to 2 decimal places
textly.set_rounding(True, points=2)

# Per-call: overrides the global setting for this call only
textly.flesch_kincaid_grade(text, rounding=True, points=1)
textly.flesch_reading_ease(text, rounding=False)  # unrounded regardless of global
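The global-versus-per-call behavior described above follows a common pattern: a per-call argument of None falls back to the module-level default. A minimal sketch of that precedence rule; the _settings store and finalize helper are hypothetical, not textly internals, and the default of 2 decimal places is an assumption.

```python
from typing import Optional

# Hypothetical module-level defaults: unrounded unless set_rounding() is called.
_settings = {"rounding": False, "points": 2}


def set_rounding(rounding: bool, points: int = 2) -> None:
    """Change the global default for all subsequent calls."""
    _settings["rounding"] = rounding
    _settings["points"] = points


def finalize(score: float, rounding: Optional[bool] = None,
             points: Optional[int] = None) -> float:
    """Apply rounding, letting per-call arguments override the global default."""
    use_rounding = _settings["rounding"] if rounding is None else rounding
    use_points = _settings["points"] if points is None else points
    return round(score, use_points) if use_rounding else score


set_rounding(True, points=2)
print(finalize(59.6354))                           # global: 2 places
print(finalize(59.6354, rounding=True, points=1))  # per-call override
print(finalize(59.6354, rounding=False))           # unrounded regardless of global
```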

Apostrophe handling

# Default: False (apostrophes in contractions such as don't and it's are preserved)
textly.set_rm_apostrophe(True)  # strip apostrophes along with other punctuation

Set this once at the start of your script; it is a global setting that affects all subsequent calls.
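To make the effect of this setting concrete, here is a minimal sketch of punctuation stripping with and without apostrophes. strip_punctuation is a hypothetical helper that only handles the ASCII apostrophe; textly's actual punctuation handling may differ.

```python
import re


def strip_punctuation(text: str, rm_apostrophe: bool = False) -> str:
    """Remove punctuation; keep ASCII apostrophes unless rm_apostrophe is True."""
    # Keep word characters and whitespace, plus "'" when apostrophes are preserved.
    pattern = r"[^\w\s]" if rm_apostrophe else r"[^\w\s']"
    return re.sub(pattern, "", text)


sample = "Don't stop, it's late!"
print(strip_punctuation(sample))                      # Don't stop it's late
print(strip_punctuation(sample, rm_apostrophe=True))  # Dont stop its late
```

The word count is the same either way here, but the token spellings differ ("it's" versus "its"), which can matter for word-list lookups such as easy-word checks.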

Custom syllable dictionary

Override syllable counts for specialized vocabulary.

textly.add_word_to_dictionary("pterodactyl", 4)
textly.add_words_from_file_to_dictionary("my_terms.json")
textly.overwrite_dictionary("full_replacement.json")
textly.revert_dictionary_to_default()
textly.print_dictionary()

Dictionary files use this JSON format:

{
  "CUSTOM_SYLLABLE_DICT": {
    "pterodactyl": 4,
    "crispr": 2
  }
}
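A dictionary file in this format can be read with the standard json module. The sketch below assumes that keys are matched case-insensitively, that counts must be positive integers, and that entries from a file override existing ones (as an add-from-file operation would) while a full replacement discards the old entries. The helper names are hypothetical, not textly's API.

```python
import json
from pathlib import Path


def parse_syllable_overrides(raw_json: str) -> dict[str, int]:
    """Validate a CUSTOM_SYLLABLE_DICT document and normalize keys to lowercase."""
    entries = json.loads(raw_json)["CUSTOM_SYLLABLE_DICT"]
    overrides: dict[str, int] = {}
    for word, count in entries.items():
        if isinstance(count, bool) or not isinstance(count, int) or count < 1:
            raise ValueError(f"invalid syllable count for {word!r}: {count!r}")
        overrides[word.lower()] = count
    return overrides


def load_syllable_overrides(path: str) -> dict[str, int]:
    """Read and validate a dictionary file such as my_terms.json."""
    return parse_syllable_overrides(Path(path).read_text(encoding="utf-8"))


def merge_overrides(current: dict[str, int], new: dict[str, int],
                    replace: bool = False) -> dict[str, int]:
    """Add-from-file semantics by default; replace=True mimics a full overwrite."""
    return dict(new) if replace else {**current, **new}


doc = '{"CUSTOM_SYLLABLE_DICT": {"pterodactyl": 4, "CRISPR": 2}}'
print(merge_overrides({"crispr": 1, "orchid": 2}, parse_syllable_overrides(doc)))
```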

Limitations

  • SMOG needs about 30 sentences for reliable results; with fewer than 3 sentences it returns 0.0.
  • Short text snippets produce unstable scores across most formulas.
  • Novel jargon may need custom dictionary entries for accurate syllable counts.
  • The regex fallback is approximate by nature (~91% agreement with CMUdict).
  • English only.
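The SMOG limitation follows from how the formula works: it scales the polysyllable count to a 30-sentence sample, so a few sentences get extrapolated heavily. A minimal sketch using McLaughlin's published coefficients and the documented 0.0 guard; smog_from_counts is a hypothetical helper taking precomputed counts, not textly's function.

```python
import math


def smog_from_counts(polysyllables: int, sentences: int) -> float:
    """SMOG grade from precomputed counts (hypothetical helper)."""
    if sentences < 3:
        return 0.0  # mirrors the documented guard for very short texts
    # Scale the polysyllable count to a 30-sentence sample.
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


# The same polysyllable density gives the same grade, but the 3-sentence
# estimate rests on only three polysyllabic words.
print(round(smog_from_counts(3, 3), 2))
print(round(smog_from_counts(30, 30), 2))
```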

Contributing

Found a bug or have an idea? Open an issue. Want to contribute code? Submit a pull request.

  1. Fork the repo and create a branch.
  2. Add tests for your changes.
  3. Open a PR.

Changelog

See CHANGELOG.md for release history.

License

MIT


Download files

Source Distribution

textly-1.1.0.tar.gz (943.4 kB)

Built Distribution

textly-1.1.0-py3-none-any.whl (943.8 kB)

File details

Details for the file textly-1.1.0.tar.gz.

File metadata

  • Download URL: textly-1.1.0.tar.gz
  • Size: 943.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for textly-1.1.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 175cde0314fd5a27584a027860c8cc7f6a7599f58bfbf17f376d912d8a0b894b |
| MD5 | 0f9c8950299d4f132263e0899b9f3950 |
| BLAKE2b-256 | 3be720c9aaf8adac1d6430c0f5213aefb9c94da4fa28006e72b69093777cf117 |

File details

Details for the file textly-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: textly-1.1.0-py3-none-any.whl
  • Size: 943.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for textly-1.1.0-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 22bc67eff1b4c936b854a5d725da6c3f36e8437f2ea106c21e34edde3d81760e |
| MD5 | 9b2df74fe9a9ab6fd3ffe7b4b8fe932b |
| BLAKE2b-256 | 5bca63565c8b84fe43488fe5e413aabcee3a1bc117aee7d48a2de4b3b65a171a |
