Skip to main content

Measure the readability of a given text using surface characteristics

Project description

A collection of functions that measure the readability of a given body of text using surface characteristics. These measures are basically linear regressions based on the number of words, syllables, and sentences.

The functionality is modeled after the UNIX style(1) command. Compared to the implementation as part of GNU diction, this version supports UTF-8 encoded text, but expects sentence-segmented and tokenized text. The syllabification and word type recognition is based on simple heuristics and only provides a rough measure.

NB: all readability formulas were developed for English, so the scales of the outcomes are only meaningful for English texts.

Installation

$ pip install https://github.com/andreasvc/readability/tarball/master

Usage

$ readability --help
Simple readability measures.

Usage: readability.py [--lang=<x>] [file]

By default, input is read from standard input.
Text should be encoded with UTF-8,
one sentence per line, tokens space-separated.

  -L, --lang=<x>   set language (available: de, nl, en).

For proper results, the text should be tokenized. These tokenizers support Dutch and English:

Example using ucto:

$ ucto -L en -n -s '' "CONRAD, Joseph - Lord Jim.txt" | readability
[...]
readability grades:
    Kincaid:                     4.95
    ARI:                         5.78
    Coleman-Liau:                6.87
    FleschReadingEase:          86.18
    GunningFogIndex:             9.4
    LIX:                        30.97
    SMOGIndex:                   9.2
    RIX:                         2.39
sentence info:
    characters_per_word:         4.19
    syll_per_word:               1.25
    words_per_sentence:         14.92
    sentences_per_paragraph:        12.6
    characters:             552074
    syllables:              164207
    words:                  131668
    sentences:                8823
    paragraphs:                700
    long_words:              21122
    complex_words:           11306
word usage:
    tobeverb:                 3909
    auxverb:                  1632
    conjunction:              4413
    pronoun:                 18104
    preposition:             19271
    nominalization:           1216
sentence beginnings:
    pronoun:                  2593
    interrogative:             215
    article:                   632
    subordination:             124
    conjunction:               240
    preposition:               404

References

The following readability metrics are included:

  1. http://en.wikipedia.org/wiki/Automated_Readability_Index

  2. http://en.wikipedia.org/wiki/SMOG

  3. http://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_Grade_Level#Flesch.E2.80.93Kincaid_Grade_Level

  4. http://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_test#Flesch_Reading_Ease

  5. http://en.wikipedia.org/wiki/Coleman-Liau_Index

  6. http://en.wikipedia.org/wiki/Gunning-Fog_Index

For better readability measures, consider the following:

Acknowledgments

The code is based on: https://github.com/mmautner/readability

Which in turn was based on: https://github.com/nltk/nltk_contrib/tree/master/nltk_contrib/readability

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

readability-0.1.tar.gz (9.4 kB view details)

Uploaded Source

File details

Details for the file readability-0.1.tar.gz.

File metadata

  • Download URL: readability-0.1.tar.gz
  • Upload date:
  • Size: 9.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for readability-0.1.tar.gz
Algorithm Hash digest
SHA256 b6d03b185448a0f97dfc48f0f3e1590b4b48f86f2b1ec0188408098fe49b14b6
MD5 ce09d671d2e29fb82e64c64bd743e0db
BLAKE2b-256 ee7c61d1f8592f261beba7f4f79104e6b467cff86ccb0a15f3008fefa46ddef7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page