Implements the New Dale-Chall readability formula. Its output is tested against samples from the original publication.
The new Dale-Chall readability formula
I ordered a copy of Readability Revisited: The New Dale-Chall Readability Formula and used the book to code this library from scratch.
Installation:
$ pip install new-dale-chall-readability
Let's try it out:
$ ipython
In [1]: from new_dale_chall_readability import cloze_score, reading_level
In [2]: text = (
...: 'Latin for "friend of the court." It is advice formally offered '
...: 'to the court in a brief filed by an entity interested in, but not '
...: 'a party to, the case.'
...: )
In [3]: reading_level(text)
Out[3]: '7-8'
In [4]: cloze_score(text)
Out[4]: 36.91
What's a "cloze score" and "reading level"?
Cloze is a deletion test invented by Taylor (1953): words are removed from a passage and readers try to fill them back in. The 36.91 score, above, means that readers would be expected to restore roughly 37% of the deleted words correctly. So, a higher cloze score means more readable text. Scores "range from 58 and above for the easiest passages to 10-15 and below for the most difficult" (Chall & Dale, p. 75).
Reading level is the grade level of the material, in years of education. The scale is from 1 to 16+.
See the integration test file for text samples from the book, along with their scores.
Why yet another Dale-Chall readability library?
It's 2022 and there are probably a half-dozen implementations on PyPI. So why create another one?
- The existing libraries have issues that made me wonder if the results were accurate. For example:
- From my reading, I saw that reading levels are a set of ten "increasingly broad bands" (p. 75). And they have labels like `3` and `7-8`. The existing readability libraries treat these as floating point numbers. But now I believe that an enumeration, or specifically a `Literal`, captures the formula better: `Literal["1", "2", "3", "4", "5-6", "7-8", "9-10", "11-12", "13-15", "16+"]`
- I also couldn't find a good description of this "new" Dale-Chall formula, and how the existing libraries implement it.
- The readability scores are important for my international dictionary app: It shows definitions sorted with the most readable first, to increase comprehension. The entry for amicus curiae is a good example. But I was getting odd results on some pages.
- Use Test-Driven Development to squash bugs and prevent regressions.
- Turn examples from the book into test cases.
- Write with modern Python. I'm no expert, so I'm learning as I go along. E.g.,
- It passes Pyright strict-mode type-checking.
- It uses recent type enhancements like `Literal`.
- Present an API that is very easy to use in any app or library.
- No need to instantiate an object and learn its API.
- Just import the needed function and call it.
The result is a library that provides, I think, more accurate readability scores.
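To make the point about bands versus floats concrete, here is a minimal sketch of how the ten labels could be modeled as a `Literal` and used to sort definitions with the most readable first. It is an illustration only, not the library's actual code: the names `ReadingLevel`, `LEVELS`, `band_rank` and `sort_most_readable_first` are hypothetical, and the sample data is made up.

```python
from typing import Literal, get_args

# The ten band labels quoted above. The alias name ReadingLevel is an
# assumption for this sketch; the library may define its type differently.
ReadingLevel = Literal[
    "1", "2", "3", "4", "5-6", "7-8", "9-10", "11-12", "13-15", "16+"
]

# get_args() returns the labels in declaration order, easiest band first.
LEVELS: tuple[str, ...] = get_args(ReadingLevel)


def band_rank(level: ReadingLevel) -> int:
    """Return the 0-based position of a band so that bands can be compared."""
    return LEVELS.index(level)


def sort_most_readable_first(
    definitions: list[tuple[str, ReadingLevel]],
) -> list[tuple[str, ReadingLevel]]:
    """Sort (text, level) pairs from the easiest band to the hardest."""
    return sorted(definitions, key=lambda pair: band_rank(pair[1]))


# Made-up sample data; in an app, each level would come from reading_level().
samples: list[tuple[str, ReadingLevel]] = [
    ("definition A", "11-12"),
    ("definition B", "7-8"),
    ("definition C", "3"),
]
ordered = sort_most_readable_first(samples)
print([text for text, _ in ordered])
```

Ranking by position in the band list matters because the labels are strings: a plain string sort would put "11-12" ahead of "3". A strict type checker such as Pyright can also flag a label outside the ten bands, which a float-based API cannot do.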
References
Chall, J., & Dale, E. (1995). Readability revisited: The new Dale-Chall readability formula. Brookline Books.
Taylor, W. (1953). Cloze procedure: A new tool for measuring readability. Journalism Quarterly, 30, 415-433.