A Python library for readability and textual metrics analysis, supporting multiple languages.

SmoothText

Introduction

SmoothText is a Python library for computing readability scores and text statistics in multiple languages.

The library is designed with accuracy as its primary goal.

Requirements

Python Version

Python 3.10 or higher.

External Dependencies

Library	Version	License	Notes
NLTK	`>=3.9.1`	`Apache 2.0`	Conditionally optional.
Stanza	`>=1.10.1`	`Apache 2.0`	Conditionally optional.
CMUdict	`>=1.0.32`	`GPLv3+`	Required if `Stanza` is the selected backend.
Unidecode	`>=1.3.8`	`GNU GPLv2`	Required.
Pyphen	`>=0.17.0`	`GPL 2.0+/LGPL 2.1+/MPL 1.1`	Required.
emoji	`>=2.14.1`	`BSD`	Required.

Either NLTK or Stanza must be installed and used with the SmoothText library.

Features

Readability Analysis

SmoothText can calculate readability scores of text in the following languages, using the following formulas.

Method	Description
`compute_readability`	Computes the readability score of a text using a specified formula.

English

Method	Formula	Authors	Notes
`automated_readability_index`	Automated Readability Index	Smith & Senter, 1967	-
`flesch_reading_ease`	Flesch Reading Ease	Flesch, 1948	-
`flesch_kincaid_grade`	Flesch-Kincaid Grade	Kincaid et al., 1975	-
`flesch_kincaid_grade_simplified`	Flesch-Kincaid Grade Simplified	Kincaid et al., 1975	Essentially the same as `Flesch-Kincaid Grade`, but the output is rounded because the formula uses rounded constants.
`gunning_fog_index`	Gunning Fog Index	Gunning, 1952	-

Notes:

Although SmoothText supports both US English and GB English, formulas work best with US English.
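To make the table above more concrete, here is a minimal standalone sketch of two of the listed formulas, using the commonly published constants for Flesch Reading Ease and Flesch-Kincaid Grade. The helper names (`fre_from_counts`, `fkg_from_counts`) and the sample counts are hypothetical and are not part of SmoothText; the library's own methods take raw text and do the counting themselves.

```python
def fre_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease (Flesch, 1948) computed from raw counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def fkg_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade (Kincaid et al., 1975) computed from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a 100-word passage (illustration only).
counts = {"words": 100, "sentences": 5, "syllables": 150}
ease = fre_from_counts(**counts)
grade = fkg_from_counts(**counts)
print(f"Reading ease: {ease:.3f}, grade level: {grade:.2f}")
```

Both formulas depend only on the average sentence length and the average number of syllables per word, which is why accurate sentence, word, and syllable counts matter so much for the final score.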

German

Method	Formula	Authors	Notes
`amstad`	Flesch Reading Ease	Amstad, 1978	German adaptation of `Flesch Reading Ease`.
`wiener_sachtextformel`	Wiener Sachtextformel	Bamberger & Vanecek, 1984	German adaptation of `Flesch-Kincaid Grade`. All versions (1 through 4) are supported.

Russian

Method	Formula	Authors	Notes
`matskovskiy`	Matskovskiy	Matskovskiy, 1976	Russian adaptation of `Flesch Reading Ease`.

Turkish

Method	Formula	Authors	Notes
`atesman`	Ateşman	Ateşman, 1997	Turkish adaptation of `Flesch Reading Ease`.
`bezirci_yilmaz`	Bezirci-Yılmaz	Bezirci & Yılmaz, 2010	Turkish adaptation of `Flesch-Kincaid Grade`.
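As an illustration of what a language-specific adaptation looks like, the following sketch computes the Ateşman score from raw counts using the constants commonly cited for the formula. The function name `atesman_from_counts` and the sample counts are hypothetical; this is not SmoothText's implementation.

```python
def atesman_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Ateşman readability score (Ateşman, 1997) computed from raw counts."""
    avg_syllables_per_word = syllables / words
    avg_words_per_sentence = words / sentences
    return 198.825 - 40.175 * avg_syllables_per_word - 2.610 * avg_words_per_sentence


# Hypothetical counts for a short Turkish passage (illustration only).
score = atesman_from_counts(words=100, sentences=10, syllables=260)
print(f"Ateşman score: {score:.2f}")
```

The structure matches Flesch Reading Ease, but the constants are tuned for Turkish, where words tend to carry more syllables than in English.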

Sentencizing, Tokenization, and Syllabification

SmoothText can extract sentences, words, or syllables from texts.

Method	Description
Sentence Level
`sentencize`	Splits text into sentences using language-aware rules
`count_sentences`	Returns the number of sentences found in the text
Word Level
`tokenize`	Extracts word tokens from text; can group by sentences with the split_sentences flag
`count_words`	Counts the number of alphanumeric words in a text
`word_frequencies`	Returns a dictionary of word frequencies with optional lemmatization
Syllable Level
`syllabify`	Splits words into syllables; can be applied to words, tokens, or sentences
`count_syllables`	Counts syllables in words or text using language-specific rules
`syllable_frequencies`	Returns a dictionary mapping syllable counts to frequency in the analyzed text
Character Level
`count_consonants`	Counts the number of consonant characters in text
`count_vowels`	Counts the number of vowel characters in text
Emoji Handling
`demojize`	Converts emoji characters to their text descriptions with custom delimiters
`remove_emojis`	Removes all emoji characters from text

Notes

`count_syllables` is likely to produce more accurate results than the `syllabify` method.
At the moment, lemmatization is supported only for English with Stanza as the backend. Other languages and backends ignore the lemmatization flag.

Language	Sentencizing	Tokenization	Syllabification
English	✔ (`NLTK`, `Stanza`)	✔ (`NLTK`, `Stanza`)	✔ (`CMU Dictionary`, `Pyphen`)
German	✔ (`NLTK`, `Stanza`)	✔ (`NLTK`, `Stanza`)	✔ (`Pyphen`)
Russian	✔ (`NLTK`, `Stanza`)	✔ (`NLTK`, `Stanza`)	✔ (`Pyphen`)
Turkish	✔ (`NLTK`, `Stanza`)	✔ (`NLTK`, `Stanza`)	✔ (Custom formula)

Pyphen does not always produce accurate results. Thus, whenever possible, custom syllabification formulas or dictionaries are preferred.
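To show why a custom rule can be practical for Turkish, here is a minimal sketch based on the general property that every Turkish syllable contains exactly one vowel, so counting vowels gives the syllable count. The helper `count_syllables_tr` and the vowel set are assumptions made for illustration; SmoothText's actual Turkish formula may differ.

```python
# Turkish vowels, including the circumflexed forms that appear in some words.
TURKISH_VOWELS = frozenset("aeıioöuüâîû")


def count_syllables_tr(word: str) -> int:
    """Count syllables in a Turkish word by counting its vowels."""
    # str.lower() maps both "I" and "İ" to strings containing "i", which is
    # still a vowel, so the count stays correct without locale-aware casing.
    return sum(1 for char in word.lower() if char in TURKISH_VOWELS)


for word in ("tren", "kitap", "İstanbul", "okunabilirlik"):
    print(word, count_syllables_tr(word))
```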

Reading Time

SmoothText can calculate how long a text would take to read. The reading time is calculated based on the average reading speed of an adult.

Method	Description
`reading_aloud_time`	Calculates the time needed to read a text aloud.
`reading_time`	Calculates the reading time of a text.
`silent_reading_time`	Calculates the silent reading time.
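The idea behind a reading-time estimate can be sketched as a word count divided by a reading speed. The function name and both words-per-minute values below are assumptions chosen for illustration; they are not the constants SmoothText uses.

```python
def estimate_reading_time(word_count: int, words_per_minute: float) -> float:
    """Return the estimated reading time in seconds."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute * 60.0


# Assumed average adult reading speeds (illustrative values only).
SILENT_WPM = 240.0
ALOUD_WPM = 180.0

silent_seconds = estimate_reading_time(480, SILENT_WPM)
aloud_seconds = estimate_reading_time(480, ALOUD_WPM)
print(f"Silent: {silent_seconds:.0f} s, aloud: {aloud_seconds:.0f} s")
```

Reading aloud is slower than reading silently, so the same text yields a longer estimate for the aloud case.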

Installation

You can install SmoothText via pip.

pip install smoothtext

Usage

Importing and Initializing the Library

SmoothText comes with four submodules: Backend, Language, ReadabilityFormula and SmoothText.

from smoothtext import Backend, Language, ReadabilityFormula, SmoothText

Instancing

SmoothText was not designed to be used with static methods. Thus, an instance must be created to access its methods.

When creating an instance, the language and the backend to be used with it can be specified.

The following will create a new SmoothText instance configured to be used with the English language (by default, the United States variant) using NLTK as the backend.

st = SmoothText('en', 'nltk')

Once an instance is created, its backend cannot be changed, but its working language can be changed at any time.

st.language = 'tr'  # Now configured to work with Turkish.
st.language = 'en-gb'  # Switching back to English, but to the United Kingdom variant.

Readying the Backends

When an instance is created, the instance will first attempt to import and download the required backend/language data. To avoid this, and to prepare the required packages in advance, we can use the static SmoothText.prepare() method.

SmoothText.prepare('nltk', 'en,tr')  # Preparing NLTK to be used with English and Turkish

Computing Readability Scores

Each language has its own set of readability formulas. When computing the readability score of a text in a language, one of the supported formulas must be used. Using SmoothText, there are three ways to perform this calculation.

text: str = 'Forrest Gump is a 1994 American comedy-drama film directed by Robert Zemeckis.'  # https://en.wikipedia.org/wiki/Forrest_Gump

# Generic computation method
st.compute_readability(text, ReadabilityFormula.Flesch_Reading_Ease)

# Using instance as a callable for generic computation
st(text, ReadabilityFormula.Flesch_Reading_Ease)

# Specific formula method
st.flesch_reading_ease(text)

Tokenizing and Calculating Text Statistics

SmoothText is designed to work with sentences, words/tokens, and syllables.

Other Features

Refer to the documentation for a complete list of available methods.

Inconsistencies

Backend Related Inconsistencies

NLTK and Stanza have different tokenization rules. This may cause differences in the number of tokens/sentences between the two backends.
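The effect of differing tokenization rules can be illustrated without either backend. The sketch below applies two deliberately simple rules from the standard library to the same sentence; neither is the rule used by NLTK or Stanza, and the sample text is made up for illustration.

```python
import re

text = "Dr. Smith doesn't like e-mail. Really?"

# Rule 1: split on whitespace, keeping punctuation attached to words.
whitespace_tokens = text.split()

# Rule 2: keep only runs of word characters, which splits "doesn't" and "e-mail".
regex_tokens = re.findall(r"\w+", text)

print(len(whitespace_tokens), whitespace_tokens)
print(len(regex_tokens), regex_tokens)
```

Because readability formulas depend on word and sentence counts, differences like these carry over into the computed scores.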

Language Related Inconsistencies

The syllabification of a word may differ between variants of the same language. For example, Pyphen splits the word "hello" into two syllables with its American English dictionary but treats it as a single syllable with its British English dictionary. See the code snippet below.
- To avoid this as much as possible, CMUdict is the default syllabification method for English. When CMUdict is not available, Pyphen is used as a fallback.

from pyphen import Pyphen

us = Pyphen(lang="en_US")
print(us.inserted("hello"))
# Output: 'hel-lo'

gb = Pyphen(lang="en_GB")
print(gb.inserted("hello"))
# Output: 'hello'

Documentation

See here for API documentation.

License

SmoothText has an MIT license. See LICENSE.

Release history

Version	Release date	Status
0.4.0	Apr 13, 2025	This version
0.3.2	Mar 10, 2025	Yanked
0.3.1	Mar 3, 2025	Yanked
0.3.0	Feb 16, 2025	Yanked
0.2.8	Feb 14, 2025	Yanked
0.2.7	Feb 10, 2025	Yanked
0.2.6	Feb 10, 2025	Yanked
0.2.0	Feb 9, 2025	Yanked
0.1.1	Feb 7, 2025	Yanked
0.1.0	Feb 6, 2025	Yanked
0.0.17	Jan 15, 2025	Yanked

