A Python library for text readability analysis, supporting multiple languages.

SmoothText

Introduction

SmoothText is a Python library that computes readability scores and text statistics (sentence, word, and syllable counts) for texts in multiple languages.

The library's primary design goal is accuracy.

Requirements

Python 3.10 or higher.

External Dependencies

Library	Version	License	Notes
NLTK	`>=3.9.1`	`Apache 2.0`	Conditionally optional.
Stanza	`>=1.10.1`	`Apache 2.0`	Conditionally optional.
CMUdict	`>=1.0.32`	`GPLv3+`	Required if `Stanza` is the selected backend.
Unidecode	`>=1.3.8`	`GNU GPLv2`	Required.
Pyphen	`>=0.17.0`	`GPL 2.0+/LGPL 2.1+/MPL 1.1`	Required.
emoji	`>=2.14.1`	`BSD`	Required.

Either NLTK or Stanza must be installed and used with the SmoothText library.
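
Because only one of the two backends has to be present, it can help to check which ones are importable before creating an instance. The following is a minimal sketch using only the standard library; the helper name installed_backends is hypothetical and is not part of SmoothText.

```python
from importlib.util import find_spec


def installed_backends() -> list[str]:
    """Return the names of the supported backends that can be imported.

    Hypothetical helper for illustration; SmoothText does not provide it.
    """
    candidates = ("nltk", "stanza")
    # find_spec returns None when a top-level package is not installed.
    return [name for name in candidates if find_spec(name) is not None]


if __name__ == "__main__":
    found = installed_backends()
    if found:
        print("Available backends:", ", ".join(found))
    else:
        print("Install NLTK or Stanza before using SmoothText.")
```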

Features

Readability Analysis

SmoothText can calculate readability scores of text in the following languages, using the following formulas.

Formula/Language	English	German	Turkish
Flesch Reading Ease	✔	✔	✔ Ateşman
Flesch-Kincaid Grade	✔	✔ Wiener Sachtextformel	✔ Bezirci-Yılmaz
Flesch-Kincaid Grade Simplified	✔	❌	❌

Notes:

English:

Formulas work best with US English. However, SmoothText supports both US English and GB English.

German:

Flesch Reading Ease is applicable to German texts. SmoothText handles the language-specific adaptations of the formula.
Wiener Sachtextformel is the German adaptation of Flesch-Kincaid Grade.

Turkish:

Ateşman is the Turkish adaptation of Flesch Reading Ease.
Bezirci-Yılmaz is the Turkish adaptation of Flesch-Kincaid Grade.
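
For orientation, the two English formulas can be written directly in terms of word, sentence, and syllable counts. The sketch below uses the standard published English coefficients; it is not taken from SmoothText's source, and the German and Turkish adaptations listed above use different constants. The function names are chosen for illustration only.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard English Flesch Reading Ease; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard English Flesch-Kincaid Grade; approximates a US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Illustrative counts: 100 words in 5 sentences with 150 syllables.
    print(round(flesch_reading_ease(100, 5, 150), 3))   # 59.635
    print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
```

Both formulas depend only on the average sentence length and the average number of syllables per word, which is why accurate sentence, word, and syllable counts matter for the final score.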

Sentencizing, Tokenizing, and Syllabifying

SmoothText can extract sentences, words, or syllables from texts.
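
To make the three units concrete, here is a deliberately naive, standard-library sketch of sentence and word extraction. It is an assumption-laden illustration only: SmoothText delegates this work to NLTK or Stanza, whose rules are far more careful than these regular expressions, and the function names below are hypothetical.

```python
import re


def naive_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace (illustration only)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def naive_words(text: str) -> list[str]:
    """Treat every run of word characters as one word (illustration only)."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    sample = "SmoothText reads text. It counts words! Does it count sentences?"
    print(len(naive_sentences(sample)))  # 3
    print(len(naive_words(sample)))      # 10
```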

Reading Time

SmoothText can estimate how long a text would take to read.
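
A reading-time estimate is usually a word count divided by an assumed reading speed. The sketch below shows that arithmetic; the function name and the default of 200 words per minute are assumptions made for illustration and do not describe SmoothText's own method or defaults.

```python
def estimate_reading_time_seconds(word_count: int, words_per_minute: float = 200.0) -> float:
    """Estimate reading time in seconds from a word count.

    The default speed is an assumed value, not a SmoothText setting.
    """
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute * 60.0


if __name__ == "__main__":
    # 400 words at 200 words per minute take two minutes.
    print(estimate_reading_time_seconds(400))  # 120.0
```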

Installation

You can install SmoothText via pip.

pip install smoothtext

Usage

Importing and Initializing the Library

The package exposes four top-level names: Backend, Language, ReadabilityFormula, and SmoothText.

from smoothtext import Backend, Language, ReadabilityFormula, SmoothText

Instancing

SmoothText's analysis methods are instance methods rather than static ones, so an instance must be created to use them.

When creating an instance, the language and the backend to be used with it can be specified.

The following will create a new SmoothText instance configured to be used with the English language (by default, the United States variant) using NLTK as the backend.

st = SmoothText('en', 'nltk')

Once an instance is created, its backend cannot be changed, but its working language can be changed at any time.

st.language = 'tr'  # Now configured to work with Turkish.
st.language = 'en-gb'  # Switching back to English, but to the United Kingdom variant.

Readying the Backends

When an instance is created, it first attempts to import the selected backend and download any missing language data. To avoid doing this work at instantiation, the required packages can be prepared in advance with the static SmoothText.prepare() method.

SmoothText.prepare('nltk', 'en,tr')  # Preparing NLTK to be used with English and Turkish

Computing Readability Scores

Each language has its own set of readability formulas. When computing the readability score of a text, one of the formulas supported for its language must be used. SmoothText offers three equivalent ways to perform this calculation.

text: str = 'Forrest Gump is a 1994 American comedy-drama film directed by Robert Zemeckis.'  # https://en.wikipedia.org/wiki/Forrest_Gump

# Generic computation method
st.compute_readability(text, ReadabilityFormula.Flesch_Reading_Ease)

# Using instance as a callable for generic computation
st(text, ReadabilityFormula.Flesch_Reading_Ease)

# Specific formula method
st.flesch_reading_ease(text)
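
The three call styles above follow a common pattern: a generic method dispatches on a formula identifier, and the instance's __call__ forwards to that generic method. The sketch below illustrates the pattern with hypothetical stand-ins (ToyFormula, ToyScorer, and a trivial average-sentence-length metric); it is not SmoothText's implementation.

```python
import re
from enum import Enum


class ToyFormula(Enum):
    """Hypothetical stand-in for ReadabilityFormula."""
    AVERAGE_SENTENCE_LENGTH = "average_sentence_length"


class ToyScorer:
    """Hypothetical stand-in for a SmoothText instance."""

    def average_sentence_length(self, text: str) -> float:
        # Specific method: a trivial metric, used here only to show dispatch.
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"\w+", text)
        return len(words) / len(sentences)

    def compute_readability(self, text: str, formula: ToyFormula) -> float:
        # Generic method: look up the specific method named by the enum value.
        return getattr(self, formula.value)(text)

    def __call__(self, text: str, formula: ToyFormula) -> float:
        # Callable instance: forward to the generic method.
        return self.compute_readability(text, formula)


if __name__ == "__main__":
    scorer = ToyScorer()
    sample = "One two three. Four five six seven eight."
    print(scorer.average_sentence_length(sample))                       # 4.0
    print(scorer.compute_readability(sample, ToyFormula.AVERAGE_SENTENCE_LENGTH))
    print(scorer(sample, ToyFormula.AVERAGE_SENTENCE_LENGTH))
```

All three calls return the same value, which is the property the SmoothText examples above rely on.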

Tokenizing and Calculating Text Statistics

SmoothText is designed to work with sentences, words/tokens, and syllables.

text = 'This is a test sentence. This is another test sentence. This is a third test sentence.'

st.count_sentences(text)
# Output: 3

st.count_words(text)
# Output: 16

st.count_syllables(text)
# Output: 21

Other Features

Refer to the documentation for a complete list of available methods.

Inconsistencies

Backend Related Inconsistencies

NLTK and Stanza have different tokenization rules. This may cause differences in the number of tokens/sentences between the two backends.
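
The effect is easy to reproduce with two toy tokenizers. The sketch below does not use NLTK or Stanza and does not reflect their actual rules; it only shows how two reasonable tokenization conventions give different token counts for the same text.

```python
import re


def whitespace_tokens(text: str) -> list[str]:
    """Split on whitespace only; punctuation stays attached to words."""
    return text.split()


def punctuation_aware_tokens(text: str) -> list[str]:
    """Emit word runs and individual punctuation marks as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    sample = "Don't panic, it's fine."
    print(len(whitespace_tokens(sample)))         # 4
    print(len(punctuation_aware_tokens(sample)))  # 10
```

Since readability formulas divide by word and sentence counts, differences like this carry through to the final score, so results should be compared only between runs that use the same backend.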

Language Related Inconsistencies

The syllabification of a word may differ between variants of the same language. For example, the word "hello" is counted as two syllables in American English but as one in British English.

Documentation

See here for API documentation.

License

SmoothText has an MIT license. See LICENSE.

Release history

0.4.0

Apr 13, 2025

0.3.2 yanked

Mar 10, 2025

0.3.1 yanked

Mar 3, 2025

0.3.0 yanked

Feb 16, 2025

0.2.8 yanked

Feb 14, 2025

0.2.7 yanked (this version)

Feb 10, 2025

0.2.6 yanked

Feb 10, 2025

0.2.0 yanked

Feb 9, 2025

0.1.1 yanked

Feb 7, 2025

0.1.0 yanked

Feb 6, 2025

0.0.17 yanked

Jan 15, 2025

Download files

Source Distribution

smoothtext-0.2.7.tar.gz (27.3 kB)

Uploaded Feb 10, 2025 Source

Built Distribution

smoothtext-0.2.7-py3-none-any.whl (22.1 kB)

Uploaded Feb 10, 2025 Python 3

File details

Details for the file smoothtext-0.2.7.tar.gz.

File metadata

Download URL: smoothtext-0.2.7.tar.gz
Upload date: Feb 10, 2025
Size: 27.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for smoothtext-0.2.7.tar.gz
Algorithm	Hash digest
SHA256	`944f7827f6b902b4441dde32d59df91ed252119f0ba379d2fea23449f7ec2f1f`
MD5	`586948249c6157253235822a66f660ec`
BLAKE2b-256	`c8b4ca0c472b76083a03b5c3d058d0f06c732357345030d0fd11470d901331d5`

File details

Details for the file smoothtext-0.2.7-py3-none-any.whl.

File metadata

Download URL: smoothtext-0.2.7-py3-none-any.whl
Upload date: Feb 10, 2025
Size: 22.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for smoothtext-0.2.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a27fbdc78a0d7ef70287a9509270e4b2fb402db3d0933bbd340d8934d4e99d79`
MD5	`7142d319c3045e0378b4d0d9123484c3`
BLAKE2b-256	`d04dc642980e8f3123fa561718d73da8ce1bee2d1eaccbbb8d94944475f583cb`
