
A Python library for text readability analysis, supporting multiple languages including English and Turkish.


SmoothText




Introduction

SmoothText is a Python library for calculating readability scores and text statistics (such as sentence, word, and syllable counts) in multiple languages.

The library's primary design goal is accuracy.

Requirements

Python 3.10 or higher.

External Dependencies

Library    Version    License                      Notes
---------  ---------  ---------------------------  -------------------------------------------
NLTK       >=3.9.1    Apache 2.0                   Optional if Stanza is used as the backend.
Stanza     >=1.10.1   Apache 2.0                   Optional if NLTK is used as the backend.
CMUdict    >=1.0.32   Apache 2.0                   Required if Stanza is the selected backend.
Unidecode  >=1.3.8    GNU GPLv2                    Required.
Pyphen     >=0.17.0   GPL 2.0+/LGPL 2.1+/MPL 1.1   Required.

Either NLTK or Stanza must be installed and used with the SmoothText library.

Features

Readability Analysis

SmoothText supports the following readability formulas for each language.

English                          Turkish
-------------------------------  --------------
Flesch Reading Ease              Ateşman
Flesch-Kincaid Grade             Bezirci-Yılmaz
Flesch-Kincaid Grade Simplified  (none)

Notes:

  • Ateşman is the Turkish adaptation of Flesch Reading Ease.
  • Bezirci-Yılmaz is the Turkish adaptation of Flesch-Kincaid Grade.
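The classic formulas combine two ratios: words per sentence and syllables per word. The sketch below uses the standard published coefficients for Flesch Reading Ease, Flesch-Kincaid Grade, and Ateşman, assuming the sentence, word, and syllable counts are already known. It is an illustration of the formulas, not SmoothText's code; Bezirci-Yılmaz and the simplified Flesch-Kincaid variant are omitted because they use different inputs.

```python
def _ratios(sentences: int, words: int, syllables: int) -> tuple[float, float]:
    """Return (words per sentence, syllables per word)."""
    if sentences <= 0 or words <= 0:
        raise ValueError("sentence and word counts must be positive")
    return words / sentences, syllables / words


def flesch_reading_ease(sentences: int, words: int, syllables: int) -> float:
    # Higher scores mean easier text; most English prose scores between 0 and 100.
    wps, spw = _ratios(sentences, words, syllables)
    return 206.835 - 1.015 * wps - 84.6 * spw


def flesch_kincaid_grade(sentences: int, words: int, syllables: int) -> float:
    # Approximates the U.S. school grade needed to understand the text.
    wps, spw = _ratios(sentences, words, syllables)
    return 0.39 * wps + 11.8 * spw - 15.59


def atesman(sentences: int, words: int, syllables: int) -> float:
    # Ateşman's Turkish recalibration of Flesch Reading Ease (higher is easier).
    wps, spw = _ratios(sentences, words, syllables)
    return 198.825 - 40.175 * spw - 2.610 * wps
```

For one sentence of 10 words and 15 syllables, these give about 69.8, 6.0, and 112.5 respectively.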

Sentencizing, Tokenizing, and Syllabifying

SmoothText can extract sentences, words, or syllables from texts.
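As a rough illustration of these three operations, the sketch below splits sentences and words with regular expressions and counts Turkish syllables by counting vowels, which works because every Turkish syllable contains exactly one vowel. It is a standalone approximation with hypothetical function names; SmoothText presumably delegates this work to its NLTK or Stanza backend, and the naive sentence rule here will mis-split abbreviations such as "Dr.".

```python
import re

# Every Turkish syllable has exactly one vowel, so counting vowels counts
# syllables. Circumflexed vowels (â, î, û) appear in some loanwords.
TURKISH_VOWELS = frozenset("aeıioöuüâîûAEIİOÖUÜÂÎÛ")

# Naive sentence boundary: whitespace that follows ., ! or ?.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")
# A word is a run of word characters, optionally joined by apostrophes
# (Turkish attaches suffixes to proper nouns this way: "Türkiye'nin").
_WORD = re.compile(r"\w+(?:['’]\w+)*")


def split_sentences(text: str) -> list[str]:
    return [s for s in _SENTENCE_BREAK.split(text.strip()) if s]


def split_words(text: str) -> list[str]:
    return _WORD.findall(text)


def count_turkish_syllables(word: str) -> int:
    return sum(1 for ch in word if ch in TURKISH_VOWELS)
```

For "Bu bir deneme cümlesidir. Türkiye'nin başkenti Ankara'dır.", this yields 2 sentences, 7 words, and 20 syllables.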

Reading Time

SmoothText can estimate how long a text would take to read.
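Reading time is commonly estimated by dividing a word count by an average reading speed. The sketch below assumes a fixed speed of 200 words per minute, an illustrative figure rather than SmoothText's own setting; the function and constant names are hypothetical.

```python
from datetime import timedelta

# Illustrative assumption: an average adult silent-reading speed.
DEFAULT_WORDS_PER_MINUTE = 200.0


def estimate_reading_time(
    word_count: int, words_per_minute: float = DEFAULT_WORDS_PER_MINUTE
) -> timedelta:
    """Estimate reading time as word_count / words_per_minute."""
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return timedelta(minutes=word_count / words_per_minute)
```

For example, estimate_reading_time(500, 250) returns a timedelta of two minutes.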

Installation

You can install SmoothText via pip.

pip install smoothtext

Usage

Importing and Initializing the Library

SmoothText exposes four main components: Backend, Language, ReadabilityFormula, and SmoothText.

from smoothtext import Backend, Language, ReadabilityFormula, SmoothText

Instancing

Most of SmoothText's functionality is provided through instance methods rather than static methods, so an instance must be created before use.

When creating an instance, the language and the backend to be used with it can be specified.

The following creates a SmoothText instance for English (the United Kingdom variant by default), using NLTK as the backend.

st = SmoothText('en', 'nltk')

Once an instance is created, its backend cannot be changed, but its working language can be changed at any time.

st.language = 'tr' # Now configured to work with Turkish.
st.language = 'en-us' # Switching back to English, but to the United States variant.
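The rule that the backend is fixed while the language can change maps naturally onto Python properties: a read-only property for the backend and a validating setter for the language. The class below is a hypothetical sketch of that pattern, not SmoothText's source; the names TextAnalyzer and BackendName, the set of language codes, and the "en" to "en-gb" alias are assumptions based on the examples above.

```python
from enum import Enum


class BackendName(Enum):  # hypothetical stand-in for smoothtext.Backend
    NLTK = "nltk"
    STANZA = "stanza"


class TextAnalyzer:
    """Hypothetical class: fixed backend, switchable language."""

    # Assumed language codes; plain "en" resolves to the UK variant.
    _ALIASES = {"en": "en-gb"}
    _SUPPORTED = {"en-gb", "en-us", "tr"}

    def __init__(self, language: str, backend: str) -> None:
        self._backend = BackendName(backend)  # ValueError if unknown
        self.language = language  # goes through the validating setter

    @property
    def backend(self) -> BackendName:
        # No setter is defined, so assigning to .backend raises AttributeError.
        return self._backend

    @property
    def language(self) -> str:
        return self._language

    @language.setter
    def language(self, value: str) -> None:
        code = self._ALIASES.get(value.lower(), value.lower())
        if code not in self._SUPPORTED:
            raise ValueError(f"unsupported language: {value!r}")
        self._language = code
```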

Readying the Backends

When an instance is created, it first attempts to import the selected backend and download any language data it needs. To perform this setup in advance instead, call the static SmoothText.prepare() method:

SmoothText.prepare('nltk', 'en,tr') # Preparing NLTK to be used with English and Turkish

Computing Readability Scores

Each language has its own set of readability formulas, and a text's readability score must be computed with a formula that supports its language. SmoothText offers three ways to perform this calculation:

text: str = 'Forrest Gump is a 1994 American comedy-drama film directed by Robert Zemeckis.' # https://en.wikipedia.org/wiki/Forrest_Gump

# Generic computation method
st.compute_readability(text, ReadabilityFormula.Flesch_Reading_Ease)

# Using instance as a callable for generic computation
st(text, ReadabilityFormula.Flesch_Reading_Ease)

# Specific formula method
st.flesch_reading_ease(text)
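One plausible way to offer all three entry points without duplicating logic is to have the generic method dispatch to the formula-specific methods and make __call__ an alias for it. This is a hypothetical sketch: Formula and Scorer are illustrative names, and the crude regex-based counts stand in for real tokenization and syllabification.

```python
import re
from enum import Enum


class Formula(Enum):  # hypothetical stand-in for ReadabilityFormula
    # Each value names the Scorer method that implements the formula.
    FLESCH_READING_EASE = "flesch_reading_ease"
    FLESCH_KINCAID_GRADE = "flesch_kincaid_grade"


class Scorer:
    def compute_readability(self, text: str, formula: Formula) -> float:
        # Generic entry point: look up the formula-specific method by name.
        return getattr(self, formula.value)(text)

    def __call__(self, text: str, formula: Formula) -> float:
        # Calling the instance is an alias for the generic method.
        return self.compute_readability(text, formula)

    def flesch_reading_ease(self, text: str) -> float:
        s, w, y = self._counts(text)
        return 206.835 - 1.015 * (w / s) - 84.6 * (y / w)

    def flesch_kincaid_grade(self, text: str) -> float:
        s, w, y = self._counts(text)
        return 0.39 * (w / s) + 11.8 * (y / w) - 15.59

    @staticmethod
    def _counts(text: str) -> tuple[int, int, int]:
        # Crude placeholder statistics: punctuation runs as sentence ends,
        # letter runs as words, vowel groups as syllables.
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z]+", text)
        syllables = sum(
            max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words
        )
        return sentences, max(1, len(words)), syllables
```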

Tokenizing and Calculating Text Statistics

SmoothText is designed to work with sentences, words/tokens, and syllables.

text = 'This is a test sentence. This is another test sentence. This is a third test sentence.'

st.count_sentences(text)
# Output: 3

st.count_words(text)
# Output: 16

st.count_syllables(text)
# Output: 21
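For comparison, a dictionary-free approximation of these statistics can be written with regular expressions: sentences end at terminal punctuation, words are runs of letters, and syllables are estimated from vowel groups with a silent "e" adjustment. This is a hypothetical heuristic with illustrative function names, not SmoothText's method; the library's backend-based counts may differ, particularly for irregular words.

```python
import re

_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")
_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")
_VOWEL_GROUP = re.compile(r"[aeiouy]+")


def approx_sentences(text: str) -> int:
    # Split after ., ! or ? when followed by whitespace; drop empty pieces.
    return len([s for s in _SENTENCE_BREAK.split(text.strip()) if s])


def approx_words(text: str) -> int:
    return len(_WORD.findall(text))


def approx_word_syllables(word: str) -> int:
    w = word.lower()
    groups = len(_VOWEL_GROUP.findall(w))
    # A final "e" is usually silent ("sentence", "whole"), except in a
    # consonant + "le" ending ("table").
    consonant_le = w.endswith("le") and len(w) > 2 and w[-3] not in "aeiouy"
    if w.endswith("e") and groups > 1 and not consonant_le:
        groups -= 1
    return max(1, groups)


def approx_syllables(text: str) -> int:
    return sum(approx_word_syllables(w) for w in _WORD.findall(text))
```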

Other Features

Refer to the documentation for a complete list of available methods.

Documentation

See the project's API documentation for details.

License

SmoothText is released under the MIT License. See LICENSE.



Download files


Source Distribution

smoothtext-0.1.0.tar.gz (22.1 kB)

Uploaded Source

Built Distribution


smoothtext-0.1.0-py3-none-any.whl (17.5 kB)

Uploaded Python 3

File details

Details for the file smoothtext-0.1.0.tar.gz.

File metadata

  • Download URL: smoothtext-0.1.0.tar.gz
  • Upload date:
  • Size: 22.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for smoothtext-0.1.0.tar.gz
Algorithm Hash digest
SHA256 8d380e4ab37a64f3b3ce1986cb3ec4dc86972fa85e7395489cb5ba14332a4d73
MD5 f5a180d02d529738a7bb7ef59dfe2f1c
BLAKE2b-256 ab33fca71e79d8d6b50e79efee9668076154aa6922dbe155644b4e5fd3f85fc6
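The published digests let you confirm that a downloaded file has not been altered. The sketch below streams a file through hashlib.sha256 and compares the result with the SHA256 value listed above for the source distribution; the helper names and the local file path are assumptions.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for smoothtext-0.1.0.tar.gz.
EXPECTED_SHA256 = "8d380e4ab37a64f3b3ce1986cb3ec4dc86972fa85e7395489cb5ba14332a4d73"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    # hexdigest() is lowercase, so normalize the expected value too.
    return sha256_of(path) == expected.lower()
```

Assuming the archive was saved to the current directory, verify(Path("smoothtext-0.1.0.tar.gz")) returns True only if the file matches the published digest.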


File details

Details for the file smoothtext-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: smoothtext-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 17.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for smoothtext-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f5f33748ca85a0a9d6684b24d90908dae5bb09a1c76845b8e1d77fd226d9eb54
MD5 b7db036d748e310ea0fa2866265309c3
BLAKE2b-256 0f64cbd31f77078694ebc605f30368ed4bba73bdf0df0ba11c26f0a74ff7f3b9

