Python package for calculating well-known measures in computational linguistics
LinguaF
LinguaF gives researchers and developers easy access to methods of quantitative language analysis: readability, complexity, diversity, and other descriptive statistics.
Usage
documents = [
"Pain and suffering are always inevitable for a large intelligence and a deep heart. The really great men must, I think, have great sadness on earth.",
"To go wrong in one's own way is better than to go right in someone else's.",
"The darker the night, the brighter the stars, The deeper the grief, the closer is God!"
]
Descriptive Statistics
The following descriptive statistics are supported (descriptive_statistics.py module):
- Number of characters: char_count
- Number of letters: letter_count
- Number of punctuation characters: punctuation_count
- Number of digits: digit_count
- Number of syllables: syllable_count
- Number of sentences: sentence_count
- Number of n-syllable words: number_of_n_syllable_words
- Number of n-syllable words for all found syllables: number_of_n_syllable_words_all
- Average syllables per word: avg_syllable_per_word
- Average word length: avg_word_length
- Average sentence length: avg_sentence_length
- Average words per sentence: avg_words_per_sentence
Additional methods:
- Get lexical items (nouns, adjectives, verbs, adverbs): get_lexical_items
- Get n-grams: get_ngrams
- Get sentences: get_sentences
- Get words: get_words
- Tokenize: tokenize
- Remove punctuation: remove_punctuation
- Remove digits: remove_digits
Example:
from linguaf import descriptive_statistics as ds
ds.avg_words_per_sentence(documents)
# Output: 15
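To make the metric concrete, here is a minimal plain-Python sketch of what "average words per sentence" measures. It is not LinguaF's implementation: the helper names (split_sentences, split_words, avg_words_per_sentence_sketch) and the naive regex rules are assumptions made for illustration, so its results may differ from the library's output on the same text.

```python
import re

def split_sentences(text):
    # Naive rule (assumption): a sentence ends at ., ! or ? followed by whitespace
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def split_words(text):
    # Naive rule (assumption): a word is a run of letters, digits, or apostrophes
    return re.findall(r"[A-Za-z0-9']+", text)

def avg_words_per_sentence_sketch(docs):
    # Pool the sentences of all documents, then divide total words by sentence count
    sentences = [s for d in docs for s in split_sentences(d)]
    word_total = sum(len(split_words(s)) for s in sentences)
    return word_total / len(sentences)

sample = ["One two three. Four five!", "Six seven eight nine?"]
print(avg_words_per_sentence_sketch(sample))  # 3.0 (9 words over 3 sentences)
```

A real tokenizer handles abbreviations, quotes, and non-Latin scripts, which this sketch ignores.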
Syntactical Complexity
The following syntactical complexity metrics are supported (syntactical_complexity.py module):
- Mean Dependency Distance (MDD): mean_dependency_distance
Example:
from linguaf import syntactical_complexity as sc
sc.mean_dependency_distance(documents)
# Output: 2.375
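As background, mean dependency distance averages the absolute distance, in token positions, between each word and its syntactic head, skipping the root. The sketch below computes it for one sentence from a hand-written list of head positions. The function name and the example parse are assumptions for illustration; LinguaF obtains the parse itself and may aggregate over sentences differently.

```python
def mean_dependency_distance_sketch(heads):
    """heads[i] is the 1-based position of the head of token i + 1; 0 marks the root."""
    distances = [
        abs(position - head)
        for position, head in enumerate(heads, start=1)
        if head != 0  # the root has no head, so it contributes no distance
    ]
    return sum(distances) / len(distances)

# Hand-annotated parse (assumption) of "The cat sat down":
# The -> cat, cat -> sat, sat = root, down -> sat
heads = [2, 3, 0, 3]
print(mean_dependency_distance_sketch(heads))  # 1.0
```

Longer average distances indicate that related words sit further apart, which is commonly read as higher syntactic complexity.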
Lexical Diversity
The following lexical diversity metrics are supported (lexical_diversity.py module):
- Lexical Density (LD): lexical_density
- Type Token Ratio (TTR): type_token_ratio
- Herdan's Constant or Log Type Token Ratio (LogTTR): log_type_token_ratio
- Summer's Index: summer_index
- Root Type Token Ratio (RootTTR): root_type_token_ratio
Example:
from linguaf import lexical_diversity as ld
ld.log_type_token_ratio(documents)
# Output: 0.9403574963462502
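The type-token family of metrics can be sketched directly from the standard formulas, where N is the number of tokens and V the number of distinct tokens (types): TTR = V / N, LogTTR = log V / log N, RootTTR = V / sqrt(N). The function below is an illustration under those textbook definitions, not LinguaF's code. It skips any preprocessing the library may apply (such as lowercasing or stopword removal), so values can differ from the library's.

```python
import math

def diversity_sketch(tokens):
    n = len(tokens)       # N: total number of tokens
    v = len(set(tokens))  # V: number of distinct tokens (types)
    return {
        "ttr": v / n,
        "log_ttr": math.log(v) / math.log(n),  # requires n > 1
        "root_ttr": v / math.sqrt(n),
    }

tokens = "the cat saw the dog and the dog saw the cat".split()
scores = diversity_sketch(tokens)  # here N = 11 and V = 5
for name, value in scores.items():
    print(name, round(value, 3))
```

All three values grow as the vocabulary becomes more varied. LogTTR and RootTTR are less sensitive to text length than plain TTR.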
Readability
The following readability metrics are supported (readability.py module):
- Flesch Reading Ease (FRE): flesch_reading_ease
- Flesch-Kincaid Grade (FKG): flesch_kincaid_grade
- Automated Readability Index (ARI): automated_readability_index
- Simple Automated Readability Index (sARI): automated_readability_index_simple
- Coleman's Readability Score: coleman_readability
- Easy Listening Score: easy_listening
Example:
from linguaf import readability as r
r.flesch_kincaid_grade(documents)
# Output: 4.813333333333336
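For reference, the two Flesch metrics are simple linear formulas over word, sentence, and syllable counts. The sketch below applies the standard English-language coefficients to counts supplied by the caller. The function names are hypothetical, and LinguaF may use different coefficients for other languages, so treat this as an illustration of the formulas rather than the library's implementation.

```python
def flesch_reading_ease_sketch(words, sentences, syllables):
    # Standard English formula; higher scores mean easier text
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade_sketch(words, sentences, syllables):
    # Standard English formula; the score approximates a US school grade level
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Made-up counts for illustration: 100 words, 5 sentences, 150 syllables
print(round(flesch_reading_ease_sketch(100, 5, 150), 3))   # 59.635
print(round(flesch_kincaid_grade_sketch(100, 5, 150), 2))  # 9.91
```

Both scores depend only on average sentence length and average syllables per word, which is why the descriptive statistics above are their building blocks.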
Install
Via pip
pip install linguaf
Latest version from GitHub
git clone https://github.com/Perevalov/LinguaF.git
cd LinguaF
pip install .
Language Support
At the moment, the library supports the following languages:
- English 🇬🇧 (en): full support
- Russian 🇷🇺 (ru): full support
- German 🇩🇪 (de)
- French 🇫🇷 (fr)
- Spanish 🇪🇸 (es)
- Chinese 🇨🇳 (zh)
- Lithuanian 🇱🇹 (lt)
- Belarusian 🇧🇾 (be)
- Ukrainian 🇺🇦 (uk)
- Armenian 🇦🇲 (hy)
Important: not every method is implemented for every language. If you call a method that does not support the input language, it raises a ValueError.
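One way to cope with partial language coverage is to wrap metric calls and treat ValueError as "not available". The sketch below is a generic pattern, not part of LinguaF: metric_or_none is a hypothetical helper, and english_only_word_count is a stand-in used only so the example runs without the library. Its lang keyword is an assumption about how a language would be passed.

```python
def metric_or_none(metric, documents, **kwargs):
    """Call metric(documents, **kwargs); return None if it raises ValueError."""
    try:
        return metric(documents, **kwargs)
    except ValueError:
        return None

# Hypothetical stand-in that behaves like a metric limited to English
def english_only_word_count(documents, lang="en"):
    if lang != "en":
        raise ValueError(f"language {lang!r} is not supported")
    return sum(len(d.split()) for d in documents)

docs = ["To go wrong in one's own way"]
print(metric_or_none(english_only_word_count, docs, lang="en"))  # 7
print(metric_or_none(english_only_word_count, docs, lang="zh"))  # None
```

Returning None keeps batch jobs over mixed-language corpora running, at the cost of hiding which metrics were skipped. Logging the exception message is a reasonable addition.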
Citation
TBD