A library for calculating a variety of features from text using spaCy

TextDescriptives

A Python library for calculating a large variety of metrics from text(s) using spaCy v.3 pipeline components and extensions.

🔧 Installation

pip install textdescriptives

📰 News

We now have a TextDescriptives-powered web app, so you can extract and download metrics without writing a single line of code! Check it out here
Version 2.0 is out with a new API, a new component, updated documentation, and tutorials! Components are now called by "textdescriptives/{metric_name}". The new coherence component calculates the semantic coherence between sentences. See the documentation for tutorials and more information!

⚡ Quick Start

Use extract_metrics to quickly extract your desired metrics. To see the available metrics, run:

import textdescriptives as td
td.get_valid_metrics()
# {'quality', 'readability', 'all', 'descriptive_stats', 'dependency_distance', 'pos_proportions', 'information_theory', 'coherence'}

Set the spacy_model parameter to specify which spaCy model to use. Otherwise, TextDescriptives will auto-download an appropriate model based on lang. You only need to set one of the two: if lang is set, spacy_model is not necessary, and vice versa.

Specify which metrics to extract in the metrics argument. None extracts all metrics.

import textdescriptives as td

text = "The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it."
# will automatically download the relevant model (`en_core_web_lg`) and extract all metrics
df = td.extract_metrics(text=text, lang="en", metrics=None)

# specify spaCy model and which metrics to extract
df = td.extract_metrics(text=text, spacy_model="en_core_web_lg", metrics=["readability", "coherence"])

Usage with spaCy

To integrate with other spaCy pipelines, import the library and add the component(s) to your pipeline using the standard spaCy syntax. Available components are descriptive_stats, readability, dependency_distance, pos_proportions, coherence, and quality, each prefixed with textdescriptives/ (for example, textdescriptives/readability).

If you want to add all components you can use the shorthand textdescriptives/all.

import spacy
import textdescriptives as td
# load your favourite spacy model (remember to install it first using e.g. `python -m spacy download en_core_web_sm`)
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textdescriptives/all") 
doc = nlp("The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it.")

# access some of the values
doc._.readability
doc._.token_length
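
To give a feel for what the descriptive statistics measure, here is a minimal pure-Python sketch that recomputes a few of them for the same text. It is not the library's implementation: the regex-based sentence splitting and tokenization are simplifying assumptions that happen to work for this short passage, whereas TextDescriptives relies on spaCy's tokenizer and sentence segmentation.

```python
import re
import statistics

text = (
    "The world is changed. I feel it in the water. I feel it in the earth. "
    "I smell it in the air. Much that once was is lost, for none now live "
    "who remember it."
)

# Assumption: a sentence ends at a full stop followed by whitespace.
sentences = re.split(r"(?<=\.)\s+", text.strip())
# Assumption: a token is a run of letters, so punctuation is ignored.
tokens_per_sentence = [re.findall(r"[A-Za-z]+", s) for s in sentences]
tokens = [tok for sent in tokens_per_sentence for tok in sent]

token_lengths = [len(tok) for tok in tokens]
sentence_lengths = [len(sent) for sent in tokens_per_sentence]

stats = {
    "n_sentences": len(sentences),
    "n_tokens": len(tokens),
    # Unique tokens are counted case-insensitively ("The" == "the").
    "n_unique_tokens": len({tok.lower() for tok in tokens}),
    "token_length_mean": statistics.mean(token_lengths),
    "token_length_median": statistics.median(token_lengths),
    # Population standard deviation (divides by N, not N - 1).
    "token_length_std": statistics.pstdev(token_lengths),
    "sentence_length_mean": statistics.mean(sentence_lengths),
    "sentence_length_median": statistics.median(sentence_lengths),
    "sentence_length_std": statistics.pstdev(sentence_lengths),
}
stats["proportion_unique_tokens"] = stats["n_unique_tokens"] / stats["n_tokens"]

for name, value in stats.items():
    print(f"{name}: {value}")
```

For this passage the sketch yields 35 tokens, 23 unique tokens and 5 sentences, which agrees with the n_tokens, n_unique_tokens and n_sentences values reported for this text in the example output. On longer or messier input, the naive regexes will diverge from spaCy's results.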

TextDescriptives includes convenience functions for extracting metrics from a Doc to a Pandas DataFrame or a dictionary.

td.extract_dict(doc)
td.extract_df(doc)

	text	first_order_coherence	second_order_coherence	pos_prop_DET	pos_prop_NOUN	pos_prop_AUX	pos_prop_VERB	pos_prop_PUNCT	pos_prop_PRON	pos_prop_ADP	pos_prop_ADV	pos_prop_SCONJ	flesch_reading_ease	flesch_kincaid_grade	smog	gunning_fog	automated_readability_index	coleman_liau_index	lix	rix	n_stop_words	alpha_ratio	mean_word_length	doc_length	proportion_ellipsis	proportion_bullet_points	duplicate_line_chr_fraction	duplicate_paragraph_chr_fraction	duplicate_5-gram_chr_fraction	duplicate_6-gram_chr_fraction	duplicate_7-gram_chr_fraction	duplicate_8-gram_chr_fraction	duplicate_9-gram_chr_fraction	duplicate_10-gram_chr_fraction	top_2-gram_chr_fraction	top_3-gram_chr_fraction	top_4-gram_chr_fraction	symbol_#_to_word_ratio	contains_lorem ipsum	passed_quality_check	dependency_distance_mean	dependency_distance_std	prop_adjacent_dependency_relation_mean	prop_adjacent_dependency_relation_std	token_length_mean	token_length_median	token_length_std	sentence_length_mean	sentence_length_median	sentence_length_std	syllables_per_token_mean	syllables_per_token_median	syllables_per_token_std	n_tokens	n_unique_tokens	proportion_unique_tokens	n_characters	n_sentences
0	The world is changed(...)	0.633002	0.573323	0.097561	0.121951	0.0731707	0.170732	0.146341	0.195122	0.0731707	0.0731707	0.0487805	107.879	-0.0485714	5.68392	3.94286	-2.45429	-0.708571	12.7143	0.4	24	0.853659	2.95122	41	0	0	0	0	0.232258	0.232258	0	0	0	0	0.0580645	0.174194	0	0	False	False	1.77524	0.553188	0.457143	0.0722806	3.28571	3	1.54127	7	6	3.09839	1.08571	1	0.368117	35	23	0.657143	121	5
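
Several readability columns in the row above can be cross-checked by hand from the descriptive statistics in the same row. The sketch below plugs those values into the standard published formulas for Flesch reading ease, Flesch-Kincaid grade, LIX and RIX. That TextDescriptives uses exactly these formulas is an assumption here, although the numbers agree with the row to the printed precision. The count of long words (more than six letters: "changed" and "remember") was tallied by hand and is not a column in the output.

```python
# Values copied from the example output row (rounded as printed there).
n_tokens = 35
n_sentences = 5
syllables_per_token_mean = 1.08571

# Hand-counted assumption: words longer than six letters ("changed", "remember").
n_long_words = 2

words_per_sentence = n_tokens / n_sentences  # equals sentence_length_mean (7)

# Standard Flesch reading ease formula.
flesch_reading_ease = (
    206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_token_mean
)

# Standard Flesch-Kincaid grade level formula.
flesch_kincaid_grade = (
    0.39 * words_per_sentence + 11.8 * syllables_per_token_mean - 15.59
)

# LIX: average sentence length plus the percentage of long words.
lix = words_per_sentence + 100 * n_long_words / n_tokens

# RIX: long words per sentence.
rix = n_long_words / n_sentences

print(f"flesch_reading_ease:  {flesch_reading_ease:.3f}")
print(f"flesch_kincaid_grade: {flesch_kincaid_grade:.4f}")
print(f"lix:                  {lix:.4f}")
print(f"rix:                  {rix:.1f}")
```

Small differences in the last digit against the table are expected, because the syllable average is entered here in its rounded form.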

📖 Documentation

TextDescriptives has detailed documentation as well as a series of Jupyter notebook tutorials. All the tutorials are located in the docs/tutorials folder and can also be found on the documentation website.

Documentation
📚 Getting started	Guides and instructions on how to use TextDescriptives and its features.
👩‍💻 Demo	A live demo of TextDescriptives.
😎 Tutorials	Detailed tutorials on how to make the most of TextDescriptives.
📰 News and changelog	New additions, changes and version history.
🎛 API References	The detailed reference for the TextDescriptives API, including function documentation.
📄 Paper	The preprint of the TextDescriptives paper.

textdescriptives 2.8.4
