pycpidr

Python library for computing propositional idea density.

Introduction
What is Idea Density?
Installation
Usage
- CPIDR
- DEPID
Requirements
Development Setup
Running Tests
CPIDR Parity with CPIDR 3.2
References
Citing
Contributing
License

Introduction

pycpidr is a Python library that automatically determines the propositional idea density of an English text. This project aims to make this functionality more accessible to Python developers and researchers. pycpidr provides two ways of computing idea density:

- CPIDR. The CPIDR implementation in pycpidr is a direct port of the Computerized Propositional Idea Density Rater (CPIDR) 3.2 (Brown et al., 2008)[^1].
- DEPID. This library implements the DEPID algorithm described by Sirts et al. (2017)[^2].

Here's a quick example of how to use pycpidr:

from pycpidr import cpidr, depid

text = "The quick brown fox jumps over the lazy dog."
cpidr_word_count, proposition_count, cpidr_density, word_list = cpidr(text)
depid_density, depid_word_count, dependencies = depid(text)
print(f"CPIDR density: {cpidr_density}")
print(f"DEPID density: {depid_density}")

What is Idea Density?

Idea density, also known as propositional density, is a measure of the amount of information conveyed relative to the number of words used. It's calculated by dividing the number of expressed propositions by the number of words. This metric has applications in various fields, including linguistics, cognitive science, and healthcare research.
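
The ratio described above can be written out directly. This is a minimal sketch rather than pycpidr's own code: the function name idea_density is hypothetical, the counts are made up for illustration, and returning 0.0 for empty text is a choice made here, not necessarily what the library does.

```python
def idea_density(proposition_count: int, word_count: int) -> float:
    """Return propositions per word, or 0.0 when there are no words."""
    if word_count <= 0:
        # Assumption for this sketch: empty text has zero density.
        return 0.0
    return proposition_count / word_count


# Hypothetical counts: a ten-word sentence expressing five propositions.
example_density = idea_density(proposition_count=5, word_count=10)
print(f"Idea density: {example_density:.3f}")
```

In practice the word and proposition counts come from cpidr() or depid(), as shown in the Usage section.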

Installation

Using pip

Install the package:

pip install pycpidr

Download the required spaCy model:

python -m spacy download en_core_web_sm

Using poetry

poetry add pycpidr
python -m spacy download en_core_web_sm
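
Since the spaCy model is a separate download, it can be useful to check for it before running an analysis. The sketch below relies on the fact that spaCy installs its models as importable Python packages; the helper name is_model_installed is hypothetical and not part of pycpidr.

```python
import importlib.util


def is_model_installed(package_name: str = "en_core_web_sm") -> bool:
    """Return True if the import system can locate the given package."""
    return importlib.util.find_spec(package_name) is not None


if not is_model_installed():
    print("Model missing; run: python -m spacy download en_core_web_sm")
```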

Usage

CPIDR

Here's a simple example of how to use the CPIDR functionality:

from pycpidr import cpidr

text = "The quick brown fox jumps over the lazy dog."
word_count, proposition_count, density, word_list = cpidr(text)

print(f"Word count: {word_count}")
print(f"Proposition count: {proposition_count}")
print(f"Idea density: {density:.3f}")

# Analyzing speech
speech_text = "Um, you know, I think that, like, the weather is nice today."
word_count, proposition_count, density, word_list = cpidr(speech_text, speech_mode=True)

print(f"Speech mode - Idea density: {density:.3f}")

# Detailed word analysis
for word in word_list.items:
    if word.is_word:
        print(f"Token: {word.token}, Tag: {word.tag}, Is proposition: {word.is_proposition}")

Speech Mode

pycpidr's CPIDR mode includes a speech mode that handles common speech patterns and fillers differently from written text. When analyzing transcripts or spoken language, pass speech_mode=True for more accurate results.

DEPID

Here's an example of how to use the DEPID functionality:

from pycpidr import depid

text = "The quick brown fox jumps over the lazy dog."
density, word_count, dependencies = depid(text)
print(f"Word count: {word_count}")
print(f"Idea density: {density:.3f}")
print("Dependencies:")
for dep in dependencies:
    print(f"Token: {dep[0]}, Dependency: {dep[1]}, Head: {dep[2]}")
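
The returned dependencies can be summarized further, for instance by counting how often each dependency label occurs. This is a sketch under assumptions: the sample data is hypothetical (not actual pycpidr output), count_by_label is an illustrative helper, and it assumes the second element of each entry is a hashable label such as a string.

```python
from collections import Counter

# Hypothetical sample in the (token, dependency, head) layout used above;
# these values are illustrative, not actual pycpidr output.
sample_dependencies = [
    ("fox", "nsubj", "jumps"),
    ("quick", "amod", "fox"),
    ("brown", "amod", "fox"),
    ("over", "prep", "jumps"),
]


def count_by_label(dependencies):
    """Count how many dependencies carry each dependency label."""
    return Counter(dep[1] for dep in dependencies)


for label, count in count_by_label(sample_dependencies).most_common():
    print(f"{label}: {count}")
```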

Using custom filters

pycpidr's DEPID mode supports custom filtering of sentences and tokens. By default, pycpidr uses the filters described by Sirts et al. (2017):

Sentence filters:
- Filter out sentences with "I" or "You" as the subject of the sentence (i.e. the "I" or "You" token's dependency is "nsubj" and its head's dependency is the root).
- Note: Sirts et al. (2017) also filter out vague sentences using SpeciTeller. pycpidr does not yet implement that filter.
Token filters:
- Filter out "det" dependencies if the token is "a", "an" or "the".
- Filter out "nsubj" dependencies if the token is "it" or "this".
- Filter out all "cc" dependencies.

The following example demonstrates how to apply your own custom filters. The sentence_filters and token_filters parameters let you tailor the DEPID algorithm to your specific needs.

from pycpidr import depid

def custom_sentence_filter(sent):
    # Sentence-level predicate: true for sentences longer than three tokens
    return len(sent) > 3

def custom_token_filter(token):
    # Token-level predicate: true for tokens that are not determiners
    return token.pos_ != "DET"

text_with_filters = "I run. The quick brown fox jumps over the lazy dog."
density, word_count, dependencies = depid(
    text_with_filters,
    sentence_filters=[custom_sentence_filter],
    token_filters=[custom_token_filter],
)
print(f"\nWith custom filters - Idea density: {density:.3f}")
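
To make the default token filters listed earlier more concrete, they can be sketched as simple predicates. This is an illustration, not pycpidr's actual implementation: FakeToken is a stand-in for a spaCy token, the function names are hypothetical, and the sketch assumes that a filter returns True to keep a token and that matching is case-insensitive.

```python
from collections import namedtuple

# Stand-in for a spaCy token, carrying only the two attributes used here.
FakeToken = namedtuple("FakeToken", ["text", "dep_"])


def keep_determiner(token):
    """Reject 'det' dependencies whose token is an article."""
    return not (token.dep_ == "det" and token.text.lower() in {"a", "an", "the"})


def keep_subject(token):
    """Reject 'nsubj' dependencies whose token is 'it' or 'this'."""
    return not (token.dep_ == "nsubj" and token.text.lower() in {"it", "this"})


def keep_non_conjunction(token):
    """Reject every 'cc' dependency."""
    return token.dep_ != "cc"


# Assumption: a token survives only if every filter returns True.
DEFAULT_LIKE_FILTERS = [keep_determiner, keep_subject, keep_non_conjunction]

tokens = [
    FakeToken("The", "det"),
    FakeToken("fox", "nsubj"),
    FakeToken("and", "cc"),
    FakeToken("it", "nsubj"),
    FakeToken("this", "det"),
]
kept = [t.text for t in tokens if all(f(t) for f in DEFAULT_LIKE_FILTERS)]
print(kept)
```

Note that "this" survives here because it is tagged "det" rather than "nsubj"; each rule looks at the dependency label and the token text together.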

Requirements

Python 3.10+
spaCy 3.7.5+

Development Setup

To set up the development environment:

1. Clone the repository
2. Install Poetry if you haven't already: pip install poetry
3. Install project dependencies: poetry install
4. Install the required spaCy model: poetry run python -m spacy download en_core_web_sm
5. Activate the virtual environment: poetry shell

Running Tests

To run the tests, use pytest:

pytest tests/

CPIDR Parity with CPIDR 3.2

Because this port uses spaCy as a part-of-speech tagger instead of the original program's MontyLingua, there is a very slight difference in the reported idea density. This port includes unit tests containing 847 words of text:

- This project: 434 propositions, 0.512 idea density
- CPIDR 3.2: 436 propositions, 0.515 idea density
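
The size of that gap can be checked from the reported counts. The sketch below assumes both implementations count the same 847 words, which is consistent with the densities quoted above; the variable names are illustrative.

```python
WORD_COUNT = 847  # words in the unit-test text

port_propositions = 434      # reported for this project
original_propositions = 436  # reported for CPIDR 3.2

port_density = port_propositions / WORD_COUNT
original_density = original_propositions / WORD_COUNT

# Gap expressed in density units and as a share of the original count.
absolute_gap = original_density - port_density
relative_gap = (original_propositions - port_propositions) / original_propositions

print(f"pycpidr:      {port_density:.3f}")
print(f"CPIDR 3.2:    {original_density:.3f}")
print(f"Absolute gap: {absolute_gap:.4f}")
print(f"Relative gap: {relative_gap:.2%}")
```

The two proposition counts differ by 2 out of 436, so the port finds slightly under half a percent fewer propositions on this text.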

For more information about the original CPIDR 3.2, please visit CASPR's official page.

References

[^1]: Brown, C., Snodgrass, T., Kemper, S. J., Herman, R., & Covington, M. A. (2008). Automatic measurement of propositional idea density from part-of-speech tagging. Behavior Research Methods, 40(2), 540-545.

[^2]: Sirts, K., Piguet, O., & Johnson, M. (2017). Idea density for predicting Alzheimer's disease from transcribed speech. arXiv preprint arXiv:1706.04473.

Citing

If you use this project in your research, you may cite it as:

Jason Robison. (2024). pycpidr (0.2.0) [Source code]. GitHub. https://github.com/jrrobison1/pycpidr

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes (git commit -m 'Add some AmazingFeature')
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

Please ensure that your code passes all tests and follows the project's coding style.

License

This project is licensed under the GNU General Public License v2.0. See the LICENSE file for details.

pycpidr's CPIDR implementation is a port of the original CPIDR 3.2, which was released under GPL v2. This project maintains the same license to comply with the terms of the original software.
