A tool for quantitatively measuring discursive similarity between bodies of text.
Quantitative Discursive Analysis (QDA)
(C) 2019 Mark M. Bailey, PhD
About
Quantitative Discursive Analysis (QDA) converts bodies of text into graph objects built from noun phrases. Each noun or modifier becomes a vertex, and edges are determined by how nouns and modifiers are linked within phrases. The more central a noun is to the overall text content, the higher its centrality measure. This makes the graph representation more robust than simple keyword frequencies.
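As a rough illustration of the construction described above, the sketch below builds an adjacency map from a handful of noun phrases using only the standard library. The function name `phrases_to_graph` and the rule that adjacent words within a phrase share an edge are assumptions made for this example; QDA itself depends on networkx and may link words differently.

```python
from collections import defaultdict


def phrases_to_graph(noun_phrases):
    """Build an undirected adjacency map from a list of noun phrases.

    Assumption for this sketch: every word is a vertex, and words that
    sit next to each other inside a phrase are joined by an edge.
    """
    graph = defaultdict(set)
    for phrase in noun_phrases:
        words = phrase.lower().split()
        for word in words:
            graph[word]  # touching the key registers the vertex, even if isolated
        for left, right in zip(words, words[1:]):
            if left != right:
                graph[left].add(right)
                graph[right].add(left)
    return dict(graph)


# Hypothetical noun phrases, as an extractor might return them.
phrases = ["foreign economic policy", "economic growth", "policy"]
graph = phrases_to_graph(phrases)
print(sorted(graph["economic"]))
```

In this toy input, "economic" ends up with the most neighbors because it appears in two phrases and links them together, which is the kind of structure a centrality measure picks up and a plain frequency count does not.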
QDA compares discursive content by calculating resonance between two texts. Resonance is the cosine similarity of the betweenness-centrality vectors for the intersection of vertices in both texts. Values are normalized to [0, 1], where 0 indicates no overlap and 1 indicates perfect overlap.
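The resonance definition can be sketched in a few lines. This is one reading of the description above, taking the cosine over the shared vertices only; the function name `resonance_sketch` and the input dictionaries are hypothetical, and the package may normalize differently.

```python
import math


def resonance_sketch(centrality_a, centrality_b):
    """Cosine similarity of two centrality mappings over their shared vertices.

    Each argument maps vertex -> betweenness centrality (non-negative).
    Returns 0.0 when there is no shared vertex or a vector is all zeros.
    """
    shared = sorted(set(centrality_a) & set(centrality_b))
    if not shared:
        return 0.0
    vec_a = [centrality_a[v] for v in shared]
    vec_b = [centrality_b[v] for v in shared]
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up centrality values for two texts.
text_a = {"policy": 0.5, "economy": 0.5, "tax": 0.0}
text_b = {"policy": 0.5, "economy": 0.5, "music": 0.2}
text_c = {"music": 0.4, "art": 0.6}

print(resonance_sketch(text_a, text_b))  # shared vertices carry identical weights
print(resonance_sketch(text_a, text_c))  # no shared vertices
```

Because betweenness centrality is never negative, the cosine of two such vectors cannot drop below 0, which is consistent with the [0, 1] range stated above.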
Installation
pip install .
Dependencies
- Python 3.10+
- networkx
- numpy
- textblob
Important: TextBlob corpora required for default extractor
The default noun phrase extraction method (textblob) requires the TextBlob/NLTK corpora. Download them with:
python -m textblob.download_corpora
If corpora are unavailable, use the fallback extractor documented below.
Quickstart
Default extractor (textblob)
import QDA
text_a = "This is a string of text about politics and economics."
text_b = "This is a different string of text about music and art."
g1 = QDA.discursive_object(text_a) # noun_extractor='textblob' by default
g2 = QDA.discursive_object(text_b)
print(QDA.resonate(g1, g2))
Fallback extractor (simple, no corpora required)
import QDA
text_a = "This is a string of text about politics and economics."
text_b = "This is a different string of text about music and art."
g1 = QDA.discursive_object(text_a, noun_extractor="simple")
g2 = QDA.discursive_object(text_b, noun_extractor="simple")
print(QDA.resonate(g1, g2))
API summary
- QDA.discursive_object(text, noun_extractor="textblob")
- QDA.resonate(g1, g2)
- QDA.resonate_as_series(G_list)
- QDA.resonate_as_matrix(G_list)
- QDA.discursive_community(G_list)
Development
Run tests with:
pytest
Notes and limitations
- Large texts may be slow because betweenness centrality is computationally expensive.
- Results depend on the noun phrase extraction method (textblob vs simple). The simple extractor is a compatibility fallback and may produce phrases of different quality than TextBlob.
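To show where the cost of betweenness centrality comes from, here is a minimal standard-library sketch of Brandes' algorithm for an undirected, unweighted graph. It is an illustration only, not QDA's code (the package lists networkx as a dependency); the function name `betweenness_sketch` and the adjacency-map input format are assumptions.

```python
from collections import deque


def betweenness_sketch(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    `graph` maps every vertex to an iterable of its neighbors and must be
    symmetric (undirected). One breadth-first search runs per vertex, so the
    total work grows roughly with (number of vertices) x (number of edges).
    """
    scores = {v: 0.0 for v in graph}
    for source in graph:
        order = []                        # vertices in the order BFS finishes them
        preds = {v: [] for v in graph}    # predecessors on shortest paths
        sigma = {v: 0 for v in graph}     # number of shortest paths from source
        dist = {v: -1 for v in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:           # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2.0 for v, s in scores.items()}


# A three-vertex path and a star with three leaves.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
print(betweenness_sketch(path))
print(betweenness_sketch(star))
```

The outer loop over every source vertex is what makes long texts slow: each additional vertex adds another full traversal of the graph.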
Changelog
- 0.1.0
  - Added Python 3.10+ packaging support and pytest test suite.
  - Added explicit TextBlob missing-corpora error messaging and optional simple extractor.
  - Added NetworkX compatibility shim for NumPy graph conversion.
  - Improved performance of matrix resonance and graph construction helpers.
  - Added GitHub Actions CI for Python 3.10/3.11/3.12.
File details
Details for the file qda-0.1.0.tar.gz.
File metadata
- Download URL: qda-0.1.0.tar.gz
- Upload date:
- Size: 6.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 447f5d930334328ea441425537627e51d9ec64f40fffbe0ae097b25d55b4cc28 |
| MD5 | 5ba80437d413aa1c46e710dd4f2e8912 |
| BLAKE2b-256 | a4741edec2e617d75cfe7a26c173c6a640b3e7317892fcfc54cd21de41b9b703 |
File details
Details for the file qda-0.1.0-py3-none-any.whl.
File metadata
- Download URL: qda-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eec2cd7e09e15485f7ef962e4f23a2db5d6cc76706025abd6d886e1b59d11493 |
| MD5 | e76034e218b463aa6f244aa2ceb1536e |
| BLAKE2b-256 | b3923c594360bfb4ef6fd41840b79d70cc0a57cbc55c580a2d0e1d4c707f2b8f |