Dendrogram Prototypical Discourse generator

Project description

Dendrogram Prototypical Discourse Analysis

According to [Harris, 1954] and [Rubenstein and Goodenough, 1965], words in natural languages are structured within linguistic environments (e.g., sentences, paragraphs), and words with similar meanings tend to share similar contexts. This assumption, known as the Distributional Hypothesis, suggests that a corpus is often made up of several discursive contexts, each being a set of extended linguistic environments conveying similar or related concepts and topics. Although this theory emerged in linguistics in 1954, it has recently received increasing attention in other fields, such as cognitive science (e.g., [McDonald and Ramscar, 2001]) and natural language processing (e.g., [Mikolov et al., 2013a]). This hypothesis is the founding principle of our approach.

Our method aims to model a large corpus as a set of so-called DP-discourses and then to study them as prototypical speeches. The core step consists in building clusters of words that share similar discursive contexts. This was achieved using word embedding and subspace clustering, but other data-mining techniques could be used. Intra-cluster words were then represented as Dendrogram Prototypical Discourses (DP-discourses) using a hierarchical clustering algorithm. Finally, the DP-discourses proved comprehensible enough to be studied using Charaudeau’s methodology, and they could possibly be analyzed using other discourse analysis approaches.
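
To make the hierarchical step concrete, here is a minimal sketch of how the words of one cluster could be arranged into a dendrogram. It is not the DPD package's API. The word vectors are invented toy values, and the choices of cosine distance and average linkage are assumptions made for illustration; the description above only states that a hierarchical clustering algorithm is used.

```python
import math
from itertools import combinations

# Toy 3-dimensional "embeddings", invented for illustration only.
# In practice they would come from a trained word-embedding model.
VECTORS = {
    "tax": [0.9, 0.1, 0.0],
    "budget": [0.8, 0.2, 0.1],
    "school": [0.1, 0.9, 0.1],
    "teacher": [0.0, 0.8, 0.2],
}


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def leaves(node):
    """Return the words under a node (a word or a (left, right) tuple)."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def average_linkage(a, b, vectors):
    """Mean pairwise distance between the words of two nodes."""
    pairs = [(x, y) for x in leaves(a) for y in leaves(b)]
    total = sum(cosine_distance(vectors[x], vectors[y]) for x, y in pairs)
    return total / len(pairs)


def build_dendrogram(vectors):
    """Repeatedly merge the two closest nodes until one tree remains."""
    nodes = sorted(vectors)
    while len(nodes) > 1:
        a, b = min(
            combinations(nodes, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], vectors),
        )
        nodes = [n for n in nodes if n not in (a, b)]
        nodes.append((a, b))  # the merged pair becomes a new internal node
    return nodes[0]


tree = build_dendrogram(VECTORS)
print(tree)
```

The result is a nested tuple whose two top-level branches group the words that sit closest together in the toy vector space. Reading such a tree from the leaves upward is one way to picture a cluster as a prototypical discourse.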


The easiest way to install the generator is with pip, the package installer for Python, by typing the command:

pip install DPD
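
To confirm that the installation succeeded, one option is to query the installed distribution's metadata from Python. This sketch relies only on the standard library (Python 3.8 or later) and on the distribution name DPD given above; the helper name installed_version is hypothetical and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if DPD is installed, otherwise None.
print(installed_version("DPD"))
```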


See the Jupyter notebook tutorial tutorials/tutorial1.ipynb for a basic usage illustration.


This project is licensed under the GNU General Public License (Version 3, 29 June 2007).


Files for DPD, version 0.0.3

Filename                      Size      File type  Python version
DPD-0.0.3.tar.gz              4.2 kB    Source     None
DPD-0.0.3-py3-none-any.whl    16.9 kB   Wheel      py3
