Understand the features of language for different kinds of corpora
Project description
Last Updated: 7/1/2018
Adaptationism is defined as...
"The belief or assumption, now generally held, that each feature of an organism is the result of evolutionary adaptation for a particular function."
Our use of language is no different. We are constantly evolving our own group-specific languages (as well as learning the languages of others, both groups and individuals), and that development is driven by the functions those languages serve.
Like any scholarly endeavor, I pulled this information from a Wikipedia article, which lists the traits of an adaptation as:
- The trait is a variation of an earlier form.
- The trait is heritable through the transmission of genes.
- The trait enhances reproductive success.
The last metaphorical comparison I'll make is that, like the traits above, language (and features of language):
- Can be derived from variations of a broader parent language,
- Are heritable through the groups, communities, and experiences we partake in, and...
- Enhance the success of achieving a descriptive and/or actionable outcome through language.
So what is this package actually for...?
"Adaptationism" is meant to help answer the following three questions:
- How a group uses language to describe features of a journey or experience,
- How that use of language differs from another group, a broader (parent) group, or a subset of the same group, and finally,
- How a group changes its use of language following some event.
The gap that I hope to fill through the development of this package calls for not just a set of tools, but also a framework for interpretation and action.
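As a rough illustration of the second question (how one group's language differs from another's), here is a minimal sketch that compares relative word frequencies between two sets of comments. It uses only the Python standard library. The function names `relative_frequencies` and `frequency_gap` and the sample comments are hypothetical and are not part of this package's API.

```python
import re
from collections import Counter


def relative_frequencies(texts):
    """Return each word's share of all word tokens across a list of texts."""
    tokens = [t for text in texts for t in re.findall(r"[a-z']+", text.lower())]
    total = len(tokens)
    if total == 0:
        return {}
    return {word: n / total for word, n in Counter(tokens).items()}


def frequency_gap(group_a, group_b):
    """Per-word difference in relative frequency (group A minus group B)."""
    freq_a = relative_frequencies(group_a)
    freq_b = relative_frequencies(group_b)
    return {
        word: freq_a.get(word, 0.0) - freq_b.get(word, 0.0)
        for word in set(freq_a) | set(freq_b)
    }


# Hypothetical sample data: two small groups of comments.
group_a = ["the app crashed again", "the app is slow"]
group_b = ["love the new app", "the update is great"]

gaps = frequency_gap(group_a, group_b)
# Positive values are words group A uses relatively more often;
# negative values are words group B uses relatively more often.
for word in sorted(gaps, key=gaps.get, reverse=True):
    print(f"{word:10s} {gaps[word]:+.3f}")
```

Raw frequency differences are only one possible measure. On small corpora they are noisy, so treat the output as a starting point for interpretation rather than a conclusion.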
What is your roadmap?
My roadmap for this package is structured as a pyramid (3 levels), flowing from:
- (level 1) descriptive features of words and phrases, to...
- (level 2) the analysis and description of meta-language related features (e.g. POS, polarity, patterns of POS, named entities), and finally...
- (level 3) the descriptive statistics of text (e.g. word length, statement length, corpus length, average length).
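To make level 3 concrete, here is a small sketch of the kind of descriptive statistics listed there, computed with the standard library only. The function `describe_corpus` and its output keys are hypothetical names chosen for illustration. They are not this package's API, and the simple regex tokenizer is an assumption standing in for whatever tokenizer you prefer.

```python
import re
from statistics import mean


def describe_corpus(statements):
    """Compute simple descriptive statistics for a list of statements."""
    # Assumption: a "word" is any run of word characters.
    tokenized = [re.findall(r"\w+", s) for s in statements]
    words = [w for tokens in tokenized for w in tokens]
    return {
        "corpus_length": len(words),  # total number of words
        "statement_count": len(statements),
        "avg_statement_length": mean(len(t) for t in tokenized) if tokenized else 0,
        "avg_word_length": mean(len(w) for w in words) if words else 0,
    }


# Hypothetical sample data: two short comments.
stats = describe_corpus(["Great talk today", "I disagree"])
print(stats)
```

The same function can be run on each group or time period separately, so the resulting numbers can be compared side by side.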
While I primarily spend my time analyzing comments, the types of corpora this package can analyze include:
- Comments
- Chat / Text Conversations
- Books
- Speeches
For a full list of different kinds of corpora, check this out.
How did I come up with this idea?
This package has a few origins... mostly stemming from unanswered Stack Overflow posts on NLTK. As I continue to add to my own roadmap, I will add the sets of SO posts that I draw inspiration from.
Current list of TA questions: