Understand the features of language for different kinds of corpora
### Last Updated: 7/1/2018
# Adaptationism is defined as…
_“The belief or assumption, now generally held, that each feature of an organism is the result of evolutionary adaptation for a particular function.”_
Our use of language is no different. We are constantly evolving our own group-specific languages (and learning the languages of other groups and individuals), and that development is driven by the functions those languages serve.
- Like any scholarly endeavor, this one starts with Wikipedia. The article that I pulled this information from lists the following traits of an adaptation:
- The trait is a variation of an earlier form.
- The trait is heritable through the transmission of genes.
- The trait enhances reproductive success.
- The last metaphorical comparison that I’ll make is that, like the traits above, language (and features of language):
- Can be derived from variations of a broader parent language,
- Are heritable through the groups, communities, and experiences we partake in, and…
- Enhance the success of achieving a descriptive and/or actionable outcome through language.
## So what is this package _actually_ for…?
- “Adaptationism” is meant to help answer three questions about how a group:
- Uses language to describe features of a journey or experience,
- Differs in its use of language from another group, a broader (parent) group, or a subset of the same group, and finally,
- Changes its use of language following some event.
Through the development of this package, I hope to provide not just a set of tools, but also a framework for interpretation and action.
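
To make the second question concrete, here is a minimal sketch of comparing how often two groups use each word. It is an illustration only, not this package's API: the function names `relative_frequencies` and `frequency_gap`, the regex tokenizer, and the sample comments are all assumptions made for the example, and it uses only the Python standard library.

```python
import re
from collections import Counter


def relative_frequencies(comments):
    """Return each word's share of all words in a list of comments (hypothetical helper)."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = [w for c in comments for w in re.findall(r"[a-z']+", c.lower())]
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def frequency_gap(group_a, group_b):
    """Rank words by how much more often group_a uses them than group_b."""
    freq_a = relative_frequencies(group_a)
    freq_b = relative_frequencies(group_b)
    vocabulary = set(freq_a) | set(freq_b)
    gaps = {w: freq_a.get(w, 0.0) - freq_b.get(w, 0.0) for w in vocabulary}
    # Largest positive gap first: words over-represented in group_a.
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data for illustration.
group_a = ["the app is slow", "slow slow app"]
group_b = ["the app is great"]
ranked = frequency_gap(group_a, group_b)
print(ranked[0][0])   # the word most over-represented in group_a
print(ranked[-1][0])  # the word most over-represented in group_b
```

The same comparison could be run against a parent corpus or a subset of the same group by swapping what is passed as `group_b`. A raw difference in relative frequency is the simplest possible measure; a real analysis would likely want smoothing or a significance test.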
## What is your roadmap?
- My roadmap for this package is structured as a pyramid (3 levels), flowing from:
- (level 1) descriptive features of words and phrases, to…
- (level 2) the analysis and description of meta-language related features (e.g. part-of-speech (POS) tags, polarity, patterns of POS, named entities, etc.), and finally…
- (level 3) the descriptive statistics of text (word length, statement length, corpus length, average length… etc.).
- While I primarily spend my time analyzing comments, the types of [corpora](https://wiki.apache.org/spamassassin/PluralOfCorpus) this package can analyze include:
- Chat / Text Conversations
For a full list of different kinds of corpora, [check this out](https://weblearn.ox.ac.uk/access/content/group/3a217dfd-a8cd-4034-8564-c27a58f89b9b/Handouts/CorpusTypes.pdf).
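
As a rough illustration of the level 3 statistics in the roadmap above (word length, statement length, corpus length, averages), the sketch below computes them for a list of statements. The function name `describe_corpus`, the returned keys, and the `\w+` tokenizer are assumptions for this example and do not reflect the package's actual interface.

```python
import re
from statistics import mean


def describe_corpus(statements):
    """Compute simple length statistics for a list of statements (hypothetical helper)."""
    # Tokenize each statement into word-like runs (a deliberately simple assumption).
    tokens_per_statement = [re.findall(r"\w+", s) for s in statements]
    all_tokens = [tok for toks in tokens_per_statement for tok in toks]
    return {
        "statement_count": len(statements),
        "corpus_length": len(all_tokens),  # total number of tokens
        "avg_statement_length": (
            mean(len(toks) for toks in tokens_per_statement) if statements else 0
        ),
        "avg_word_length": mean(len(tok) for tok in all_tokens) if all_tokens else 0,
    }


# Made-up sample data for illustration.
stats = describe_corpus(["I love it", "Too slow"])
print(stats)
```

Returning a plain dictionary keeps the result easy to load into a table when the same statistics are computed for several groups or time periods.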
## How did I come up with this idea?
This package has a few origins… mostly stemming from unanswered Stack Overflow posts on NLTK. As I continue to add to my own roadmap, I will add the sets of SO posts that I draw inspiration from.
- Current list of TA questions:
- [Generating N-Gram Markov Chain Transition Table](https://stackoverflow.com/questions/23374694/n-gram-markov-chain-transition-table)
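
For context on the question linked above, here is one way an n-gram Markov chain transition table could be built with the standard library. This is a sketch under my own assumptions, not the answer from that post and not this package's implementation: `transition_table` is a hypothetical name, and the input is assumed to be an already tokenized list of words.

```python
from collections import Counter, defaultdict


def transition_table(tokens, n=2):
    """Map each (n-1)-token state to the probabilities of the token that follows it."""
    if n < 2:
        raise ValueError("n must be at least 2")
    counts = defaultdict(Counter)
    # Slide a window of n tokens: the first n-1 form the state, the last is the next token.
    for i in range(len(tokens) - n + 1):
        state = tuple(tokens[i:i + n - 1])
        next_token = tokens[i + n - 1]
        counts[state][next_token] += 1
    # Normalize each row of counts into probabilities.
    table = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        table[state] = {tok: c / total for tok, c in next_counts.items()}
    return table


# Made-up sample sentence for illustration.
tokens = "the cat sat on the mat".split()
table = transition_table(tokens, n=2)
print(table[("the",)])
```

With `n=2` each state is a single word; a larger `n` conditions on more context at the cost of sparser rows.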