Understand the features of language for different kinds of corpora

### Last Updated: 7/1/2018

# Adaptationism is defined as…

_“The belief or assumption, now generally held, that each feature of an organism is the result of evolutionary adaptation for a particular function.”_

Our use of language is no different. We are constantly evolving our own group-specific languages (as well as learning the languages of others, both groups and individuals), and that development is driven by the functions those languages serve.

Like any scholarly endeavor, I pulled this information from a Wikipedia article, which defines the traits of an adaptation as follows:
  1. The trait is a variation of an earlier form.

  2. The trait is heritable through the transmission of genes.

  3. The trait enhances reproductive success.

The last metaphorical comparison that I’ll make to this idea is that, like all of the above, language (and features of language):
  1. Can be derived from variations of a broader parent language,

  2. Are heritable through the groups, communities, and experiences we partake in, and…

  3. Enhance the success of achieving a descriptive and/or actionable outcome.

## So what is this package _actually_ for…?

“Adaptationism” is meant to help answer the following three questions about how a group:
  1. Uses language to describe features of a journey or experience,

  2. Differs in its use of language from another group, a broader (parent) group, or a subset of the same group, and finally,

  3. Changes its use of language following some event.

The gap that I hope to fill through the development of this package calls for not just a set of tools, but also a framework for interpretation and action.

## What is your roadmap?

My roadmap for this package is structured as a pyramid (3 levels), flowing from:
  • (level 1) descriptive features of words and phrases, to…

  • (level 2) the analysis and description of meta-language related features (e.g. POS, polarity, patterns of POS, named entities… etc.), and finally…

  • (level 3) the descriptive statistics of text (word length, statement length, corpus length, average length… etc.).
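
As a rough illustration of the level 3 statistics, here is a minimal sketch using only the Python standard library. The function name `describe_corpus`, the dictionary keys, and the naive regex tokenizer are all assumptions made for this example; they are not taken from the package's actual API.

```python
import re
from statistics import mean


def describe_corpus(statements):
    """Return simple length statistics for a list of text statements.

    Hypothetical helper for illustration only; not part of the package.
    """
    # Naive tokenizer: runs of letters, digits, and apostrophes count as words.
    tokenized = [re.findall(r"[A-Za-z0-9']+", s) for s in statements]
    words = [w for tokens in tokenized for w in tokens]
    return {
        "statement_count": len(statements),
        "word_count": len(words),
        # Average number of characters per word across the whole corpus.
        "avg_word_length": mean(len(w) for w in words) if words else 0.0,
        # Average number of words per statement.
        "avg_statement_length": mean(len(t) for t in tokenized) if tokenized else 0.0,
    }


comments = ["I love this product", "Shipping was slow"]
stats = describe_corpus(comments)
print(stats)
```

A real corpus would likely need a proper tokenizer (NLTK's, for instance) instead of the regex used here, especially for chat text or speeches.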

While I primarily spend my time analyzing comments… the types of [corpora](https://wiki.apache.org/spamassassin/PluralOfCorpus) this package can analyze include:
  1. Comments

  2. Chat / Text Conversations

  3. Books

  4. Speeches

For a full list of different kinds of corpora, [check this out](https://weblearn.ox.ac.uk/access/content/group/3a217dfd-a8cd-4034-8564-c27a58f89b9b/Handouts/CorpusTypes.pdf).

## How did I come up with this idea?

This package has a few origins… mostly stemming from unanswered Stack Overflow posts on NLTK. As I continue to add to my own roadmap, I will add the sets of SO posts that I draw inspiration from.

Current list of TA questions:
  1. [Generating N-Gram Markov Chain Transition Table](https://stackoverflow.com/questions/23374694/n-gram-markov-chain-transition-table)
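
To give a sense of what the question above is asking for, here is a minimal sketch of an n-gram Markov chain transition table in plain Python. The function name `transition_table` and its signature are assumptions made for illustration; this is neither the package's implementation nor the accepted answer from the linked post.

```python
from collections import Counter, defaultdict


def transition_table(tokens, n=2):
    """Map each (n-1)-token state to the probabilities of the next token.

    Hypothetical helper for illustration only; not part of the package.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    counts = defaultdict(Counter)
    # Slide a window of n tokens: the first n-1 form the state,
    # the last one is the observed successor.
    for i in range(len(tokens) - n + 1):
        state = tuple(tokens[i:i + n - 1])
        counts[state][tokens[i + n - 1]] += 1
    # Normalize successor counts into probabilities per state.
    table = {}
    for state, followers in counts.items():
        total = sum(followers.values())
        table[state] = {tok: c / total for tok, c in followers.items()}
    return table


tokens = "the cat sat on the mat".split()
table = transition_table(tokens, n=2)
print(table[("the",)])
```

States are stored as tuples so that the same function works for bigrams (`n=2`), trigrams (`n=3`), and so on without changing the shape of the table.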
