Generates biased stop word lists for various genres

Project description

PyPI package: biased-stop-words. License: MIT. Contact: Gregology.

Overview

Biases are bugs

Stop words are words that are filtered out before natural language data is processed. Text analysis often surfaces non-causal correlations. Consider the following documents:

  • He is an astronaut, he is on Venus

  • He is an accountant, he is on Earth

  • She is an astronaut, she is on Mars

Processing these documents into two topics will result in gendered clustering. If we remove the gendered terms:

  • is an astronaut, is on Venus

  • is an accountant, is on Earth

  • is an astronaut, is on Mars

Processing will result in job clustering. Both clusterings are valid; however, if you are interested in employing an astronaut, you don’t want male accountants showing up. Natural language contains many other non-causal relationships: religion, ethnicity, and age, to name but a few.
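The effect described above can be checked with a minimal sketch. It does not use this package or a real topic model: it assumes a hand-written two-word stop list (`GENDERED`, hypothetical) and uses plain token overlap as a stand-in for clustering, asking which document is most similar to the first one.

```python
import re
from collections import Counter

# Hypothetical, hand-written stop word list used only for this illustration.
GENDERED = {"he", "she"}

DOCS = [
    "He is an astronaut, he is on Venus",
    "He is an accountant, he is on Earth",
    "She is an astronaut, she is on Mars",
]

def bag_of_words(text, stop_words=frozenset()):
    """Lowercase the text, split it into words, and count those that are not stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

def overlap(a, b):
    """Number of shared tokens, counting repeats (size of the multiset intersection)."""
    return sum((a & b).values())

def nearest_to_first(stop_words=frozenset()):
    """Return the index (1 or 2) of the document most similar to the first one."""
    bags = [bag_of_words(d, stop_words) for d in DOCS]
    return max((1, 2), key=lambda i: overlap(bags[0], bags[i]))

print(nearest_to_first())          # gendered terms kept: prints 1 (the male accountant)
print(nearest_to_first(GENDERED))  # gendered terms removed: prints 2 (the other astronaut)
```

With the gendered terms in place, the repeated "he" outweighs the shared job title, so the first astronaut is matched with the accountant. Once those terms are removed, the job title decides the match.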

Available genres

  • Gendered Terms

  • US Names

  • Religious Terms (Partial)

More will be available soon. Contribute at https://github.com/gregology/biased-words

Interactive Notebook

Explore this package in an Interactive Notebook

Hosted by binder

Installation

biased-stop-words is available on PyPI

http://pypi.python.org/pypi/biased-stop-words

Install via pip

$ pip install biased-stop-words

Or via easy_install

$ easy_install biased-stop-words

Or directly from biased-stop-words’s git repo <https://github.com/gregology/biased-words>

$ git clone --recursive https://github.com/gregology/biased-stop-words.git
$ cd biased-stop-words
$ python setup.py install

Basic usage

>>> from biased_stop_words import genres, get_stop_words
>>> genres()
'religious, gendered, us-common-names, us-names, us-male-names, us-female-names, gendered-nouns'
>>> get_stop_words('gendered', 'us-common-names')
[u'trenton', u'augustine', u'khalil', u'aiden', u'elisabeth', u'andre', u'khanum', u'elva', u'fran...
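Once a list has been retrieved, it still has to be applied to the text. The following is a rough sketch of one way to do that, not part of the package: `remove_stop_words` is a hypothetical helper, and `sample_stop_words` is a hand-written stand-in for the list that `get_stop_words('gendered')` would return, so the block runs without the package installed. It assumes the entries are lowercase, as in the output shown above.

```python
import re

def remove_stop_words(text, stop_words):
    """Drop every word whose lowercase form is in stop_words, keeping the rest in order."""
    blocked = {w.lower() for w in stop_words}
    words = re.findall(r"\w+", text)  # punctuation is discarded by this simple tokenizer
    return " ".join(w for w in words if w.lower() not in blocked)

# Hypothetical stand-in for the result of get_stop_words('gendered').
sample_stop_words = [u"he", u"she"]

print(remove_stop_words("She is an astronaut, she is on Mars", sample_stop_words))
# prints: is an astronaut is on Mars
```

In real use, the list returned by get_stop_words would be passed in place of sample_stop_words.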

Running tests

$ python biased_stop_words/tests.py

Python compatibility

Developed for Python 2 & 3.

Download files

Source Distribution

biased-stop-words-2018.11.29.0.tar.gz (41.5 kB)

Uploaded Source

File details

Details for the file biased-stop-words-2018.11.29.0.tar.gz.

File metadata

  • Download URL: biased-stop-words-2018.11.29.0.tar.gz
  • Upload date:
  • Size: 41.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.1 setuptools/40.6.2 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.0

File hashes

Hashes for biased-stop-words-2018.11.29.0.tar.gz

  • SHA256: 401e4e53004be07395d07a00f738d68711a121e811106cbd9a3cebf570fa70dc
  • MD5: 35604875f423d341807c749a73ebded9
  • BLAKE2b-256: c76a35e8bc81cca66806c47e5f62e9151ca72ec8e58b6b065428472b35a3dc11
