Generates biased stop word lists for various genres
Overview
Stop words are words that are filtered out before natural language data is processed. Text analysis often picks up non-causal correlations. Consider the following documents:
He is an astronaut, he is on Venus
He is an accountant, he is on Earth
She is an astronaut, she is on Mars
Processing these documents into two topics will result in clustering by gender. If we remove the gendered terms:
is an astronaut, is on Venus
is an accountant, is on Earth
is an astronaut, is on Mars
Processing will now result in clustering by job. Both clusterings are valid; however, if you are looking to hire an astronaut, you don’t want male accountants showing up.
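To make the effect concrete, here is a small sketch (not part of biased-stop-words) that turns each example document into a bag of term counts and asks which document is most similar to the first one by cosine similarity. The two-word set `GENDERED` is a hypothetical stand-in for the package's gendered list, and the helper names are illustrative.

```python
import math
import re
from collections import Counter

# The three example documents from the overview.
DOCS = [
    "He is an astronaut, he is on Venus",
    "He is an accountant, he is on Earth",
    "She is an astronaut, she is on Mars",
]

# Hypothetical stand-in for a gendered stop word list; the real list
# shipped with biased-stop-words is much longer.
GENDERED = {"he", "she"}


def bag_of_words(text, stop_words=frozenset()):
    """Lowercase, split on non-letters, drop stop words, count terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def nearest(docs, stop_words=frozenset()):
    """Return the index of the document most similar to docs[0]."""
    vecs = [bag_of_words(d, stop_words) for d in docs]
    scores = [cosine(vecs[0], v) for v in vecs[1:]]
    return 1 + scores.index(max(scores))


print(nearest(DOCS))            # 1: "He is an accountant, he is on Earth"
print(nearest(DOCS, GENDERED))  # 2: "She is an astronaut, she is on Mars"
```

With the gendered terms kept, the male astronaut's nearest neighbor is the male accountant (cosine similarity 10/12 versus 7/12); once "he" and "she" are removed, it is the female astronaut (7/8 versus 6/8).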
Available genres
English Gendered Terms
US names
More genres will be available soon. Contribute at https://github.com/gregology/biased-stop-words
Installation
biased-stop-words is available on PyPI
http://pypi.python.org/pypi/biased-stop-words
Install via pip (note: currently broken):
$ pip install biased-stop-words
Or by cloning the biased-stop-words git repository:
$ git clone --recursive https://github.com/gregology/python-biased-stop-words.git
Then, from inside the cloned directory, install it by running:
$ python setup.py install
Basic usage
from biased_stop_words import get_stop_words
stop_words = get_stop_words('gendered', 'common-us-names')
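The return type of `get_stop_words` is not spelled out here; the sketch below assumes it is an iterable of strings and shows one way to apply such a list to raw text. `remove_stop_words` and `example_stop_words` are hypothetical names, and the short list only stands in for the real gendered and US-name lists.

```python
import re


def remove_stop_words(text, stop_words):
    """Drop every token of `text` that appears in `stop_words`.

    Matching is case-insensitive: both the tokens and the stop words are
    lowercased before comparison. Kept tokens retain their original case.
    """
    stop = {w.lower() for w in stop_words}
    tokens = re.findall(r"[A-Za-z']+", text)
    return [t for t in tokens if t.lower() not in stop]


# Hypothetical stand-in for get_stop_words('gendered', 'common-us-names');
# the real lists are much longer.
example_stop_words = ["he", "she", "his", "her", "Mary", "John"]

print(remove_stop_words("Mary said she is an astronaut", example_stop_words))
# ['said', 'is', 'an', 'astronaut']
```

Lowercasing both sides matters for lists such as first names, which usually appear capitalized in running text.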
Running tests
$ python biased_stop_words/tests.py
Python compatibility
Developed for Python 2 & 3.
Download files
Source Distribution
Hashes for biased-stop-words-2017.5.12.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 06beb775521c7b4e653cc9fe2087c760cf8285e90530119e6063d03a540458a6
MD5 | e8f9488967910911c01bf45487dada24
BLAKE2b-256 | 6ac1a97c5910dab3791c965b74983e962b45e4f936a99f71b3c7aadec8e25611
Built Distributions
Hashes for biased_stop_words-2017.5.12.1-py2.7.egg
Algorithm | Hash digest
---|---
SHA256 | cbfd73cb06bf1d41149746421a3d021397c383b7062cea163f53565117ba5e5b
MD5 | a04b2d7692c9acf9a59905230dde73c3
BLAKE2b-256 | 199b162dc36cdfff4de7f9931dbe7863853c0cc353f31bcc94638067a5abbab0
Hashes for biased_stop_words-2017.5.12.1-py2-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 08f6ecf32f6567583407644f647dc75fde63cfef9714798af5cb836ae87d9cb4
MD5 | bea712f1b9beed3e2e0ec327c59bbf0b
BLAKE2b-256 | e6136925949df188508233783ddc3965fc0ed8e8fad5a12283dd85f1de5c2091
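Before installing a downloaded release file, you can check it against the digests listed above. The following sketch uses only Python's standard `hashlib`; `file_digest` and `verify` are illustrative helper names, and the "blake2b-256" label is mapped to BLAKE2b with a 32-byte digest, which is what the 256 in BLAKE2b-256 denotes.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path`.

    `algorithm` is any name accepted by hashlib.new() (e.g. "sha256",
    "md5"), or "blake2b-256" for BLAKE2b with a 32-byte digest.
    """
    if algorithm.lower() == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)
    else:
        h = hashlib.new(algorithm)
    # Read in chunks so large files need not fit in memory.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected, algorithm="sha256"):
    """Return True if the file's digest matches the expected hex string."""
    return file_digest(path, algorithm) == expected.strip().lower()


# Example: check the source distribution against its published SHA256.
# verify("biased-stop-words-2017.5.12.1.tar.gz",
#        "06beb775521c7b4e653cc9fe2087c760cf8285e90530119e6063d03a540458a6")
```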