
An off-the-rack NLP sentiment classifier: upload your own corpus or use one of the pre-installed ones


# empathyMachines
> A standalone NLP sentiment classifier you can import as a module

## Purposes

1. Offer a batteries-included NLP classifier you can use either on its own or to make sentiment predictions as part of a broader NLP project. For example, when classifying customer messages, knowing whether the customer is angry might help you determine if a message is a compensation request or a request to change their address.
1. Have the entire sentiment prediction process scaffolded so you can feed in your own training corpus, and easily train an NLP sentiment classifier.

## How to use

1. `pip install empythy`
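1. Train a classifier and make a prediction:
```
from empythy import EmpathyMachines

# Train with the default settings, then predict the sentiment of one string
nlp_classifier = EmpathyMachines()
nlp_classifier.train()
text_string = "I love this product!"  # example input; substitute your own text
nlp_classifier.predict(text_string)
```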

### Corpora included

#### NLTK Movie Reviews
The classic sentiment corpus: 2,000 movie reviews already gathered by NLTK.

#### Assembling a custom Twitter sentiment corpus
[CrowdFlower](http://www.crowdflower.com/data-for-everyone) hosts a number of Twitter corpora that have already been graded for sentiment by panels of humans.

I combined six of their corpora into a single cleaned corpus with consistent scoring labels throughout. The cleaned corpus contains over 45,000 documents, labeled with positive, negative, and neutral sentiments.
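To make "consistent scoring labels" concrete, here is a minimal sketch of how several differently labeled corpora might be merged into one. It is not the code used to build the bundled corpus: the label spellings in `LABEL_MAP` and the helper names `normalize_label` and `merge_corpora` are hypothetical, chosen only for illustration.

```python
# Hypothetical mapping from raw label spellings to one shared vocabulary.
# The raw spellings are illustrative assumptions, not the actual
# CrowdFlower label values.
LABEL_MAP = {
    "positive": "positive",
    "pos": "positive",
    "negative": "negative",
    "neg": "negative",
    "neutral": "neutral",
    "neu": "neutral",
}


def normalize_label(raw_label):
    """Return the shared label for raw_label, or None if it is unrecognized."""
    key = str(raw_label).strip().lower()
    return LABEL_MAP.get(key)


def merge_corpora(corpora):
    """Merge several lists of {"sentiment", "text"} dicts into one cleaned list.

    Documents whose label cannot be mapped are skipped, and surrounding
    whitespace is stripped from each text.
    """
    merged = []
    for documents in corpora:
        for doc in documents:
            label = normalize_label(doc["sentiment"])
            if label is None:
                continue
            merged.append({"sentiment": label, "text": doc["text"].strip()})
    return merged
```

Dropping documents with unknown labels is one possible design choice; depending on the corpus, you might prefer to log them or map them to neutral instead.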


### Train on your own corpus

Feel free to train a classifier on your own corpus!

There are two ways to do this:

1. Read in a .csv file whose header row contains "sentiment", "text", and, optionally, "confidence".
    - Pass the path of the .csv file to `train`, like so:
    - `nlp_classifier.train(corpus='custom', corpus_path='path/to/custom/corpus.csv')`
1. Pass in an array of Python dictionaries, where each dictionary has keys for "sentiment", "text", and, optionally, "confidence".
    - `nlp_classifier.train(corpus='custom', corpus_array=my_array_of_texts)`
    - Both arguments are needed here: `corpus='custom'` and `corpus_array=my_variable_holding_the_documents`.
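As a rough illustration of both options, the sketch below builds a small custom corpus, writes it to a .csv file with the header row described above, and wraps the two `train` calls in a helper. The sample documents, the label strings, the 0 to 1 confidence scale, and the helper names `write_corpus_csv` and `train_on_custom_corpus` are all assumptions made for this example. Only the `train` keyword arguments come from this README. The code targets Python 3.

```python
import csv

# Hypothetical sample documents. The label strings and the 0-1 confidence
# scale are assumptions, not values confirmed by the package.
my_array_of_texts = [
    {"sentiment": "positive", "text": "Loved the quick refund, thank you!", "confidence": 0.9},
    {"sentiment": "negative", "text": "Still waiting on my order. Very annoyed.", "confidence": 0.8},
    {"sentiment": "neutral", "text": "Please update my shipping address."},
]


def write_corpus_csv(rows, path):
    """Write rows to a .csv file with a sentiment/text/confidence header row."""
    fieldnames = ["sentiment", "text", "confidence"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        # restval="" leaves the optional confidence column empty when absent
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)


def train_on_custom_corpus(rows, csv_path=None):
    """Train on rows, via a .csv file if csv_path is given, else via the array."""
    # Imported lazily so the helpers above can be used without empythy installed
    from empythy import EmpathyMachines

    nlp_classifier = EmpathyMachines()
    if csv_path is not None:
        write_corpus_csv(rows, csv_path)
        nlp_classifier.train(corpus='custom', corpus_path=csv_path)
    else:
        nlp_classifier.train(corpus='custom', corpus_array=rows)
    return nlp_classifier
```

Calling `train_on_custom_corpus(my_array_of_texts)` would exercise the array route, while passing `csv_path='corpus.csv'` would exercise the file route. A corpus this small is only useful for checking that the plumbing works.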

### Advanced Usage
1. `nlp_classifier.train(verbose=False)` to turn off print status statements while training.
1. `nlp_classifier.train(print_analytics_results=True)` to print out results of training the classifier.
