An off-the-rack NLP sentiment classifier: upload your own corpus or use the pre-installed ones.
# empathyMachines
> A standalone NLP sentiment classifier you can import as a module
## Purposes
1. Offer a batteries-included NLP classifier you can use either on its own or to make sentiment predictions as part of a broader NLP project. For example, when classifying customer messages, knowing whether the customer is angry might help you decide whether a message is a compensation request or a request to change their address.
1. Scaffold the entire sentiment-prediction process so you can feed in your own training corpus and easily train an NLP sentiment classifier.
## How to use
1. `pip install empythy`
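1. Import the classifier, train it, and make a prediction:
   ```python
   from empythy import EmpathyMachines

   nlp_classifier = EmpathyMachines()
   nlp_classifier.train()

   text_string = "The support team fixed my problem in minutes!"  # any text you want to classify
   nlp_classifier.predict(text_string)
   ```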
### Corpora included
#### NLTK Movie Reviews
The classic sentiment corpus: 2,000 movie reviews already gathered by NLTK.
#### Assembling a custom Twitter sentiment corpus
[CrowdFlower](http://www.crowdflower.com/data-for-everyone) hosts a number of Twitter corpora that have already been graded for sentiment by panels of humans.
I combined six of their corpora into a single cleaned corpus with consistent sentiment labels throughout. The cleaned corpus contains over 45,000 documents, each labeled positive, negative, or neutral.
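The exact merging steps are not shown here. As a minimal sketch of the kind of label normalization that "consistent scoring labels" implies, assuming each source corpus has already been loaded as a list of `{"sentiment": ..., "text": ...}` dicts, the code below maps each source's label variants onto one shared positive/negative/neutral scheme. `LABEL_MAP`, `normalize_label`, and `merge_corpora` are hypothetical names, and the label variants in `LABEL_MAP` are illustrative assumptions, not the actual CrowdFlower labels.

```python
from collections import Counter

# Hypothetical label variants (assumption): the real CrowdFlower corpora may
# use different strings. Each maps onto the shared positive/negative/neutral set.
LABEL_MAP = {
    "positive": "positive",
    "pos": "positive",
    "negative": "negative",
    "neg": "negative",
    "neutral": "neutral",
}


def normalize_label(raw_label):
    """Return the shared label for raw_label, or None if it is unrecognized."""
    return LABEL_MAP.get(raw_label.strip().lower())


def merge_corpora(corpora):
    """Merge lists of {"sentiment", "text"} dicts into one consistently labeled list.

    Rows with an unrecognized label or empty text are dropped.
    """
    merged = []
    for corpus in corpora:
        for row in corpus:
            label = normalize_label(row["sentiment"])
            text = row["text"].strip()
            if label is None or not text:
                continue
            merged.append({"sentiment": label, "text": text})
    return merged


if __name__ == "__main__":
    corpus_a = [{"sentiment": "Positive", "text": "Great service!"}]
    corpus_b = [
        {"sentiment": "neg", "text": "Flight delayed again."},
        {"sentiment": "irrelevant", "text": "asdf"},  # dropped: unknown label
    ]
    merged = merge_corpora([corpus_a, corpus_b])
    # Count how many documents ended up under each shared label.
    print(Counter(row["sentiment"] for row in merged))
```

Dropping rows with unknown labels, rather than guessing a label for them, keeps the merged corpus limited to the three sentiments the classifier is trained on.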
### Train on your own corpus
Feel free to train a classifier on your own corpus!
There are two ways to do this:
1. Read in a .csv file whose header row contains "sentiment", "text", and, optionally, "confidence".
   - Pass the path to the .csv file to `train()`, like so:
   - `nlp_classifier.train(corpus='custom', corpus_path='path/to/custom/corpus.csv')`
1. Pass in a list (array) of Python dictionaries, where each dictionary has the keys "sentiment", "text", and, optionally, "confidence".
   - `nlp_classifier.train(corpus='custom', corpus_array=my_array_of_texts)`
   - Both arguments are required here: `corpus='custom'`, and `corpus_array=` set to the variable holding your documents.
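A minimal sketch of preparing a corpus in either of these formats, using only the standard-library `csv` module. The helper names `validate_documents` and `write_corpus_csv` are hypothetical and not part of empythy, and the sentiment label strings in the example are assumptions, since the accepted label values are not listed here.

```python
import csv

REQUIRED_FIELDS = ["sentiment", "text"]


def validate_documents(documents):
    """Raise ValueError if any document lacks "sentiment" or "text"."""
    for i, doc in enumerate(documents):
        missing = [field for field in REQUIRED_FIELDS if field not in doc]
        if missing:
            raise ValueError("document {} is missing {}".format(i, missing))
    return documents


def write_corpus_csv(documents, path):
    """Write documents to a CSV whose header row is sentiment,text[,confidence]."""
    validate_documents(documents)
    fieldnames = list(REQUIRED_FIELDS)
    # Only add the optional "confidence" column if some document has one.
    if any("confidence" in doc for doc in documents):
        fieldnames.append("confidence")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for doc in documents:
            # Documents without a confidence value get an empty cell.
            writer.writerow({name: doc.get(name, "") for name in fieldnames})


if __name__ == "__main__":
    # Hypothetical example documents; the label strings are assumptions.
    my_array_of_texts = [
        {"sentiment": "positive", "text": "Thanks, the refund arrived quickly!"},
        {"sentiment": "negative", "text": "I was charged twice.", "confidence": 0.9},
    ]
    write_corpus_csv(my_array_of_texts, "corpus.csv")
```

The same `my_array_of_texts` list could be passed directly as `corpus_array`, or the resulting file as `corpus_path='corpus.csv'`. Writing through `csv.DictWriter` instead of joining strings by hand means commas and quotes inside the text are escaped correctly.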
### Advanced Usage
1. `nlp_classifier.train(verbose=False)` suppresses the status messages printed during training.
1. `nlp_classifier.train(print_analytics_results=True)` prints the results of training the classifier.
## Download files
- Source distribution: `empythy-1.0.0.tar.gz` (1.6 MB)
- Built distribution: `empythy-1.0.0-py2.py3-none-any.whl` (1.6 MB)
### File details: empythy-1.0.0.tar.gz
- Size: 1.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No

| Algorithm | Hash digest |
|---|---|
| SHA256 | `9e14a61019264ea2b5e3a8707a0e08b24178e67c89f981c6f26a9677a199547a` |
| MD5 | `b2504b701c4ba9f0304b6badcb5cee11` |
| BLAKE2b-256 | `2c6cd29c7026e45827fff39a1bc3726ff35521e760d2663daf0c33535e5ad169` |
### File details: empythy-1.0.0-py2.py3-none-any.whl
- Size: 1.6 MB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No

| Algorithm | Hash digest |
|---|---|
| SHA256 | `a98d9ebd00ab2a605f42b20a2c777e3b1e84af831239bb04467a80bf3c7c21d6` |
| MD5 | `388782715c35baae4c8a95cb1467631a` |
| BLAKE2b-256 | `50eee771b8a40f56301403640919db735dc1f3630dd30e901bf68dc27bd062e7` |