TWSS: A Naive Bayes classifier that can identify double entendres.
This is an implementation of a simple double entendre classifier in Python.
It is a Python package that currently uses the Naive Bayes classifier from NLTK. It was inspired by the bvandenvos Ruby TWSS project and uses the same data corpus.
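If you are new to the technique, here is a from-scratch Bernoulli Naive Bayes classifier over word-presence features with add-one (Laplace) smoothing. For each label it adds the log prior to the log probability of every vocabulary word being present or absent, then picks the label with the highest score. This is only an illustration of the idea. It is not NLTK's implementation or TWSS's code, and the `TinyNaiveBayes` name and the lowercase whitespace tokenizer are assumptions.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Hypothetical Bernoulli Naive Bayes over word presence (illustration only)."""

    def __init__(self, training_data):
        # training_data: iterable of (sentence, label) pairs, label is True/False
        self.label_counts = Counter()            # label -> number of sentences
        self.word_counts = defaultdict(Counter)  # label -> word -> sentences containing it
        self.vocab = set()
        for sentence, label in training_data:
            words = set(sentence.lower().split())  # assumed tokenizer
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab |= words
        self.total = sum(self.label_counts.values())

    def _log_score(self, words, label):
        n_label = self.label_counts[label]
        score = math.log(n_label / self.total)  # log prior P(label)
        for word in self.vocab:
            # P(word present | label), add-one smoothed over {present, absent}
            p = (self.word_counts[label][word] + 1) / (n_label + 2)
            score += math.log(p if word in words else 1 - p)
        return score

    def __call__(self, sentence):
        words = set(sentence.lower().split())
        # Return the label with the highest posterior log score
        return max(self.label_counts, key=lambda label: self._log_score(words, label))


data = [
    ("that was hard", True),
    ("it is so big", True),
    ("hello world", False),
    ("the weather is nice", False),
]
clf = TinyNaiveBayes(data)
print(clf("That was hard"))  # True
print(clf("Hello world"))    # False
```

The independence assumption shows up in `_log_score`: each word contributes its own term, so training is simply counting. That is why even a small corpus gives a usable classifier.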
This was built on the eve of Barcamp Mumbai 8 and presented during a session there.
Suggestions welcome. Do file bugs. Fork away. Send us pull requests.
    $ virtualenv --no-site-packages --distribute venv
    $ source venv/bin/activate
    $ pip install -r requirements.txt
This creates a virtual environment for the project and installs all the packages it needs to work.
Once this is installed, you can take it out for a spin:
    >>> from twss import TWSS
    >>> twss = TWSS()
    >>> twss("That was hard")
    True
    >>> twss("Hello world")
    False
The first call can take a while, because the module must first train the classifier on the bundled training dataset.
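One plausible way to get this train-on-first-use behavior is to defer training until the classifier is first called, then cache the trained model. The sketch below assumes nothing about TWSS internals. `LazyClassifier` is hypothetical, and `toy_train` is a stand-in keyword lookup, not Naive Bayes, used only to keep the example short.

```python
class LazyClassifier:
    """Hypothetical wrapper that trains on the first call and reuses the model afterwards."""

    def __init__(self, train_fn, training_data):
        self._train_fn = train_fn            # callable: training_data -> classify(sentence)
        self._training_data = training_data
        self._classify = None                # stays None until the first call
        self.train_count = 0                 # how many times training has actually run

    def __call__(self, sentence):
        if self._classify is None:
            # The slow step: runs once, on the first call only
            self._classify = self._train_fn(self._training_data)
            self.train_count += 1
        return self._classify(sentence)


def toy_train(training_data):
    """Stand-in trainer (a keyword lookup, NOT Naive Bayes)."""
    positive_words = {word
                      for sentence, label in training_data if label
                      for word in sentence.lower().split()}
    return lambda sentence: any(word in positive_words for word in sentence.lower().split())


clf = LazyClassifier(toy_train, [("That was hard", True), ("Hello world", False)])
print(clf.train_count)          # 0: nothing trained yet
print(clf("That was hard"))     # True (training happens here)
print(clf("Hello world"))       # False (reuses the trained model)
print(clf.train_count)          # 1
```

Deferring the work keeps construction cheap, but it moves the delay to the first classification. Code that needs predictable latency could make one warm-up call up front.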
You can supply your own training data using positive and negative corpus files:
    >>> twss = TWSS(positive_corpus_file=open('foo.txt'),
    ...             negative_corpus_file=open('bar.txt'))
or directly, as a list of (sentence, label) tuples:
    >>> training_data = [
    ...     ("Sentence 1", True),
    ...     ("Sentence 2", False),
    ...     ...
    ... ]
    >>> twss = TWSS(training_data)
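Both options carry the same information: sentences labelled True (a double entendre) or False. The sketch below shows one way to reduce the file-based form to the tuple form, and then to the (featureset, label) pairs that NLTK's `NaiveBayesClassifier.train` consumes. It assumes one sentence per line in each corpus file and a simple bag-of-words feature scheme. `load_corpus_pair`, `word_features`, `to_featuresets` and the `contains(...)` feature names are hypothetical, not TWSS's actual code.

```python
import io


def load_corpus_pair(positive_corpus_file, negative_corpus_file):
    """Read two open text files into (sentence, label) tuples.

    Assumption: one sentence per line; blank lines are skipped.
    """
    training_data = []
    for corpus_file, label in ((positive_corpus_file, True),
                               (negative_corpus_file, False)):
        for line in corpus_file:
            sentence = line.strip()
            if sentence:
                training_data.append((sentence, label))
    return training_data


def word_features(sentence):
    """Hypothetical bag-of-words featureset: one boolean feature per lowercased word."""
    return {f"contains({word})": True for word in sentence.lower().split()}


def to_featuresets(training_data):
    """Turn (sentence, label) tuples into (featureset dict, label) pairs."""
    return [(word_features(sentence), label) for sentence, label in training_data]


# In-memory files stand in for open('foo.txt') and open('bar.txt')
positive = io.StringIO("That was hard\n")
negative = io.StringIO("Hello world\n\n")
training_data = load_corpus_pair(positive, negative)
print(training_data)  # [('That was hard', True), ('Hello world', False)]
print(to_featuresets(training_data)[1])
# ({'contains(hello)': True, 'contains(world)': True}, False)
```

With real files, opening them in a `with` statement rather than with bare `open()` calls ensures they are closed once the training data has been read.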