Simple, Pythonic text processing. Sentiment analysis, POS tagging, noun phrase parsing, and more.
TextBlob: Simplified Text Processing
TextBlob is a Python (2 and 3) library for processing textual data. It provides a consistent API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more.
    from text.blob import TextBlob

    text = '''
    The titular threat of The Blob has always struck me as the ultimate movie
    monster: an insatiably hungry, amoeba-like mass able to penetrate
    virtually any safeguard, capable of--as a doomed doctor chillingly
    describes it--"assimilating flesh on contact."
    Snide comparisons to gelatin be damned, it's a concept with the most
    devastating of potential consequences, not unlike the grey goo scenario
    proposed by technological theorists fearful of
    artificial intelligence run rampant.
    '''

    blob = TextBlob(text)
    blob.pos_tags       # [(u'The', u'DT'), (u'titular', u'JJ'),
                        #  (u'threat', u'NN'), (u'of', u'IN'), ...]

    blob.noun_phrases   # WordList(['titular threat', 'blob',
                        #            'ultimate movie monster',
                        #            'amoeba-like mass', ...])

    # Each sentence carries its own score, so query the sentence, not the blob.
    for sentence in blob.sentences:
        print(sentence.sentiment)  # returns (sentiment, subjectivity)
    # (0.060, 0.605)
    # (-0.34, 0.77)
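The loop in the example prints one (sentiment, subjectivity) pair per sentence. As a minimal sketch of working with those pairs, the following averages them over a document using plain tuples. `average_scores` is a hypothetical helper, not part of TextBlob, and the input values are simply the two pairs shown in the example output.

```python
def average_scores(scores):
    """Average a list of (sentiment, subjectivity) pairs.

    Hypothetical helper for illustration; not part of TextBlob.
    """
    if not scores:
        raise ValueError("need at least one (sentiment, subjectivity) pair")
    n = float(len(scores))
    mean_sentiment = sum(s for s, _ in scores) / n
    mean_subjectivity = sum(subj for _, subj in scores) / n
    return mean_sentiment, mean_subjectivity


# The two pairs printed by the example above.
pairs = [(0.060, 0.605), (-0.34, 0.77)]
mean_sentiment, mean_subjectivity = average_scores(pairs)
print(round(mean_sentiment, 3), round(mean_subjectivity, 4))
```

With real blobs, the list of pairs would presumably come from collecting `sentence.sentiment` for each sentence in `blob.sentences`.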
Get it now
    $ pip install -U textblob
    $ curl https://raw.github.com/sloria/TextBlob/master/download_corpora.py | python
See more examples at the quickstart guide.
Full documentation is hosted at ReadTheDocs: https://textblob.readthedocs.org/
- Python >= 2.6 or >= 3.3
Run the run_tests.py script to run all tests.
TextBlob is licensed under the MIT license. See the bundled LICENSE file for more details.
- New text.tokenizers module with WordTokenizer and SentenceTokenizer. Tokenizer instances (from either textblob itself or NLTK) can be passed to TextBlob’s constructor. Tokens are accessed through the new tokens property.
- New Blobber class for creating TextBlobs that share the same tagger, tokenizer, and np_extractor.
- Add ngrams method.
- Backwards-incompatible: TextBlob.json() is now a method, not a property. This allows you to pass arguments (the same that you would pass to json.dumps()).
- New home for documentation: https://textblob.readthedocs.org/
- Add parameter for cleaning HTML markup from text.
- Minor improvement to word tokenization.
- Updated NLTK.
- Fix bug with adding blobs to bytestrings.
- Bundled NLTK no longer overrides local installation.
- Fix sentiment analysis of text with non-ascii characters.
- Updated NLTK.
- ConllExtractor is now Python 3-compatible.
- Improved sentiment analysis.
- Blobs are equal (with ==) to their string counterparts.
- Added instructions to install textblob without the bundled NLTK.
- Drop official support for Python 3.1 and 3.2.
- Importing TextBlob is now much faster. This is because the noun phrase parsers are trained only on the first call to noun_phrases (instead of training them every time you import TextBlob).
- Add text.taggers module, which lets users choose which POS tagger implementation to use. Currently supports PatternTagger and NLTKTagger (NLTKTagger only works with Python 2).
- NPExtractor and Tagger objects can be passed to TextBlob’s constructor.
- Fix bug with POS-tagger not tagging one-letter words.
- Rename text/np_extractor.py -> text/np_extractors.py
- Add run_tests.py script.
- Every word in a Blob or Sentence is a Word instance, which has methods for inflection, e.g. word.pluralize() and word.singularize().
- Updated the np_extractor module. It now has a new implementation, ConllExtractor, that uses the CoNLL-2000 chunking corpus. Only works on Python 2.
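
The changelog above notes that importing TextBlob became faster because the noun phrase parsers are trained only on the first call to noun_phrases rather than at import time. A minimal sketch of that lazy-initialization pattern follows. `LazyExtractor`, its `_train` step, and the toy extraction rule are all hypothetical stand-ins and do not reflect TextBlob's actual implementation.

```python
class LazyExtractor(object):
    """Hypothetical extractor that defers its costly setup until first use."""

    def __init__(self):
        self._trained = False
        self.train_calls = 0  # counts how often the costly step ran

    def _train(self):
        # Stand-in for an expensive step such as fitting a chunker.
        self.train_calls += 1
        self._trained = True

    def extract(self, text):
        # Train lazily: only the first call pays the training cost.
        if not self._trained:
            self._train()
        # Toy "extraction": return lowercased words longer than 3 characters.
        return [w.lower() for w in text.split() if len(w) > 3]


extractor = LazyExtractor()  # cheap: nothing is trained yet
first = extractor.extract("Simple Pythonic text processing")
second = extractor.extract("Noun phrase parsing")
print(extractor.train_calls)
```

The trade-off of this design is that construction and import stay cheap, while the first call that needs the trained model is slower than later ones.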