Simple, Pythonic text processing. Sentiment analysis, POS tagging, noun phrase parsing, and more.
TextBlob: Simplified Text Processing
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
    from text.blob import TextBlob

    text = '''
    The titular threat of The Blob has always struck me as the ultimate movie
    monster: an insatiably hungry, amoeba-like mass able to penetrate
    virtually any safeguard, capable of--as a doomed doctor chillingly
    describes it--"assimilating flesh on contact.
    Snide comparisons to gelatin be damned, it's a concept with the most
    devastating of potential consequences, not unlike the grey goo scenario
    proposed by technological theorists fearful of
    artificial intelligence run rampant.
    '''

    blob = TextBlob(text)
    blob.tags           # [(u'The', u'DT'), (u'titular', u'JJ'),
                        #  (u'threat', u'NN'), (u'of', u'IN'), ...]

    blob.noun_phrases   # WordList(['titular threat', 'blob',
                        #            'ultimate movie monster',
                        #            'amoeba-like mass', ...])

    for sentence in blob.sentences:
        print(sentence.sentiment)  # returns (polarity, subjectivity)
    # (0.060, 0.605)
    # (-0.341, 0.767)

    blob.translate(to="es")  # 'La amenaza titular de The Blob...'
Features
- Noun phrase extraction
- Part-of-speech tagging
- Sentiment analysis
- Classification (Naive Bayes, Decision Tree)
- Language translation and detection powered by Google Translate
- Tokenization (splitting text into words and sentences)
- Word and phrase frequencies
- Word inflection (pluralization and singularization) and lemmatization
- Spelling correction
- JSON serialization
- Easily swap models, or create your own
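To make a few of the listed features concrete, here is a minimal standard-library sketch of word tokenization, word frequencies, and n-grams. It is not TextBlob's implementation: the helper names `simple_words`, `word_counts`, and `ngrams` are hypothetical, and the regular expression is a simplifying assumption that ignores contractions and most punctuation rules.

```python
import re
from collections import Counter


def simple_words(text):
    """Split text into lowercase word tokens.

    Assumption: a word is a run of letters or digits, optionally joined
    by single hyphens (so "amoeba-like" stays one token).
    """
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def word_counts(text):
    """Count how often each word occurs in the text."""
    return Counter(simple_words(text))


def ngrams(text, n=3):
    """Return every run of n consecutive words as a list of lists."""
    words = simple_words(text)
    return [words[i:i + n] for i in range(len(words) - n + 1)]


sample = "The Blob ate the town. The town never saw the Blob coming."
counts = word_counts(sample)
print(counts["the"])   # 4
print(counts["blob"])  # 2
print(ngrams("an amoeba-like mass", n=2))
# [['an', 'amoeba-like'], ['amoeba-like', 'mass']]
```

The library exposes these ideas through its own objects and methods; see the Quickstart guide for the actual API.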
Get it now
    $ pip install -U textblob
    $ curl https://raw.github.com/sloria/TextBlob/master/download_corpora.py | python
See more examples at the Quickstart guide.
Full documentation is available at https://textblob.readthedocs.org/.
Requirements
- Python >= 2.6 or >= 3.3
MIT licensed. See the bundled LICENSE file for more details.
Changelog
- Word tokenization fix: Words that stem from a contraction will still have an apostrophe, e.g. "Let's" => ["Let", "'s"].
- Fix bug with comparing blobs to strings.
- Add text.taggers.PerceptronTagger, a fast and accurate POS tagger. Thanks @syllog1sm.
- Note for Python 3 users: You may need to update your corpora, since NLTK master has reorganized its corpus system. Just run curl https://raw.github.com/sloria/TextBlob/master/download_corpora.py | python again.
- Add download_corpora_lite.py script for getting the minimum corpora requirements for TextBlob’s basic features.
- Fix bug that resulted in a UnicodeEncodeError when tagging text with non-ascii characters.
- Add DecisionTreeClassifier.
- Add labels() and train() methods to classifiers.
- Classifiers can be trained and tested on CSV, JSON, or TSV data.
- Add basic WordNet lemmatization via the Word.lemma property.
- WordList.pluralize() and WordList.singularize() methods return WordList objects.
- Add Naive Bayes classification. New text.classifiers module, TextBlob.classify(), and Sentence.classify() methods.
- Add parsing functionality via the TextBlob.parse() method. The text.parsers module currently has one implementation (PatternParser).
- Add spelling correction. This includes the TextBlob.correct() and Word.spellcheck() methods.
- Update NLTK.
- Backwards incompatible: clean_html has been deprecated, just as it has in NLTK. Use Beautiful Soup’s soup.get_text() method for HTML-cleaning instead.
- Slight API change to language translation: if from_lang isn’t specified, attempts to detect the language.
- Add itokenize() method to tokenizers that returns a generator instead of a list of tokens.
- Unicode fixes: Fix a bug that sometimes raised a UnicodeEncodeError when creating or accessing sentences for TextBlobs with non-ascii characters.
- Update NLTK
- Important patch update for NLTK users: Fix bug with importing TextBlob if local NLTK is installed.
- Fix bug with computing start and end indices of sentences.
- Fix bug that disallowed display of non-ascii characters in the Python REPL.
- Backwards incompatible: Restore blob.json property for backwards compatibility with textblob<=0.3.10. Add a to_json() method that takes the same arguments as json.dumps.
- Add WordList.append and WordList.extend methods that append Word objects.
- Language translation and detection API!
- Add text.sentiments module. Contains the PatternAnalyzer (default implementation) as well as a NaiveBayesAnalyzer.
- Part-of-speech tags can be accessed via TextBlob.tags or TextBlob.pos_tags.
- Add polarity and subjectivity helper properties.
- New text.tokenizers module with WordTokenizer and SentenceTokenizer. Tokenizer instances (from either textblob itself or NLTK) can be passed to TextBlob’s constructor. Tokens are accessed through the new tokens property.
- New Blobber class for creating TextBlobs that share the same tagger, tokenizer, and np_extractor.
- Add ngrams method.
- Backwards-incompatible: TextBlob.json() is now a method, not a property. This allows you to pass arguments (the same that you would pass to json.dumps()).
- New home for documentation: https://textblob.readthedocs.org/
- Add parameter for cleaning HTML markup from text.
- Minor improvement to word tokenization.
- Updated NLTK.
- Fix bug with adding blobs to bytestrings.
- Bundled NLTK no longer overrides local installation.
- Fix sentiment analysis of text with non-ascii characters.
- Updated NLTK.
- ConllExtractor is now Python 3-compatible.
- Improved sentiment analysis.
- Blobs are equal (with ==) to their string counterparts.
- Added instructions to install textblob without nltk bundled.
- Dropping official 3.1 and 3.2 support.
- Importing TextBlob is now much faster. This is because the noun phrase parsers are trained only on the first call to noun_phrases (instead of training them every time you import TextBlob).
- Add text.taggers module, which lets users choose which POS tagger implementation to use. Currently supports PatternTagger and NLTKTagger (NLTKTagger only works with Python 2).
- NPExtractor and Tagger objects can be passed to TextBlob’s constructor.
- Fix bug with POS-tagger not tagging one-letter words.
- Rename text/np_extractor.py -> text/np_extractors.py
- Add run_tests.py script.
- Every word in a Blob or Sentence is a Word instance which has methods for inflection, e.g. word.pluralize() and word.singularize().
- Updated the np_extractor module. It now has a new implementation, ConllExtractor, that uses the CoNLL-2000 chunking corpus. Only works on Python 2.
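Two of the changelog entries above, the contraction fix ("Let's" => ["Let", "'s"]) and the generator-returning itokenize() method, can be illustrated together. The following is a standard-library sketch under stated assumptions, not TextBlob's tokenizer: the class name `SketchWordTokenizer` and its regular expression are hypothetical, the `tokenize` method is assumed as the list-returning counterpart, and only the `itokenize` name comes from the changelog.

```python
import re

# Assumption: a token is a word, an apostrophe-led contraction suffix
# such as 's or 'll, or a single punctuation character.
_TOKEN_RE = re.compile(r"[A-Za-z0-9]+|'[A-Za-z]+|[^\sA-Za-z0-9]")


class SketchWordTokenizer(object):
    """Hypothetical tokenizer contrasting lazy and eager tokenization."""

    def itokenize(self, text):
        """Yield tokens one at a time instead of building a list."""
        for match in _TOKEN_RE.finditer(text):
            yield match.group(0)

    def tokenize(self, text):
        """Return all tokens as a list by draining the generator."""
        return list(self.itokenize(text))


tokenizer = SketchWordTokenizer()
print(tokenizer.tokenize("Let's go"))  # ['Let', "'s", 'go']

# The generator form produces tokens on demand.
lazy_tokens = tokenizer.itokenize("Let's go!")
print(next(lazy_tokens))  # Let
```

A generator is useful when the input is large and only the first few tokens are needed, since no full list is built up front.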