textblob·PyPI

Simple, Pythonic text processing. Sentiment analysis, POS tagging, noun phrase parsing, and more.

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English
Programming Language
Topic
- Text Processing :: Linguistic

Project description

TextBlob

Simplified text processing for Python 2 and 3.

Requirements

Python >= 2.6 or >= 3.3

Installation

If you don’t have pip (you should), run this first: curl https://raw.github.com/pypa/pip/master/contrib/get-pip.py | python

Option 1

Choose this option if you:

Want a quick install.
Don’t have nltk currently installed, or don’t mind if your current installation is overridden by the latest version on the master branch. NOTE: You can avoid this by installing textblob in a virtualenv.

pip install -U textblob
curl https://raw.github.com/sloria/TextBlob/master/download_corpora.py | python

This will install textblob and download the necessary NLTK corpora.

Option 2

Choose this option if you:

Don’t want your local nltk installation to be overridden.
Want to keep your nltk on the bleeding edge of development.

pip install -U git+https://github.com/nltk/nltk
pip install -U git+https://github.com/sloria/TextBlob.git@no-bundle
curl https://raw.github.com/sloria/TextBlob/master/download_corpora.py | python

This will install the latest NLTK from the master branch, install textblob from the no-bundle branch, and download the necessary corpora.

Usage

Simple.

Create a TextBlob

from text.blob import TextBlob

wikitext = '''
Python is a widely used general-purpose, high-level programming language.
Its design philosophy emphasizes code readability, and its syntax allows
programmers to express concepts in fewer lines of code than would be
possible in languages such as C.
'''

wiki = TextBlob(wikitext)

Part-of-speech tags and noun phrases…

...are just properties.

wiki.pos_tags       # [(Word('Python'), 'NNP'), (Word('is'), 'VBZ'),
                    #  (Word('a'), 'DT'), (Word('widely'), 'RB')...]

wiki.noun_phrases   # WordList(['python', 'design philosophy', 'code readability'])

Note: The first access to noun_phrases might take a few seconds because the noun phrase chunker needs to be trained. Subsequent accesses will be quick, since all TextBlobs share the same noun phrase chunker instance.
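The train-once, share-everywhere behaviour described in the note can be sketched in plain Python. This is only an illustration of the pattern, not TextBlob's actual code: SlowChunker and LazyBlob are hypothetical names, and the "training" is a counter standing in for the expensive step.

```python
class SlowChunker(object):
    """Hypothetical stand-in for a chunker that is expensive to train."""

    times_trained = 0  # counts how often the "training" step actually ran

    def __init__(self):
        SlowChunker.times_trained += 1  # pretend this takes a few seconds

    def extract(self, text):
        # Placeholder logic only: treat capitalized words as noun phrases.
        return [w.lower() for w in text.split() if w[:1].isupper()]


class LazyBlob(object):
    """Hypothetical blob whose instances share one lazily created chunker."""

    _shared_chunker = None  # class attribute, shared by every instance

    def __init__(self, text):
        self.text = text  # no training happens at construction time

    @property
    def noun_phrases(self):
        if LazyBlob._shared_chunker is None:
            LazyBlob._shared_chunker = SlowChunker()  # first access only
        return LazyBlob._shared_chunker.extract(self.text)


first = LazyBlob("Python is fun")
second = LazyBlob("Guido wrote it")
first.noun_phrases   # triggers the one-time training
second.noun_phrases  # reuses the shared chunker
print(SlowChunker.times_trained)
```

Keeping the chunker on the class rather than on each instance is what makes the second blob's access cheap.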

Sentiment analysis

The sentiment property returns a tuple of the form (polarity, subjectivity) where polarity ranges from -1.0 to 1.0 and subjectivity ranges from 0.0 to 1.0.

testimonial = TextBlob("Textblob is amazingly simple to use. What great fun!")
testimonial.sentiment        # (0.4583333333333333, 0.4357142857142857)
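Because the result is an ordinary tuple, it can be unpacked and mapped to coarse labels. The sketch below works on the tuple alone; the describe helper and its cut-off values (0.1 for polarity, 0.5 for subjectivity) are assumptions chosen for illustration and are not part of TextBlob.

```python
def describe(sentiment, threshold=0.1):
    """Turn a (polarity, subjectivity) tuple into a pair of coarse labels."""
    polarity, subjectivity = sentiment
    if polarity > threshold:
        tone = "positive"
    elif polarity < -threshold:
        tone = "negative"
    else:
        tone = "neutral"
    # 0.5 is an arbitrary midpoint of the 0.0 to 1.0 subjectivity range.
    style = "subjective" if subjectivity >= 0.5 else "objective"
    return tone, style


# Using the tuple shown for the testimonial above.
print(describe((0.4583333333333333, 0.4357142857142857)))
```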

Tokenization

zen = TextBlob("Beautiful is better than ugly. "
                "Explicit is better than implicit. "
                "Simple is better than complex.")

zen.words            # WordList(['Beautiful', 'is', 'better'...])

zen.sentences        # [Sentence('Beautiful is better than ugly.'),
                      #  Sentence('Explicit is better than implicit.'),
                      #  ...]

for sentence in zen.sentences:
    print(sentence.sentiment)

Words and inflection

Each word in TextBlob.words or Sentence.words is a Word object (a subclass of unicode on Python 2 and of str on Python 3) with useful methods, e.g. for word inflection.

sentence = TextBlob('Use 4 spaces per indentation level.')
sentence.words
# OUT: WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level'])
sentence.words[2].singularize()
# OUT: 'space'
sentence.words[-1].pluralize()
# OUT: 'levels'

Get word and noun phrase frequencies

wiki.word_counts['its']   # 2 (not case-sensitive by default)
wiki.words.count('its')   # Same thing
wiki.words.count('its', case_sensitive=True)  # 1

wiki.noun_phrases.count('code readability')  # 1
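The difference between the default and the case-sensitive count can be reproduced with the standard library alone. This is a rough sketch: the regular-expression tokenizer is an assumption and is cruder than TextBlob's own tokenization, and count is a hypothetical helper.

```python
import re
from collections import Counter

wikitext = '''
Python is a widely used general-purpose, high-level programming language.
Its design philosophy emphasizes code readability, and its syntax allows
programmers to express concepts in fewer lines of code than would be
possible in languages such as C.
'''

words = re.findall(r"\w+", wikitext)  # crude tokenizer, for illustration only


def count(word, case_sensitive=False):
    """Count occurrences of word, ignoring case unless asked not to."""
    if case_sensitive:
        return Counter(words)[word]
    return Counter(w.lower() for w in words)[word.lower()]


print(count('its'))                       # counts both "Its" and "its"
print(count('its', case_sensitive=True))  # counts only the lowercase form
```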

TextBlobs are like Python strings!

zen[0:19]            # TextBlob("Beautiful is better")
zen.upper()          # TextBlob("BEAUTIFUL IS BETTER THAN UGLY...")
zen.find("Simple")   # 65

apple_blob = TextBlob('apples')
banana_blob = TextBlob('bananas')
apple_blob < banana_blob           # True
apple_blob + ' and ' + banana_blob # TextBlob('apples and bananas')
"{0} and {1}".format(apple_blob, banana_blob)  # 'apples and bananas'

Get start and end indices of sentences

Use sentence.start and sentence.end. This can be useful for sentence highlighting, for example.

for sentence in zen.sentences:
    print(sentence)  # Beautiful is better than ugly.
    # Explicit field numbers keep str.format working on Python 2.6.
    print("---- Starts at index {0}, Ends at index {1}"
          .format(sentence.start, sentence.end))  # 0, 30
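As a sketch of the highlighting use case, the snippet below wraps one sentence in marker strings using start and end offsets. To stay runnable without TextBlob it computes the offsets with a naive regular expression, which is an assumption and not how TextBlob splits sentences; sentence_spans and highlight are hypothetical helpers.

```python
import re


def sentence_spans(text):
    """Return (start, end) pairs from a naive splitter on '.', '!' and '?'."""
    return [(m.start(), m.end())
            for m in re.finditer(r"\S[^.!?]*[.!?]", text)]


def highlight(text, start, end, left="[[", right="]]"):
    """Wrap text[start:end] in marker strings; end is exclusive."""
    return text[:start] + left + text[start:end] + right + text[end:]


zen_text = ("Beautiful is better than ugly. "
            "Explicit is better than implicit. "
            "Simple is better than complex.")

spans = sentence_spans(zen_text)
start, end = spans[1]  # offsets of the second sentence
print(highlight(zen_text, start, end))
```

With real blobs, sentence.start and sentence.end would take the place of the computed pairs.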

Get a JSON-serialized version of the blob

zen.json   # '[{"sentiment": [0.2166666666666667, 0.8333333333333334],
           #    "stripped": "beautiful is better than ugly",
           #    "noun_phrases": ["beautiful"],
           #    "raw": "Beautiful is better than ugly. ",
           #    "end_index": 30, "start_index": 0},
           #   ...]'
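Since the property yields a JSON string, the standard json module can turn it back into Python objects. The sketch below hard-codes only the first sentence's record from the output above (the remaining records are elided there, so they are left out here too).

```python
import json

# First record copied from the sample output; later records are omitted.
serialized = (
    '[{"sentiment": [0.2166666666666667, 0.8333333333333334], '
    '"stripped": "beautiful is better than ugly", '
    '"noun_phrases": ["beautiful"], '
    '"raw": "Beautiful is better than ugly. ", '
    '"end_index": 30, "start_index": 0}]'
)

records = json.loads(serialized)  # a list with one dict per sentence
first = records[0]
polarity, subjectivity = first["sentiment"]

print(first["stripped"])
print(first["end_index"] - first["start_index"])  # sentence length in characters
```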

Advanced usage

Noun Phrase Chunkers

TextBlob currently has two noun phrase chunker implementations: text.np_extractors.FastNPExtractor (the default, based on Shlomi Babluki’s implementation from this blog post) and text.np_extractors.ConllExtractor, which uses the CoNLL 2000 corpus to train a tagger.

You can change the chunker implementation (or even use your own) by explicitly passing an instance of a noun phrase extractor to a TextBlob’s constructor.

from text.blob import TextBlob
from text.np_extractors import ConllExtractor

extractor = ConllExtractor()
blob = TextBlob("Extract my noun phrases.", np_extractor=extractor)
blob.noun_phrases  # This will use the Conll2000 noun phrase extractor
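To illustrate the "use your own" idea, here is a self-contained sketch of the delegation pattern. It assumes an extractor only needs an extract(text) method; that interface, the CapitalizedExtractor class and the MiniBlob stand-in are all assumptions made for this example, so check the text.np_extractors source before relying on them with a real TextBlob.

```python
class CapitalizedExtractor(object):
    """Toy extractor: returns runs of capitalized words, lowercased."""

    def extract(self, text):
        phrases, current = [], []
        for token in text.replace(".", " ").split():
            if token[:1].isupper():
                current.append(token.lower())
            elif current:
                phrases.append(" ".join(current))
                current = []
        if current:  # flush a run that reaches the end of the text
            phrases.append(" ".join(current))
        return phrases


class MiniBlob(object):
    """Hypothetical stand-in for a blob that delegates to its extractor."""

    def __init__(self, text, np_extractor):
        self.text = text
        self.np_extractor = np_extractor

    @property
    def noun_phrases(self):
        return self.np_extractor.extract(self.text)


blob = MiniBlob("our office moved from New York to San Francisco.",
                np_extractor=CapitalizedExtractor())
print(blob.noun_phrases)
```

Passing the extractor in through the constructor keeps the blob independent of any particular chunking strategy.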

POS Taggers

TextBlob currently has two POS tagger implementations, located in text.taggers. The default is PatternTagger, which uses the same implementation as the excellent pattern library.

The second implementation is NLTKTagger which uses NLTK’s TreeBank tagger. It requires numpy and only works on Python 2.

Similar to the noun phrase chunkers, you can explicitly specify which POS tagger to use by passing a tagger instance to the constructor.

from text.blob import TextBlob
from text.taggers import NLTKTagger

nltk_tagger = NLTKTagger()
blob = TextBlob("Tag! You're It!", pos_tagger=nltk_tagger)
blob.pos_tags

Testing

Run

python run_tests.py

to run all tests.

License

TextBlob is licensed under the MIT license. See the bundled LICENSE file for more details.

Changelog

0.3.9 (2013-07-31)

Updated nltk.
ConllExtractor is now Python 3-compatible.
Improved sentiment analysis.
Blobs are equal (with ==) to their string counterparts.
Added instructions to install textblob without nltk bundled.
Dropped official support for Python 3.1 and 3.2.

0.3.8 (2013-07-30)

Importing TextBlob is now much faster. This is because the noun phrase parsers are trained only on the first call to noun_phrases (instead of training them every time you import TextBlob).
Add text.taggers module, which lets users choose which POS tagger implementation to use. Currently supports PatternTagger and NLTKTagger (NLTKTagger only works with Python 2).
NPExtractor and Tagger objects can be passed to TextBlob’s constructor.
Fix bug with POS-tagger not tagging one-letter words.
Rename text/np_extractor.py -> text/np_extractors.py
Add run_tests.py script.

0.3.7 (2013-07-28)

Every word in a Blob or Sentence is a Word instance, which has methods for inflection, e.g. word.pluralize() and word.singularize().
Updated the np_extractor module. It now has a new implementation, ConllExtractor, that uses the Conll2000 chunking corpus. Only works on Py2.

Release history

0.19.0

Jan 13, 2025

0.18.0.post0

Feb 15, 2024

0.18.0

Feb 15, 2024

0.17.1

Oct 22, 2021

0.17.0

Oct 22, 2021

0.15.3

Feb 24, 2019

0.15.2

Nov 21, 2018

0.15.1

Jan 20, 2018

0.15.0

Dec 2, 2017

0.14.0

Nov 21, 2017

0.13.1

Nov 11, 2017

0.13.0

Aug 16, 2017

0.12.0

Feb 27, 2017

0.11.1

Feb 18, 2016

0.11.0

Nov 1, 2015

0.10.0

Oct 4, 2015

0.9.1

Jun 10, 2015

0.9.0

Sep 15, 2014

0.8.4

Feb 3, 2014

0.8.3

Dec 29, 2013

0.8.2

Dec 21, 2013

0.8.1

Nov 16, 2013

0.8.0

Oct 23, 2013

0.7.1

Sep 30, 2013

0.7.0

Sep 25, 2013

0.6.3

Sep 15, 2013

0.6.2

Sep 5, 2013

0.6.1

Sep 1, 2013

0.6.0

Aug 26, 2013

0.5.3

Aug 21, 2013

0.5.2

Aug 14, 2013

0.5.1

Aug 13, 2013

0.5.0

Aug 10, 2013

0.4.0

Aug 6, 2013

0.3.10

Aug 2, 2013

This version

0.3.9

Jul 31, 2013

0.3.8

Jul 30, 2013

0.3.7

Jul 29, 2013

0.3.6

Jul 28, 2013

0.3.5

Jul 19, 2013

0.3.4

Jul 7, 2013

0.3.3

Jul 7, 2013

0.3.2

Jul 7, 2013

0.3.1

Jul 7, 2013

0.3.0

Jul 7, 2013

0.2.6

Jul 5, 2013

0.2.5

Jul 2, 2013

0.2.4

Jul 1, 2013

0.2.3

Jul 1, 2013

0.2.1

Jul 1, 2013

0.2.0

Jul 1, 2013

0.1.36

Jul 1, 2013

0.1.35

Jul 1, 2013

0.1.34

Jul 1, 2013

0.1.33

Jul 1, 2013

0.1.32

Jul 1, 2013

0.1.31

Jul 1, 2013

0.1.3

Jul 1, 2013

0.1.2

Jul 1, 2013

0.1

Jul 1, 2013

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

textblob-0.3.9.tar.gz (1.3 MB)

Uploaded Jul 31, 2013 Source

Built Distribution

textblob-0.3.9-py2.py3-none-any.whl (1.4 MB)

Uploaded Jul 31, 2013 Python 2, Python 3

File details

Details for the file textblob-0.3.9.tar.gz.

File metadata

Download URL: textblob-0.3.9.tar.gz
Upload date: Jul 31, 2013
Size: 1.3 MB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for textblob-0.3.9.tar.gz
Algorithm	Hash digest
SHA256	`0015ce089b7266d14fda4ab7d6f7e5a818fd9f75637a8d60f37ea38d77d0d508`
MD5	`94662f537df8ad5ef2a55c5ff0b5810f`
BLAKE2b-256	`2547f4f882e33a08acfa24887576cf2d3f5114a70abf89f6bdf01f8e8b9643cd`

See more details on using hashes here.
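A downloaded file can be checked against a published digest with the standard hashlib module. The sketch below is one way to do it: the sha256_of and matches helpers are hypothetical names, the chunked read is just a memory-friendly choice, and the path in the commented call assumes the source distribution was saved to the current directory.

```python
import hashlib

# SHA256 digest listed above for textblob-0.3.9.tar.gz.
EXPECTED_SHA256 = (
    "0015ce089b7266d14fda4ab7d6f7e5a818fd9f75637a8d60f37ea38d77d0d508"
)


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """True if the file's digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()


# Example call, assuming the archive sits in the working directory:
# matches("textblob-0.3.9.tar.gz")
```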

File details

Details for the file textblob-0.3.9-py2.py3-none-any.whl.

File metadata

Download URL: textblob-0.3.9-py2.py3-none-any.whl
Upload date: Jul 31, 2013
Size: 1.4 MB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for textblob-0.3.9-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`53b447c599127d6596be5fc836621f67cf4b6ddfe7c57e68255228d55a6b674d`
MD5	`ec1f0d32ef16c5275e52a03b7869ab8a`
BLAKE2b-256	`42710c839c0fde8054184e4fb23617f70725247126c1448824661b981565a194`

See more details on using hashes here.
