
German language support for TextBlob.



German language support for TextBlob by Steven Loria.

This Python package is being developed as a TextBlob Language Extension. See Extension Guidelines for details.

Features

  • TextBlobDE class with initialized default models for German
  • German sentence boundary detection (NLTKPunktTokenizer)
  • Consistent use of specified tokenizer for all tools (NLTKPunktTokenizer or PatternTokenizer)
  • Part-of-speech tagging (PatternTagger)
  • Parsing (PatternParser)
  • Polarity detection (PatternAnalyzer), EXPERIMENTAL: only recognises uninflected word forms and provides no subjectivity scores
  • Supports Python 2 and 3
  • See working features overview for details
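
German sentence boundary detection has to cope with periods that do not end a sentence: ordinals ("3.", "43."), abbreviations ("Dr.", "usw.") and decimal amounts ("18.50"). The sketch below is NOT the NLTK Punkt algorithm, which learns such cues from data. It is a deliberately naive rule-based splitter with a hand-written, hypothetical abbreviation list, meant only to illustrate the problem the tokenizer solves.

```python
# Hypothetical, hand-picked abbreviation list; Punkt learns such cues from data instead.
ABBREVIATIONS = {"dr.", "usw.", "z.b.", "bzw.", "ca."}


def split_sentences(text):
    """Split after tokens ending in '.', '!' or '?', unless the token is a
    known abbreviation or an ordinal number such as '3.' or '43.'."""
    sentences, current = [], []
    for tok in text.split():  # split() also normalises newlines to spaces
        current.append(tok)
        if tok[-1] in ".!?":
            is_abbrev = tok.lower() in ABBREVIATIONS
            is_ordinal = tok[:-1].isdigit()
            if not (is_abbrev or is_ordinal):
                sentences.append(" ".join(current))
                current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences


sample = ("Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag. "
          "Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen. "
          "Aber leider habe ich nur noch EUR 18.50 in meiner Brieftasche.")

for sentence in split_sentences(sample):
    print(sentence)
```

On the example text from the Usage section this yields the same three sentences as blob.sentences. Unlike a trained model, though, it can never tell when an abbreviation such as "usw." really does end a sentence.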

Installing/Upgrading

$ pip install -U textblob-de
$ python -m textblob.download_corpora

Or install the latest development release (this apparently does not always work on Windows; see issues #1744/5 for details):

$ pip install -U git+https://github.com/markuskiller/textblob-de.git@dev
$ python -m textblob.download_corpora

Note

TextBlob will be installed/upgraded automatically when running pip install. The second line (python -m textblob.download_corpora) downloads/updates the NLTK corpora and language models used by TextBlob.

Usage

>>> from textblob_de import TextBlobDE as TextBlob
>>> text = '''Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag.
... Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen. Aber leider
... habe ich nur noch EUR 18.50 in meiner Brieftasche.'''
>>> blob = TextBlob(text)
>>> blob.sentences
[Sentence("Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag."),
 Sentence("Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen."),
 Sentence("Aber leider habe ich nur noch EUR 18.50 in meiner Brieftasche.")]
>>> blob.tokens
WordList(['Heute', 'ist', 'der', '3.', 'Mai', ...])
>>> blob.tags
[('Heute', 'RB'), ('ist', 'VB'), ('der', 'DT'), ('3.', 'LS'), ('Mai', 'NN'),
('2014', 'CD'), ...]
>>> blob = TextBlob("Das Auto ist sehr schön.")
>>> blob.parse()
'Das/DT/B-NP/O Auto/NN/I-NP/O ist/VB/B-VP/O sehr/RB/B-ADJP/O schön/JJ/I-ADJP/O'
>>> blob = TextBlob("Das Auto ist sehr schön.", parser_show_lemmata=True)
>>> blob.parse()
'Das/DT/B-NP/O/das Auto/NN/I-NP/O/auto ist/VB/B-VP/O/sein sehr/RB/B-ADJP/O/sehr schön/JJ/I-ADJP/O/schön ././O/O/.'
>>> blob = TextBlob("Das Auto ist sehr schön.")
>>> blob.sentiment
(1.0, 0.0)
>>> blob = TextBlob("Das Auto ist hässlich.")
>>> blob.sentiment
(-1.0, 0.0)
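
The strings returned by parse() pack several annotation layers into each token, separated by slashes: the word, its part-of-speech tag, its chunk tag (B- and I- prefixes mark the beginning and inside of a phrase), a fourth column (O throughout these examples; in Pattern's tag format this is the prepositional noun phrase column) and, with parser_show_lemmata=True, the lemma. Below is a minimal sketch for reading such strings back into Python, assuming no token contains a literal slash. The Token field names and both helper functions are hypothetical, not part of textblob-de.

```python
from collections import namedtuple

# Hypothetical field names; the order follows the parse() output shown above.
Token = namedtuple("Token", ["word", "pos", "chunk", "pnp", "lemma"])


def read_parse(tagged):
    """Turn a 'word/POS/chunk/PNP[/lemma] ...' string into Token tuples."""
    tokens = []
    for item in tagged.split():
        fields = item.split("/")
        if len(fields) == 4:
            fields.append(None)  # no lemma column without parser_show_lemmata=True
        tokens.append(Token(*fields))
    return tokens


def chunks(tokens):
    """Group tokens into (label, words) phrases using the B-/I- prefixes.
    Tokens tagged 'O' and I- tags without a matching open phrase are skipped."""
    phrases = []
    for t in tokens:
        if t.chunk.startswith("B-"):
            phrases.append((t.chunk[2:], [t.word]))
        elif t.chunk.startswith("I-") and phrases and phrases[-1][0] == t.chunk[2:]:
            phrases[-1][1].append(t.word)
    return phrases


parsed = read_parse("Das/DT/B-NP/O/das Auto/NN/I-NP/O/auto ist/VB/B-VP/O/sein "
                    "sehr/RB/B-ADJP/O/sehr schön/JJ/I-ADJP/O/schön ././O/O/.")
print([t.lemma for t in parsed])
# → ['das', 'auto', 'sein', 'sehr', 'schön', '.']
print(chunks(parsed))
# → [('NP', ['Das', 'Auto']), ('VP', ['ist']), ('ADJP', ['sehr', 'schön'])]
```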

Warning

WORK IN PROGRESS: The German polarity lexicon contains only uninflected forms and there are no subjectivity scores yet.
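
Why uninflected-only entries matter: German adjectives are inflected ("schön", "schöne", "schönes", ...), so a lexicon keyed on base forms misses most occurrences unless words are first mapped to their lemma, as the TODO list below proposes. The sketch uses a toy lexicon and lemma table with hypothetical values (NOT the contents of de-sentiment.xml) and a plain average of matched scores, which is a simplification rather than PatternAnalyzer's actual scoring.

```python
# Toy data: hypothetical values, NOT taken from de-sentiment.xml.
POLARITY = {"schön": 1.0, "hässlich": -1.0, "gut": 0.7}
LEMMAS = {"schöne": "schön", "schönes": "schön", "schönen": "schön",
          "hässliche": "hässlich", "gute": "gut", "guten": "gut"}


def polarity(words, use_lemmas=False):
    """Average the polarity of words found in the lexicon; 0.0 if none match."""
    scores = []
    for w in words:
        key = w.lower()
        if use_lemmas:
            key = LEMMAS.get(key, key)  # fall back to the word form itself
        if key in POLARITY:
            scores.append(POLARITY[key])
    return sum(scores) / len(scores) if scores else 0.0


words = "Das ist ein schönes Auto".split()
print(polarity(words))                   # → 0.0 ('schönes' is not a lexicon entry)
print(polarity(words, use_lemmas=True))  # → 1.0
```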

Note

Make sure that you use unicode strings on Python 2 if your input contains non-ASCII characters (e.g. word = u"schön").
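
The difference this note is about can be seen on Python 3, where text (str) and encoded data (bytes) are separate types. On Python 2 a plain "schön" literal in a UTF-8 source file is a byte string like encoded below, so character-based tools would see six units and could split the "ö"; the u prefix gives you a unicode string like word.

```python
# Python 3 sketch: str holds characters, bytes holds encoded data.
word = "schön"                # on Python 2 this would be written u"schön"
encoded = word.encode("utf-8")

print(len(word))                        # → 5 (characters)
print(len(encoded))                     # → 6 ('ö' takes two bytes in UTF-8)
print(encoded.decode("utf-8") == word)  # → True
```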

Requirements

  • Python >= 2.6 or >= 3.3
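
The version requirement can be expressed as a small check. This is a hypothetical helper for illustration, not part of textblob-de; it simply encodes "2.6 or a later 2.x, or 3.3 and later".

```python
import sys


def is_supported(version_info=sys.version_info):
    """Return True for Python 2.6 or a later 2.x, or for Python 3.3 and later."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor >= 6
    return (major, minor) >= (3, 3)


print(is_supported((2, 7)), is_supported((3, 2)), is_supported((3, 4)))
# → True False True
```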

TODO

  • Implement German noun phrase extractor
  • Additional PoS tagging option: NLTK tagging (NLTKTagger)
  • Improve sentiment analysis (find suitable subjectivity scores and look up lemmas rather than word forms)

License

MIT licensed. See the bundled LICENSE file for more details.

Changelog

0.2.0 (18/07/2014)

  • vastly improved tokenization (NLTKPunktTokenizer and PatternTokenizer with tests)
  • consistent use of specified tokenizer for all tools
  • TextBlobDE with initialized default models for German
  • Parsing (PatternParser) plus test_parsers.py
  • EXPERIMENTAL implementation of Polarity detection (PatternAnalyzer)
  • first attempt at extracting German Polarity clues into de-sentiment.xml
  • tox tests passing for py26, py27, py33 and py34

0.1.3 (09/07/2014)

  • First release on PyPI

0.1.0 - 0.1.2 (09/07/2014)

  • First release on github
  • A number of experimental releases for testing purposes
  • Adapted version badges, tests & travis-ci config
  • Code adapted from sample extension textblob-fr
  • Language specific linguistic resources copied from pattern-de

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for textblob-de, version 0.2.0

  • textblob_de-0.2.0-py2.py3-none-any.whl (467.9 kB): Wheel, Python version 3.4
  • textblob-de-0.2.0.tar.gz (464.8 kB): Source
