German language support for TextBlob.

German language support for TextBlob by Steven Loria.

This Python package is being developed as a TextBlob Language Extension. See Extension Guidelines for details.

Features

  • TextBlobDE class with initialized default models for German

  • German sentence boundary detection (NLTKPunktTokenizer)

  • Consistent use of specified tokenizer for all tools (NLTKPunktTokenizer or PatternTokenizer)

  • Part-of-speech tagging (PatternTagger)

  • Parsing (PatternParser)

  • Polarity detection (PatternAnalyzer) EXPERIMENTAL (only recognises uninflected word forms and does not have information on subjectivity)

  • Supports Python 2 and 3

  • See working features overview for details

Installing/Upgrading

$ pip install -U textblob-de
$ python -m textblob.download_corpora

Or install the latest development release (this reportedly does not always work on Windows; see issues #1744/5 for details):

$ pip install -U git+https://github.com/markuskiller/textblob-de.git@dev
$ python -m textblob.download_corpora
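
To confirm that the install step succeeded, a quick check along the following lines may help. This is only a sketch: the helper `is_installed` is a hypothetical name introduced here, the module names `textblob` and `textblob_de` are taken from the commands and imports shown in this README, and `importlib.util.find_spec` requires Python 3.4 or later.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Module names assumed from the install command and the usage example.
for name in ("textblob", "textblob_de"):
    print(name, "installed" if is_installed(name) else "missing")
```

If either module is reported as missing, rerunning the pip command above in the same interpreter environment is the first thing to try.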

Usage

>>> from textblob_de import TextBlobDE as TextBlob
>>> text = '''Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag.
Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen. Aber leider
habe ich nur noch EUR 18.50 in meiner Brieftasche.'''
>>> blob = TextBlob(text)
>>> blob.sentences
[Sentence("Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag."),
 Sentence("Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen."),
 Sentence("Aber leider habe ich nur noch EUR 18.50 in meiner Brieftasche.")]
>>> blob.tokens
WordList(['Heute', 'ist', 'der', '3.', 'Mai', ...])
>>> blob.tags
[('Heute', 'RB'), ('ist', 'VB'), ('der', 'DT'), ('3.', 'LS'), ('Mai', 'NN'),
('2014', 'CD'), ...]
>>> blob = TextBlob("Das Auto ist sehr schön.")
>>> blob.parse()
'Das/DT/B-NP/O Auto/NN/I-NP/O ist/VB/B-VP/O sehr/RB/B-ADJP/O schön/JJ/I-ADJP/O'
>>> blob = TextBlob("Das Auto ist sehr schön.", parser_show_lemmata=True)
>>> blob.parse()
'Das/DT/B-NP/O/das Auto/NN/I-NP/O/auto ist/VB/B-VP/O/sein sehr/RB/B-ADJP/O/sehr schön/JJ/I-ADJP/O/schön ././O/O/.'
>>> blob = TextBlob("Das Auto ist sehr schön.")
>>> blob.sentiment
(1.0, 0.0)
>>> blob = TextBlob("Das Auto ist hässlich.")
>>> blob.sentiment
(-1.0, 0.0)
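
The string returned by parse() packs several annotations into each slash-delimited token. The sketch below splits such a string into named fields using only the standard library. The field names (`word`, `pos`, `chunk`, `pnp`, `lemma`) are an assumption based on the column layout visible in the output above, and `Token` and `split_parse` are hypothetical helpers, not part of textblob-de.

```python
from collections import namedtuple

# Field names are an assumption inferred from the sample output:
# word / part-of-speech / chunk tag / PNP tag / optional lemma.
Token = namedtuple("Token", ["word", "pos", "chunk", "pnp", "lemma"])


def split_parse(parsed):
    """Split a slash-delimited parse string into a list of Token tuples."""
    tokens = []
    for item in parsed.split():
        fields = item.split("/")
        # Pad with None when the lemma column is absent.
        fields += [None] * (len(Token._fields) - len(fields))
        tokens.append(Token(*fields[:len(Token._fields)]))
    return tokens


parsed = "Das/DT/B-NP/O Auto/NN/I-NP/O ist/VB/B-VP/O sehr/RB/B-ADJP/O schön/JJ/I-ADJP/O"
for tok in split_parse(parsed):
    print(tok.word, tok.pos, tok.chunk)
```

The same helper handles the lemmatized variant, in which case the `lemma` field is filled instead of being None. A word that itself contains a slash would be split incorrectly by this naive approach.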

Requirements

  • Python >= 2.6 or >= 3.3

TODO

  • Implement German noun phrase extractor

  • Additional PoS tagging options, e.g. NLTK tagging (NLTKTagger)

  • Improve sentiment analysis (find suitable subjectivity scores and look up lemmas rather than word forms)

License

MIT licensed. See the bundled LICENSE file for more details.

Changelog

0.2.0 (18/07/2014)

  • vastly improved tokenization (NLTKPunktTokenizer and PatternTokenizer with tests)

  • consistent use of specified tokenizer for all tools

  • TextBlobDE with initialized default models for German

  • Parsing (PatternParser) plus test_parsers.py

  • EXPERIMENTAL implementation of Polarity detection (PatternAnalyzer)

  • first attempt at extracting German Polarity clues into de-sentiment.xml

  • tox tests passing for py26, py27, py33 and py34

0.1.3 (09/07/2014)

  • First release on PyPI

0.1.0 - 0.1.2 (09/07/2014)

  • First release on GitHub

  • A number of experimental releases for testing purposes

  • Adapted version badges, tests & travis-ci config

  • Code adapted from sample extension textblob-fr

  • Language specific linguistic resources copied from pattern-de
