German language support for TextBlob by Steven Loria.

This Python package is being developed as a TextBlob language extension. See the Extension Guidelines for details.

Features

  • All directly accessible textblob_de classes (e.g. Sentence() or Word()) are now initialized with default models for German

  • Properties or methods that do not yet work for German now raise a NotImplementedError

  • German sentence boundary detection and tokenization (NLTKPunktTokenizer)

  • Consistent use of the specified tokenizer for all tools (NLTKPunktTokenizer or PatternTokenizer); see the sketch after this list

  • Part-of-speech tagging (PatternTagger) with keyword include_punc=True (defaults to False)

  • Parsing (PatternParser) with keyword lemmata=True (defaults to False)

  • Noun Phrase Extraction (PatternParserNPExtractor)

  • Polarity detection (PatternAnalyzer) EXPERIMENTAL (only recognises uninflected word forms and does not have information on subjectivity)

  • Supports Python 2 and 3

  • See working features overview for details
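
The specified tokenizer is shared by all downstream tools (tagger, parser, noun phrase extractor), so switching tokenizers changes the input to every step. Below is a minimal sketch of swapping in the PatternTokenizer; it assumes TextBlobDE accepts the same tokenizer keyword argument as the textblob main package:

>>> # Sketch (assumption): the tokenizer keyword works as in textblob;
>>> # the chosen tokenizer is then reused by tagging, parsing and
>>> # noun phrase extraction on this blob.
>>> from textblob_de import TextBlobDE, PatternTokenizer
>>> blob = TextBlobDE("Heute ist der 3. Mai 2014.", tokenizer=PatternTokenizer())
>>> tokens = blob.tokens   # tokenized with PatternTokenizer
>>> tags = blob.tags       # tagging runs on the same token stream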

Installing/Upgrading

$ pip install -U textblob-de
$ python -m textblob.download_corpora

Or install the latest development release (this apparently does not always work on Windows; see issues #1744/5 for details):

$ pip install -U git+https://github.com/markuskiller/textblob-de.git@dev
$ python -m textblob.download_corpora

Usage

>>> from textblob_de import TextBlobDE as TextBlob
>>> text = '''Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag.
Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen. Aber leider
habe ich nur noch EUR 18.50 in meiner Brieftasche.'''
>>> blob = TextBlob(text)
>>> blob.sentences
[Sentence("Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag."),
 Sentence("Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen."),
 Sentence("Aber leider habe ich nur noch EUR 18.50 in meiner Brieftasche.")]
>>> blob.tokens
WordList(['Heute', 'ist', 'der', '3.', 'Mai', ...])
>>> blob.tags
[('Heute', 'RB'), ('ist', 'VB'), ('der', 'DT'), ('3.', 'LS'), ('Mai', 'NN'),
('2014', 'CD'), ...]
# not perfect, but a start (relies heavily on parser accuracy)
>>> blob.noun_phrases
WordList(['Mai 2014', 'Dr. Meier', 'seinen 43. Geburtstag', 'Kuchen einzukaufen',
'meiner Brieftasche'])
>>> blob = TextBlob("Das Auto ist sehr schön.")
>>> blob.parse()
'Das/DT/B-NP/O Auto/NN/I-NP/O ist/VB/B-VP/O sehr/RB/B-ADJP/O schön/JJ/I-ADJP/O'
>>> from textblob_de import PatternParser
>>> blob = TextBlob("Das Auto ist sehr schön.", parser=PatternParser(lemmata=True))
>>> blob.parse()
'Das/DT/B-NP/O/das Auto/NN/I-NP/O/auto ist/VB/B-VP/O/sein sehr/RB/B-ADJP/O/sehr schön/JJ/I-ADJP/O/schön ././O/O/.'
>>> from textblob_de import PatternTagger
>>> blob = TextBlob("Das Auto ist sehr schön.", pos_tagger=PatternTagger(include_punc=True))
>>> blob.tags
[('Das', 'DT'), ('Auto', 'NN'), ('ist', 'VB'), ('sehr', 'RB'), ('schön', 'JJ'), ('.', '.')]
>>> blob = TextBlob("Das Auto ist sehr schön.")
>>> blob.sentiment
(1.0, 0.0)
>>> blob = TextBlob("Das Auto ist hässlich.")
>>> blob.sentiment
(-1.0, 0.0)

Requirements

  • Python >= 2.6 or >= 3.3

TODO

  • Additional PoS tagging options, e.g. NLTK tagging (NLTKTagger)

  • Improve sentiment analysis (find suitable subjectivity scores and look up lemmas rather than word forms)

  • Improve functionality of Sentence() and Word() objects

  • Adapt more tests from textblob main package (esp. for TextBlobDE() in test_blob.py)

License

MIT licensed. See the bundled LICENSE file for more details.

Changelog

0.2.2 (22/07/2014)

  • Option: Include punctuation in tags/pos_tags properties (b = TextBlobDE(text, tagger=PatternTagger(include_punc=True)))

  • Added BlobberDE() class initialized with German models (see the sketch after this list)

  • TextBlobDE(), Sentence(), WordList() and Word() classes are now all initialized with German models

  • Restored complete API compatibility with textblob.tokenizers module of textblob main package
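
A minimal sketch of the BlobberDE() factory mentioned above, under the assumption that it mirrors textblob's Blobber API, i.e. one factory shares its German models and keyword settings across all blobs it creates:

>>> # Sketch (assumption): BlobberDE works like textblob's Blobber factory.
>>> from textblob_de import BlobberDE, PatternTagger
>>> tb = BlobberDE(pos_tagger=PatternTagger(include_punc=True))
>>> blob1 = tb("Das Auto ist sehr schön.")
>>> blob2 = tb("Heute ist der 3. Mai 2014.")
>>> tags = blob1.tags      # both blobs reuse the same tagger instance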

0.2.1 (20/07/2014)

  • Noun Phrase Extraction: PatternParserNPExtractor() extracts NPs from Parser output

  • Refactored the way TextBlobDE() passes on arguments and keyword arguments to individual tools

  • Backwards-incompatible: Deprecate parser_show_lemmata=True keyword in TextBlob(). Use parser=PatternParser(lemmata=True) instead.

0.2.0 (18/07/2014)

  • vastly improved tokenization (NLTKPunktTokenizer and PatternTokenizer with tests)

  • consistent use of specified tokenizer for all tools

  • TextBlobDE with initialized default models for German

  • Parsing (PatternParser) plus test_parsers.py

  • EXPERIMENTAL implementation of Polarity detection (PatternAnalyzer)

  • first attempt at extracting German Polarity clues into de-sentiment.xml

  • tox tests passing for py26, py27, py33 and py34

0.1.3 (09/07/2014)

  • First release on PyPI

0.1.0 - 0.1.2 (09/07/2014)

  • First release on github

  • A number of experimental releases for testing purposes

  • Adapted version badges, tests & travis-ci config

  • Code adapted from sample extension textblob-fr

  • Language specific linguistic resources copied from pattern-de
