German language support for TextBlob.

German language support for TextBlob by Steven Loria.

This Python package is being developed as a TextBlob Language Extension. See Extension Guidelines for details.

Features

  • All directly accessible textblob_de classes (e.g. Sentence() or Word()) are initialized with default models for German

  • Properties or methods that do not yet work for German raise a NotImplementedError

  • German sentence boundary detection and tokenization (NLTKPunktTokenizer)

  • Consistent use of specified tokenizer for all tools (NLTKPunktTokenizer or PatternTokenizer)

  • Part-of-speech tagging (PatternTagger) with keyword include_punc=True (defaults to False)

  • Parsing (PatternParser) with all pattern keywords, plus pprint=True (defaults to False)

  • Noun Phrase Extraction (PatternParserNPExtractor)

  • Lemmatization (PatternParserLemmatizer)

  • Polarity detection (PatternAnalyzer) - Still EXPERIMENTAL, does not yet have information on subjectivity

  • NEW: Full pattern.text.de API support on Python3

  • Supports Python 2 and 3

  • See working features overview for details
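
Because polarity detection is still experimental and the subjectivity slot carries no information yet, callers may want to read only the first field of the sentiment tuple. The following is a minimal sketch of that idea, not part of textblob-de: the helper name describe_polarity and the threshold value are hypothetical, and it assumes the tuple layout is (polarity, subjectivity) with polarity in the range -1.0 to 1.0.

```python
def describe_polarity(sentiment, threshold=0.1):
    """Map a (polarity, subjectivity) tuple to a coarse label.

    Assumptions: polarity lies in [-1.0, 1.0]; the subjectivity slot is
    ignored because it is not yet populated for German. `threshold` is a
    hypothetical cut-off chosen purely for illustration.
    """
    polarity = sentiment[0]
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Tuples as shown for blob.sentiment in the Usage section.
print(describe_polarity((1.0, 0.0)))
print(describe_polarity((-1.0, 0.0)))
```

The helper works on plain tuples, so it can be tried without installing the package.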

Installing/Upgrading

$ pip install -U textblob-de
$ python -m textblob.download_corpora

Or install the latest development release (this apparently does not always work on Windows; see issues #1744/5 for details):

$ pip install -U git+https://github.com/markuskiller/textblob-de.git@dev
$ python -m textblob.download_corpora
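
After installing, one quick way to confirm that both packages are importable is to look them up on the import path. This is a small sketch using only the standard library; it assumes Python 3.4 or later (for importlib.util.find_spec), and the helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Module names taken from the install commands and import statements in this README.
for name in ("textblob", "textblob_de"):
    status = "found" if is_installed(name) else "missing"
    print("{}: {}".format(name, status))
```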

Usage

>>> from textblob_de import TextBlobDE as TextBlob
>>> text = '''Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag.
Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen. Aber leider
habe ich nur noch EUR 18.50 in meiner Brieftasche.'''
>>> blob = TextBlob(text)
>>> blob.sentences
[Sentence("Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag."),
 Sentence("Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen."),
 Sentence("Aber leider habe ich nur noch EUR 18.50 in meiner Brieftasche.")]
>>> blob.tokens
WordList(['Heute', 'ist', 'der', '3.', 'Mai', ...])
>>> blob.tags
[('Heute', 'RB'), ('ist', 'VB'), ('der', 'DT'), ('3.', 'LS'), ('Mai', 'NN'),
('2014', 'CD'), ...]
# Default: Only noun_phrases that consist of two or more meaningful parts are displayed.
# Not perfect, but a start (relies heavily on parser accuracy)
>>> blob.noun_phrases
WordList(['Mai 2014', 'Dr. Meier', 'seinen 43. Geburtstag', 'Kuchen einzukaufen',
'meiner Brieftasche'])
>>> blob = TextBlob("Das Auto ist sehr schön.")
>>> blob.parse()
'Das/DT/B-NP/O Auto/NN/I-NP/O ist/VB/B-VP/O sehr/RB/B-ADJP/O schön/JJ/I-ADJP/O'
>>> from textblob_de import PatternParser
>>> blob = TextBlob(u"Das ist ein schönes Auto.", parser=PatternParser(pprint=True, lemmata=True))
>>> blob.parse()
      WORD   TAG    CHUNK   ROLE   ID     PNP    LEMMA

       Das   DT     -       -      -      -      das
       ist   VB     VP      -      -      -      sein
       ein   DT     NP      -      -      -      ein
   schönes   JJ     NP ^    -      -      -      schön
      Auto   NN     NP ^    -      -      -      auto
         .   .      -       -      -      -      .
>>> from textblob_de import PatternTagger
>>> blob = TextBlob("Das Auto ist sehr schön.", pos_tagger=PatternTagger(include_punc=True))
>>> blob.tags
[('Das', 'DT'), ('Auto', 'NN'), ('ist', 'VB'), ('sehr', 'RB'), ('schön', 'JJ'), ('.', '.')]
>>> blob = TextBlob("Das Auto ist sehr schön.")
>>> blob.sentiment
(1.0, 0.0)
>>> blob = TextBlob("Das ist ein hässliches Auto.")
>>> blob.sentiment
(-1.0, 0.0)
>>> blob.words.lemmatize()
WordList(['das', 'sein', 'ein', 'hässlich', 'Auto'])
>>> from textblob_de.lemmatizers import PatternParserLemmatizer
>>> _lemmatizer = PatternParserLemmatizer()
>>> _lemmatizer.lemmatize("Das ist ein hässliches Auto.")
[('das', 'DT'), ('sein', 'VB'), ('ein', 'DT'), ('hässlich', 'JJ'), ('Auto', 'NN')]
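
The string returned by blob.parse() packs several annotations into each slash-separated token. The sketch below shows one way to split it into named fields for further processing. It is not part of textblob-de: the Token type and split_tagged helper are hypothetical, and the field names (word, tag, chunk, pnp) are an assumption based on the four-part tokens in the parse output above.

```python
from collections import namedtuple

# Field names are assumptions inferred from the 'word/TAG/CHUNK/PNP' layout.
Token = namedtuple("Token", ["word", "tag", "chunk", "pnp"])


def split_tagged(tagged):
    """Split a 'word/TAG/CHUNK/PNP ...' string into a list of Token tuples."""
    tokens = []
    for item in tagged.split():
        # rsplit from the right so the last three fields are always the annotations.
        word, tag, chunk, pnp = item.rsplit("/", 3)
        tokens.append(Token(word, tag, chunk, pnp))
    return tokens


# Parse output copied from the example above.
tagged = "Das/DT/B-NP/O Auto/NN/I-NP/O ist/VB/B-VP/O sehr/RB/B-ADJP/O schön/JJ/I-ADJP/O"
for token in split_tagged(tagged):
    print(token.word, token.tag, token.chunk)
```

If the parser is configured with extra keywords such as lemmata=True, each token presumably gains additional fields, so the fixed four-way split would need adjusting.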

Access to pattern API in Python3

>>> from textblob_de.packages import pattern_de as pd
>>> print(pd.attributive("neugierig", gender=pd.FEMALE, role=pd.INDIRECT, article="die"))
neugierigen

Requirements

  • Python >= 2.6 or >= 3.3
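
A script that depends on the package could guard against unsupported interpreters up front. The following sketch reads the requirement literally (2.6 or later on the 2.x line, 3.3 or later otherwise); the function name is_supported is hypothetical.

```python
import sys


def is_supported(version):
    """Return True for Python 2.6+ on the 2.x line, or Python 3.3 and later."""
    version = tuple(version[:2])
    return (2, 6) <= version < (3, 0) or version >= (3, 3)


print(is_supported(sys.version_info))
```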

TODO

  • TextBlob Extension: textblob-rftagger (wrapper class for RFTagger)

  • TextBlob Extension: textblob-cmd (command-line wrapper for TextBlob, basically TextBlob for files)

  • TextBlob Extension: textblob-stanfordparser (wrapper class for StanfordParser via NLTK)

  • TextBlob Extension: textblob-berkeleyparser (wrapper class for BerkeleyParser)

  • TextBlob Extension: textblob-sent-align (sentence alignment for parallel TextBlobs)

  • TextBlob Extension: textblob-converters (various input and output conversions)

  • Additional PoS tagging options, e.g. NLTK tagging (NLTKTagger)

  • Improve noun phrase extraction (e.g. based on RFTagger output)

  • Improve sentiment analysis (find suitable subjectivity scores)

  • Improve functionality of Sentence() and Word() objects

  • Adapt more tests from textblob main package (esp. for TextBlobDE() in test_blob.py)

License

MIT licensed. See the bundled LICENSE file for more details.

Changelog

0.2.6 (04/08/2014)

  • Fixed MANIFEST.in for package data in sdist

0.2.5 (04/08/2014)

  • sdist is non-functional as important files are missing due to a misconfiguration in MANIFEST.in - does not affect wheels

  • Major internal refactoring (but no backwards-incompatible API changes) with the aim of restoring complete compatibility to original pattern>=2.6 library on Python2

  • Separation of textblob and pattern code

  • On Python2, the vendorized version of pattern.text.de is used only if the original is not installed (same as nltk)

  • Made pattern.de.pprint function and all parser keywords accessible to customise parser output

  • Access to the complete pattern.text.de API on Python2 and Python3 via from textblob_de.packages import pattern_de as pd

  • tox passed on all major platforms (Win/Linux/OSX)

0.2.3 (26/07/2014)

  • Lemmatizer: PatternParserLemmatizer() extracts lemmata from Parser output

  • Improved polarity analysis through look-up of lemmatised word forms

0.2.2 (22/07/2014)

  • Option: Include punctuation in tags/pos_tags properties (b = TextBlobDE(text, tagger=PatternTagger(include_punc=True)))

  • Added BlobberDE() class initialized with German models

  • TextBlobDE(), Sentence(), WordList() and Word() classes are now all initialized with German models

  • Restored complete API compatibility with textblob.tokenizers module of textblob main package

0.2.1 (20/07/2014)

  • Noun Phrase Extraction: PatternParserNPExtractor() extracts NPs from Parser output

  • Refactored the way TextBlobDE() passes on arguments and keyword arguments to individual tools

  • Backwards-incompatible: Deprecate parser_show_lemmata=True keyword in TextBlob(). Use parser=PatternParser(lemmata=True) instead.

0.2.0 (18/07/2014)

  • vastly improved tokenization (NLTKPunktTokenizer and PatternTokenizer with tests)

  • consistent use of specified tokenizer for all tools

  • TextBlobDE with initialized default models for German

  • Parsing (PatternParser) plus test_parsers.py

  • EXPERIMENTAL implementation of Polarity detection (PatternAnalyzer)

  • first attempt at extracting German Polarity clues into de-sentiment.xml

  • tox tests passing for py26, py27, py33 and py34

0.1.3 (09/07/2014)

  • First release on PyPI

0.1.0 - 0.1.2 (09/07/2014)

  • First release on github

  • A number of experimental releases for testing purposes

  • Adapted version badges, tests & travis-ci config

  • Code adapted from sample extension textblob-fr

  • Language specific linguistic resources copied from pattern-de
