Preprocess German texts for serious NLP.

Project description

German Preprocessing

Preprocess German texts to do some serious natural-language processing.

  • clean texts
  • remove stopwords (as defined by spaCy)
  • lemmatize
  • lower-case, remove all punctuation, and replace digits with "0"
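
To make the steps above concrete, here is a minimal standard-library sketch of the lower-casing, digit replacement, punctuation removal, and stop-word filtering. It is not the package's implementation: the function name `normalize` and the tiny `STOP_WORDS` set are hypothetical (the package uses spaCy's stop-word list), lemmatization is left out because it needs a spaCy model, and replacing each digit individually with "0" is an assumption about what the package does.

```python
import re

# Hypothetical, tiny stop-word set for illustration only;
# the package itself relies on spaCy's German stop-word list.
STOP_WORDS = {"war", "einer", "von", "vielen", "gern", "der", "die", "das", "und"}


def normalize(text, remove_stop=False):
    """Lower-case, replace digits with "0", strip punctuation, optionally drop stop words."""
    text = text.lower()
    # Assumption: every single digit becomes "0" (so "2019" turns into "0000").
    text = re.sub(r"\d", "0", text)
    # \w is Unicode-aware in Python 3, so umlauts and ß are kept;
    # underscores count as \w, so they are removed explicitly.
    text = re.sub(r"[^\w\s]|_", " ", text)
    tokens = text.split()
    if remove_stop:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(tokens)


print(normalize("Julia trinkt gern Tee.", remove_stop=True))
```

Because this sketch skips lemmatization, inflected forms such as "trinkt" or "Schülern" stay as they are, whereas the package reduces them to their lemmas.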

Installation

pip install german

Usage

from german import preprocess

preprocess(['Johannes war einer von vielen guten Schülern.', 'Julia trinkt gern Tee.'], remove_stop=True)
# ['johannes gut schüler', 'julia trinken tee']

License

MIT.

Sponsoring

This work was created as part of a project that was funded by the German Federal Ministry of Education and Research.

Download files

Source Distribution

german-0.1.0.tar.gz (2.1 kB, source)

Built Distributions

german-0.1.0-py3-none-any.whl (3.3 kB, py3)

german-0.1.0-py2-none-any.whl (3.3 kB, py2)
