Preprocess German texts for serious NLP.

Project description

German Preprocessing

Preprocess German texts to do some serious natural-language processing.

  • clean texts
  • remove stopwords (as defined by spaCy)
  • lemmatize
  • lower-case the text, remove all punctuation, and replace digits with "0"
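
The last step in the list above can be sketched with the standard library alone. This is a minimal illustration, not the package's actual code: the helper name `normalize` is hypothetical, and it assumes that each digit is replaced individually (so "2019" becomes "0000") and that "punctuation" means any character in a Unicode punctuation category.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lower-case, replace every digit with "0", and drop punctuation.

    Hypothetical helper for illustration; not part of the `german` package.
    """
    text = text.lower()
    # Assumption: digits are replaced one by one, not per number.
    text = re.sub(r"\d", "0", text)
    # Drop characters whose Unicode category starts with "P" (punctuation),
    # which also covers German quotation marks such as „ and “.
    text = "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )
    # Collapse the whitespace left behind by removed characters.
    return " ".join(text.split())


print(normalize("„Hallo“, sagte er 2019!"))
```

Currency and math symbols (for example "€" or "+") are not in a punctuation category and would survive this sketch; whether the package removes them is not stated here.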

Installation

pip install german

Usage

from german import preprocess

preprocess(['Johannes war einer von vielen guten Schülern.', 'Julia trinkt gern Tee.'], remove_stop=True)
# ['johannes gut schüler', 'julia trinken tee']
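
To make the example output easier to follow, here is a toy sketch of how such a pipeline could fit together. It does not use spaCy and is not the package's implementation: `toy_preprocess`, `STOP_WORDS`, and `LEMMAS` are hypothetical names, the two lookup tables are hand-written to cover only the words in the sentences above, and the order of the steps is an assumption.

```python
import re

# Toy stand-ins for spaCy's German stop-word list and lemmatizer.
# Hypothetical and hand-picked to match the two example sentences only.
STOP_WORDS = {"war", "einer", "von", "vielen", "gern"}
LEMMAS = {"guten": "gut", "schülern": "schüler", "trinkt": "trinken"}


def toy_preprocess(texts, remove_stop=False):
    """Lower-case, tokenize, optionally drop stop words, then lemmatize."""
    results = []
    for text in texts:
        # \w+ keeps letters and digits and discards punctuation.
        tokens = re.findall(r"\w+", text.lower())
        if remove_stop:
            tokens = [t for t in tokens if t not in STOP_WORDS]
        # Fall back to the token itself when no lemma is known.
        tokens = [LEMMAS.get(t, t) for t in tokens]
        # Assumption: every digit is replaced individually with "0".
        tokens = [re.sub(r"\d", "0", t) for t in tokens]
        results.append(" ".join(tokens))
    return results


print(toy_preprocess(
    ["Johannes war einer von vielen guten Schülern.", "Julia trinkt gern Tee."],
    remove_stop=True,
))
# ['johannes gut schüler', 'julia trinken tee']
```

With `remove_stop=False` the stop words are kept and only lower-casing, punctuation removal, and lemmatization apply.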

License

MIT.

Sponsoring

This work was created as part of a project that was funded by the German Federal Ministry of Education and Research.

Download files

Source Distribution

german-0.1.0.tar.gz (2.1 kB, Source)

Built Distributions

german-0.1.0-py3-none-any.whl (3.3 kB, Python 3)

german-0.1.0-py2-none-any.whl (3.3 kB, Python 2)

File details

Details for the file german-0.1.0.tar.gz.

File metadata

  • Download URL: german-0.1.0.tar.gz
  • Upload date:
  • Size: 2.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for german-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a078711a05b8207e22e3f8c1f58eefa0c155a07df1275a5b1c3e38029efaf14c
MD5 5fc34c140288de65ffd422e27956cdaa
BLAKE2b-256 3c01a7837bdbb47b59101d5468d9b1a6c6143a378df03b03144e963041c669fd

File details

Details for the file german-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: german-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 3.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for german-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 547635261f1bf1a338052034f177c5d5f6dfffe9aad10ef152d60a4de8ff58ba
MD5 9a6ec5145e5f971a6ad9c0a2fedd6269
BLAKE2b-256 bedda5a6e235538d803fbe39468a897196b38dcaa8dc6aba8902e2f658ac2ddf

File details

Details for the file german-0.1.0-py2-none-any.whl.

File metadata

  • Download URL: german-0.1.0-py2-none-any.whl
  • Upload date:
  • Size: 3.3 kB
  • Tags: Python 2
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for german-0.1.0-py2-none-any.whl
Algorithm Hash digest
SHA256 028fb365a9e8b1c5f57f957f0fe1c39dc7df386081572f9fabe2a34c9929c823
MD5 f8f06cfffd112d121eb8c3296d1fd53f
BLAKE2b-256 904dafc3a979b4a395fc647ec98bda92a7f1843b35a3be9f94940c0b14aae37b
