Preprocess German texts for serious NLP.
German Preprocessing

Preprocess German texts to do some serious natural-language processing.
- clean texts
- remove stopwords (as defined by spaCy)
- lemmatize
- lower-case, remove all punctuation, and replace digits with "0"
Installation
pip install german
Usage
from german import preprocess
preprocess(['Johannes war einer von vielen guten Schülern.', 'Julia trinkt gern Tee.'], remove_stop=True)
# ['johannes gut schüler', 'julia trinken tee']
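
To make the normalization steps concrete, here is a minimal standard-library sketch of lower-casing, digit masking, punctuation removal and stopword filtering. It is not the package's implementation: the `normalize` function and the `DEMO_STOPWORDS` set are hypothetical names made up for illustration, lemmatization is left out, and it assumes that every single digit is replaced by "0" (the package may treat whole numbers differently).

```python
import re

# Hypothetical mini stopword list for illustration only;
# the package itself relies on spaCy's German stopwords.
DEMO_STOPWORDS = {"war", "einer", "von", "vielen", "gern"}


def normalize(text, stopwords=frozenset()):
    """Lower-case, mask digits, strip punctuation, optionally drop stopwords."""
    text = text.lower()
    # Assumption: each digit character is replaced by "0".
    text = re.sub(r"\d", "0", text)
    # Replace anything that is neither a word character nor whitespace.
    # Umlauts and "ß" are word characters in Python 3, so they are kept.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in stopwords]
    return " ".join(tokens)


print(normalize("Julia trinkt gern 2 Tassen Tee!", DEMO_STOPWORDS))
# julia trinkt 0 tassen tee
```

Unlike `preprocess`, this sketch does not lemmatize, so "trinkt" stays as it is instead of becoming "trinken".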
License
MIT.
Sponsoring
This work was created as part of a project that was funded by the German Federal Ministry of Education and Research.
File details
Details for the file german-0.1.0.tar.gz.
File metadata
- Download URL: german-0.1.0.tar.gz
- Upload date:
- Size: 2.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a078711a05b8207e22e3f8c1f58eefa0c155a07df1275a5b1c3e38029efaf14c |
| MD5 | 5fc34c140288de65ffd422e27956cdaa |
| BLAKE2b-256 | 3c01a7837bdbb47b59101d5468d9b1a6c6143a378df03b03144e963041c669fd |
File details
Details for the file german-0.1.0-py3-none-any.whl.
File metadata
- Download URL: german-0.1.0-py3-none-any.whl
- Upload date:
- Size: 3.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 547635261f1bf1a338052034f177c5d5f6dfffe9aad10ef152d60a4de8ff58ba |
| MD5 | 9a6ec5145e5f971a6ad9c0a2fedd6269 |
| BLAKE2b-256 | bedda5a6e235538d803fbe39468a897196b38dcaa8dc6aba8902e2f658ac2ddf |
File details
Details for the file german-0.1.0-py2-none-any.whl.
File metadata
- Download URL: german-0.1.0-py2-none-any.whl
- Upload date:
- Size: 3.3 kB
- Tags: Python 2
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 028fb365a9e8b1c5f57f957f0fe1c39dc7df386081572f9fabe2a34c9929c823 |
| MD5 | f8f06cfffd112d121eb8c3296d1fd53f |
| BLAKE2b-256 | 904dafc3a979b4a395fc647ec98bda92a7f1843b35a3be9f94940c0b14aae37b |