Preprocess German texts for serious NLP.
German Preprocessing
Preprocess German texts to do some serious natural-language processing.
- clean texts
- remove stopwords (as defined by spaCy)
- lemmatize
- lower-case, remove all punctuation, and replace digits with "0"
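
To make the last step concrete, here is a minimal standard-library sketch of that kind of normalization. It is not the package's own code: the helper name `normalize` is hypothetical, and it assumes that every single digit becomes "0" and that punctuation is dropped rather than replaced by a space. Stop-word removal and lemmatization are left out.

```python
import unicodedata


def normalize(text):
    """Hypothetical helper, not part of the `german` package."""
    chars = []
    for ch in text.lower():
        if ch.isdigit():
            # Assumption: each digit is replaced on its own ("42" -> "00").
            chars.append("0")
        elif unicodedata.category(ch).startswith("P"):
            # Drop every Unicode punctuation character, including „ and “.
            continue
        else:
            chars.append(ch)
    # Collapse whitespace left behind by the removed characters.
    return " ".join("".join(chars).split())


print(normalize("„Julia“ trinkt 2 Tassen Tee!"))
# julia trinkt 0 tassen tee
```

Mapping digits to a single placeholder keeps the information that a number was present without growing the vocabulary by one entry per distinct number.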
Installation
pip install german
Usage
from german import preprocess
preprocess(['Johannes war einer von vielen guten Schülern.', 'Julia trinkt gern Tee.'], remove_stop=True)
# ['johannes gut schüler', 'julia trinken tee']
License
MIT.
Sponsoring
This work was created as part of a project that was funded by the German Federal Ministry of Education and Research.