Preprocess German texts for serious NLP.
German Preprocessing
![PyPI - Python Version](https://pypi-camo.freetls.fastly.net/9616effe1652414cadbdf0b7120a4db672ebd2fd/68747470733a2f2f696d672e736869656c64732e696f2f707970692f707976657273696f6e732f6765726d616e2e737667)
Preprocess German texts to do some serious natural-language processing.
- clean texts
- remove stopwords (as defined by spaCy)
- lemmatize
- lower-case, remove all punctuation, and replace digits with "0"
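To make the last step of the list above concrete, here is a minimal standard-library sketch of lower-casing, digit replacement, and punctuation removal. It is an illustration only, not the package's actual implementation: the function name `normalize_surface` is hypothetical, and it assumes that every single digit is mapped to "0" (rather than a whole run of digits) and that underscores count as punctuation.

```python
import re


def normalize_surface(text: str) -> str:
    """Lower-case, map each digit to "0", and drop punctuation (hypothetical helper)."""
    text = text.lower()
    # Assumption: each digit is replaced individually, so "42" becomes "00".
    text = re.sub(r"\d", "0", text)
    # For str patterns, \w matches Unicode letters such as ä, ö, ü and ß,
    # so only characters that are neither word characters nor whitespace are removed.
    text = re.sub(r"[^\w\s]", "", text)
    # The underscore is part of \w; this sketch treats it as punctuation.
    text = text.replace("_", "")
    # Collapse any whitespace left behind by the removals.
    return " ".join(text.split())


print(normalize_surface("Julia trinkt 3 Tassen Tee, oder?"))
```

Stopword removal and lemmatization are not covered by this sketch; according to the list above, the package relies on spaCy's stopword definitions for the former.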
Installation
pip install german
Usage
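from german import preprocess

# remove_stop=True drops stopwords (as defined by spaCy); the remaining
# tokens are lemmatized and lower-cased, and punctuation is removed.
preprocess(['Johannes war einer von vielen guten Schülern.', 'Julia trinkt gern Tee.'], remove_stop=True)
# ['johannes gut schüler', 'julia trinken tee']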
License
MIT.
Sponsoring
This work was created as part of a project that was funded by the German Federal Ministry of Education and Research.