
A small package for preprocessing German text


Preprocessing

Install: The project uses pipenv to manage its dependencies. The following commands install all requirements, activate the virtual environment, and download the German spaCy model:

$ pipenv install
$ pipenv shell
$ pipenv run python -m spacy download de

Still to do:

  • Edit the stopword list
  • Edit the tag list
  • Maybe extend the custom lemmatization JSON file (a lot of work for relatively little gain?)
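
To make the to-do items above more concrete, here is a minimal, pure-Python sketch of how a stopword list, a tag list, and a custom lemmatization JSON file could be combined. It is not this package's API: the function `preprocess`, the sample lists, and the JSON contents are hypothetical, and the tags are assumed to be STTS-style part-of-speech tags (as used by German spaCy models).

```python
import json

# Hypothetical custom lemmatization file contents: surface form -> lemma.
CUSTOM_LEMMAS_JSON = '{"Kinder": "Kind", "gingen": "gehen", "Hause": "Haus"}'

# Hypothetical stopword list (compared in lowercase).
STOPWORDS = {"der", "die", "das", "und", "nach"}

# Hypothetical tag list: only tokens with these STTS-style tags are kept.
KEEP_TAGS = {"NN", "NE", "VVFIN", "ADJA", "ADJD"}


def preprocess(tagged_tokens, lemma_table, stopwords=STOPWORDS, keep_tags=KEEP_TAGS):
    """Filter (token, tag) pairs by stopwords and tags, then lemmatize.

    Tokens missing from lemma_table are kept in their surface form.
    """
    result = []
    for token, tag in tagged_tokens:
        if token.lower() in stopwords:
            continue  # drop stopwords, case-insensitively
        if tag not in keep_tags:
            continue  # drop tokens whose tag is not on the tag list
        result.append(lemma_table.get(token, token))  # lemma lookup with fallback
    return result


lemmas = json.loads(CUSTOM_LEMMAS_JSON)
tagged = [("Die", "ART"), ("Kinder", "NN"), ("gingen", "VVFIN"),
          ("nach", "APPR"), ("Hause", "NN")]
print(preprocess(tagged, lemmas))  # ['Kind', 'gehen', 'Haus']
```

In a real pipeline the (token, tag) pairs would more likely come from a spaCy German model (`token.text`, `token.tag_`), with the IWNLP lemmas used instead of, or alongside, a hand-written JSON lookup.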

This project uses lemmatizations from spaCy-IWNLP:

@InProceedings{liebeck-conrad:2015:ACL-IJCNLP,
  author    = {Liebeck, Matthias  and  Conrad, Stefan},
  title     = {{IWNLP: Inverse Wiktionary for Natural Language Processing}},
  booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
  year      = {2015},
  publisher = {Association for Computational Linguistics},
  pages     = {414--418},
  url       = {http://www.aclweb.org/anthology/P15-2068}
}

Download files

Source distribution: spacy_german_preprocess-0.0.2.tar.gz (4.1 MB)
Built distribution (Python 3): spacy_german_preprocess-0.0.2-py3-none-any.whl (4.5 MB)
