A small package for preprocessing German text
Preprocessing
Install: The project uses pipenv to manage dependencies. Install all requirements, activate the environment, and download the German spaCy model with the following commands:
$ pipenv install
$ pipenv shell
$ pipenv run python -m spacy download de
To do:
- edit the stopword list
- edit the tag list
- maybe extend the custom lemmatization JSON file (a lot of work for little gain?)
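To make the stopword list and the custom lemmatization file concrete, here is a minimal sketch of how the two could interact. It is not the package's actual code: the function `preprocess`, the names `STOPWORDS` and `CUSTOM_LEMMAS_JSON`, and the sample entries are all hypothetical, and the real file format may differ.

```python
import json

# Hypothetical sample data; the package's real stopword list and
# lemmatization JSON file are not reproduced here.
CUSTOM_LEMMAS_JSON = '{"häuser": "haus", "ging": "gehen"}'
STOPWORDS = {"der", "die", "das", "und", "in"}


def preprocess(tokens, stopwords, custom_lemmas):
    """Lowercase tokens, drop stopwords, and apply custom lemma overrides."""
    result = []
    for token in tokens:
        lowered = token.lower()
        if lowered in stopwords:
            continue
        # Fall back to the lowercased token when no override exists.
        result.append(custom_lemmas.get(lowered, lowered))
    return result


custom_lemmas = json.loads(CUSTOM_LEMMAS_JSON)
tokens = ["Die", "Häuser", "und", "der", "Garten"]
print(preprocess(tokens, STOPWORDS, custom_lemmas))  # ['haus', 'garten']
```

In this sketch, every new entry in the JSON mapping only fixes one surface form, which is why extending the file by hand scales poorly.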
This project uses spaCy-IWNLP for lemmatization:
@InProceedings{liebeck-conrad:2015:ACL-IJCNLP,
author = {Liebeck, Matthias and Conrad, Stefan},
title = {{IWNLP: Inverse Wiktionary for Natural Language Processing}},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
year = {2015},
publisher = {Association for Computational Linguistics},
pages = {414--418},
url = {http://www.aclweb.org/anthology/P15-2068}
}