
Project description


German Lemmatizer

A Python package (using a Docker image under the hood) to lemmatize German texts.

Built upon:

  * spaCy
  * IWNLP
  * GermaLemma

It works as follows. First, spaCy tags each token with its part of speech. German Lemmatizer then looks up the lemma in IWNLP and GermaLemma. If the two disagree, the IWNLP lemma is chosen; if they agree, or only one tool finds a lemma, that lemma is taken. The casing of the original token is preserved where possible.
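The selection rule above can be sketched roughly as follows (`choose_lemma` and its arguments are hypothetical names for illustration, not the package's actual API):

```python
def choose_lemma(token, iwnlp_lemma, germalemma_lemma):
    """Pick a lemma following the rule described above (hypothetical sketch).

    - If IWNLP and GermaLemma disagree, prefer IWNLP.
    - If they agree, or only one tool finds a lemma, take that one.
    - If neither finds a lemma, fall back to the token itself.
    """
    lemma = iwnlp_lemma or germalemma_lemma or token
    # try to preserve the casing of the original token
    if token[:1].isupper():
        return lemma[:1].upper() + lemma[1:]
    return lemma[:1].lower() + lemma[1:]
```

For example, `choose_lemma('sang', 'singen', 'singt')` prefers the IWNLP lemma `'singen'`, while `choose_lemma('guter', None, 'gut')` falls back to the only lemma found.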

You may want to use the underlying Docker image directly: german-lemmatizer-docker

Installation

  1. Install Docker.
  2. pip install german-lemmatizer

Usage

  1. Read and accept the license terms of the TIGER Corpus (free to use for non-commercial purposes).
  2. Make sure the Docker daemon is running.
  3. Write some Python code:
from german_lemmatizer import lemmatize

lemmatize(
    ['Johannes war ein guter Schüler', 'Sabiene sang zahlreiche Lieder'],
    working_dir='*',
    chunk_size=10000,
    n_jobs=1,
    escape=False,
    remove_stop=False)

The list of texts is split into chunks (chunk_size) and processed in parallel (n_jobs).
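Splitting into chunks works along these lines (a generic sketch of what `chunk_size` means, not the package's internal code):

```python
def chunks(items, chunk_size):
    """Yield consecutive chunks of at most chunk_size items each."""
    for i in range(0, len(items), chunk_size):
        yield items[i:i + chunk_size]
```

Each chunk is then handed to one of the `n_jobs` parallel workers.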

Enable the escape parameter if your texts contain newlines. remove_stop removes stopwords as defined by spaCy.
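One way to picture the escape step (a hypothetical sketch; the package's actual escaping may differ): newlines inside a text are encoded so that each input occupies a single line, then decoded again after lemmatization.

```python
def escape_text(text):
    # encode backslashes first, then newlines, so the result is newline-free
    return text.replace('\\', '\\\\').replace('\n', '\\n')

def unescape_text(text):
    # decode the escape sequences produced by escape_text
    out, i = [], 0
    while i < len(text):
        if text[i] == '\\' and i + 1 < len(text):
            out.append('\n' if text[i + 1] == 'n' else text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return ''.join(out)
```

Escaping first and unescaping afterwards guarantees a round trip even for texts that already contain literal backslashes.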

License

MIT.



Download files


Files for german-lemmatizer, version 0.1.1:

  * german_lemmatizer-0.1.1-py3-none-any.whl (4.5 kB): Wheel, Python py3
  * german_lemmatizer-0.1.1.tar.gz (3.2 kB): Source
