German Lemmatizer

A Python package (using a Docker image under the hood) to lemmatize German texts.

Built upon spaCy, IWNLP, and GermaLemma.

It works as follows. First, spaCy tags each token with its part of speech (POS). German Lemmatizer then looks up the token's lemma in IWNLP and GermaLemma. If the two disagree, the IWNLP lemma is chosen. If they agree, or only one of them finds a lemma, that lemma is used. The casing of the original token is preserved where possible.
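The selection rule can be sketched in a few lines of Python. This is only an illustration, not the package's actual code: `choose_lemma` and `match_casing` are hypothetical helpers, the fallback to the original token when neither tool finds a lemma is an assumption, and so is the interpretation of "preserve the casing" as copying the case of the token's first letter.

```python
from typing import Optional


def match_casing(token: str, lemma: str) -> str:
    """Copy the case of the token's first letter onto the lemma (assumed rule)."""
    if not token or not lemma:
        return lemma
    if token[0].isupper():
        return lemma[0].upper() + lemma[1:]
    return lemma[0].lower() + lemma[1:]


def choose_lemma(
    token: str,
    iwnlp_lemma: Optional[str],
    germalemma_lemma: Optional[str],
) -> str:
    """Pick a lemma from the two lookup results (None means "not found")."""
    if iwnlp_lemma is not None:
        # Covers agreement, disagreement (IWNLP wins) and "only IWNLP found it".
        lemma = iwnlp_lemma
    elif germalemma_lemma is not None:
        # Only GermaLemma found a lemma.
        lemma = germalemma_lemma
    else:
        # Assumption: with no lookup result, keep the token unchanged.
        lemma = token
    return match_casing(token, lemma)


if __name__ == "__main__":
    print(choose_lemma("Lieder", "Lied", "Lied"))
    print(choose_lemma("sang", None, "singen"))
```

Because IWNLP wins every disagreement, the three cases from the description collapse into a simple "IWNLP first, GermaLemma second" order.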

You may want to use the underlying Docker image directly: german-lemmatizer-docker


Installation

  1. Install Docker.
  2. Install the package: pip install german-lemmatizer


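Usage

  1. Read and accept the license terms of the TIGER Corpus (free to use for non-commercial purposes).
  2. Make sure the Docker daemon is running.
  3. Write some Python code:

from german_lemmatizer import lemmatize

# Pass the texts as a list; the optional parameters
# (chunk_size, n_jobs, escape, remove_stop) are described below.
lemmatize(
    ['Johannes war ein guter Schüler', 'Sabiene sang zahlreiche Lieder'],
)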

The list of texts is split into chunks (chunk_size) and processed in parallel (n_jobs).
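The chunking idea can be illustrated with the standard library. The following is a rough sketch under stated assumptions, not the package's implementation: `chunked` and `process_in_chunks` are hypothetical helpers, `worker` stands in for whatever lemmatizes a single chunk, and a thread pool is used here only because it ships with Python (the package's actual parallelism mechanism is not described in this document).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def chunked(items: List[str], chunk_size: int) -> List[List[str]]:
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def process_in_chunks(
    texts: List[str],
    worker: Callable[[List[str]], List[str]],
    chunk_size: int = 2,
    n_jobs: int = 2,
) -> List[str]:
    """Run worker on each chunk with n_jobs threads and flatten the results."""
    chunks = chunked(texts, chunk_size)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map returns results in the order of the input chunks.
        results = list(pool.map(worker, chunks))
    return [text for chunk in results for text in chunk]


if __name__ == "__main__":
    # Stand-in worker: upper-cases each text instead of lemmatizing it.
    demo = process_in_chunks(
        ["a", "b", "c", "d", "e"],
        lambda chunk: [t.upper() for t in chunk],
    )
    print(demo)
```

Larger chunks mean less per-chunk overhead, while more jobs mean more chunks are in flight at once.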

Enable the escape parameter if your texts contain newlines. remove_stop removes stopwords as defined by spaCy.
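How the package escapes newlines is not described here. As an illustration of the general technique, the sketch below assumes a simple backslash scheme so that each text fits on a single line and can be restored afterwards; `escape_newlines` and `unescape_newlines` are hypothetical names and not part of german-lemmatizer.

```python
def escape_newlines(text: str) -> str:
    """Encode backslashes and newlines so the text fits on one line."""
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def unescape_newlines(text: str) -> str:
    """Reverse escape_newlines by scanning for backslash sequences."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            # "\n" becomes a real newline, "\\" becomes one backslash.
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    original = "Erste Zeile\nZweite Zeile"
    escaped = escape_newlines(original)
    print(escaped)
    print(unescape_newlines(escaped) == original)
```

Escaping backslashes first keeps the round trip lossless even when a text already contains a literal backslash followed by "n".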



Download files

Files for german-lemmatizer, version 0.1.1

Filename                                  Size    File type  Python version
german_lemmatizer-0.1.1-py3-none-any.whl  4.5 kB  Wheel      py3
german_lemmatizer-0.1.1.tar.gz            3.2 kB  Source     None
