Lemmatizer for Danish

🤘 Lemmy

Lemmy is a lemmatizer for Danish 🇩🇰 and Swedish 🇸🇪. It comes ready for use. The Danish model is trained on Dansk Sprognævn's (DSN) word list (‘fuldformliste’) and the Danish Universal Dependencies. The Swedish model is trained on SALDO's morphology dataset and the Swedish Universal Dependencies (Talbanken). Lemmy also supports training on your own dataset.

The models included in Lemmy were evaluated on the respective Universal Dependencies dev datasets. The Danish model scored > 99% accuracy, while the Swedish model scored > 97%.
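As a rough illustration of what such an accuracy figure measures, the sketch below scores a lemmatizer against gold (POS tag, word form, lemma) triples. It is not Lemmy's evaluation code: `accuracy`, `toy_lemmatize`, and the tiny gold set are hypothetical, and the stand-in lemmatizer is assumed to return a single string per word.

```python
from typing import Callable, Iterable, Tuple

# One gold example: (POS tag, word form, gold lemma).
Example = Tuple[str, str, str]


def accuracy(lemmatize: Callable[[str, str], str], gold: Iterable[Example]) -> float:
    """Return the fraction of examples whose predicted lemma equals the gold lemma."""
    examples = list(gold)
    if not examples:
        raise ValueError("gold set is empty")
    correct = sum(1 for pos, form, lemma in examples if lemmatize(pos, form) == lemma)
    return correct / len(examples)


# Hypothetical stand-in lemmatizer: a lookup table that falls back to the word form.
TOY_LOOKUP = {("NOUN", "bilerne"): "bil", ("VERB", "løber"): "løbe"}


def toy_lemmatize(pos: str, form: str) -> str:
    return TOY_LOOKUP.get((pos, form), form)


gold = [
    ("NOUN", "bilerne", "bil"),  # found in the lookup, correct
    ("VERB", "løber", "løbe"),   # found in the lookup, correct
    ("NOUN", "huset", "hus"),    # not in the lookup, fallback is wrong
    ("NOUN", "bil", "bil"),      # not in the lookup, fallback is correct
]

score = accuracy(toy_lemmatize, gold)
print(score)  # 0.75
```

To score a real model the same way, the gold triples would come from a Universal Dependencies dev file, and `toy_lemmatize` would be replaced by the lemmatizer under test.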

You can use Lemmy as a spaCy extension, more specifically a spaCy pipeline component. This is highly recommended and makes the lemmas easily accessible from the spaCy tokens. Lemmy makes use of POS tags to predict the lemmas. When wired up to the spaCy pipeline, Lemmy has the benefit of using spaCy’s built-in POS tagger.

Lemmy can also be used without spaCy, as a standalone lemmatizer. In that case, you will have to provide the POS tags. Alternatively, you can train a Lemmy model which does not depend on POS tags, though most likely the accuracy will suffer.
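To see why the POS tag matters, consider the Danish form "løber", which is either the noun "løber" (runner) or the present tense of the verb "løbe" (to run). The toy sketch below uses POS-conditioned suffix rules to show the idea. The rule table and the `lemmatize` function are hypothetical and written only for illustration; they are not Lemmy's model or API.

```python
from typing import Dict, Tuple

# Hypothetical toy rules: (POS tag, suffix) -> replacement suffix.
RULES: Dict[Tuple[str, str], str] = {
    ("VERB", "er"): "e",    # present tense -> infinitive, e.g. "løber" -> "løbe"
    ("NOUN", "erne"): "",   # definite plural -> singular, e.g. "bilerne" -> "bil"
}


def lemmatize(pos: str, word: str) -> str:
    """Apply the longest matching suffix rule for this POS tag, else return the word."""
    candidates = [
        (suffix, replacement)
        for (rule_pos, suffix), replacement in RULES.items()
        if rule_pos == pos and word.endswith(suffix)
    ]
    if not candidates:
        return word
    suffix, replacement = max(candidates, key=lambda rule: len(rule[0]))
    return word[: len(word) - len(suffix)] + replacement


print(lemmatize("VERB", "løber"))    # løbe
print(lemmatize("NOUN", "løber"))    # løber
print(lemmatize("NOUN", "bilerne"))  # bil
```

Without the tag, a rule set like this has no basis for choosing between "løbe" and "løber", which is one way to picture the accuracy loss of a POS-free model.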

Lemmy is heavily inspired by the CST Lemmatizer for Danish.

Install

pip install lemmy

Usage

import da_custom_model as da # name of your spaCy model
import lemmy.pipe
nlp = da.load()

# Create an instance of Lemmy's pipeline component for spaCy.
# Replace 'da' with 'sv' for the Swedish lemmatizer.
pipe = lemmy.pipe.load('da')

# Add the component to the spaCy pipeline.
nlp.add_pipe(pipe, after='tagger')

# Lemmas can now be accessed using the `._.lemmas` attribute on the tokens.
nlp("akvariernes")[0]._.lemmas

Training

The notebooks folder contains examples showing how to train your own model using Lemmy.
