Light stemmer for Latvian.

LatvianStemmer

The original Java code can be found at https://github.com/apache/lucene-solr

Ported to Python by Rihards Krišlauks with minor modifications.

This is a light version of the algorithm in Karlis Kreslin's PhD thesis "A stemming algorithm for Latvian", with the following modifications:

  • Only noun and adjective morphology is explicitly stemmed.
  • Stricter length/vowel checks are applied to the resulting stems (suffix stripping for verbs etc. is removed).
  • Only the primary inflectional suffixes are removed: case and number for nouns; case, number, gender, and definiteness for adjectives.
  • Palatalization is only handled when a declension II, V, or VI noun suffix is removed.
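
The rules above can be illustrated with a minimal sketch. It is not the package's actual code: the suffix table is a tiny hypothetical subset, the vowel thresholds are illustrative, and `undo_palatalization` covers only a few alternations. The function names (`light_stem`, `count_vowels`, `undo_palatalization`) are assumptions made for this example.

```python
# Hypothetical, heavily abbreviated suffix table (illustration only).
# Each entry: (suffix, vowel threshold, undo palatalization after stripping?)
SUFFIXES = [
    ("ajiem", 3, False),
    ("iem", 2, True),
    ("ām", 2, False),
    ("as", 1, False),
    ("a", 1, False),
    ("s", 0, False),
]

VOWELS = frozenset("aeiouāēīū")


def count_vowels(word):
    """Count the vowels in a lowercase word."""
    return sum(ch in VOWELS for ch in word)


def undo_palatalization(stem):
    """Simplified reversal of a few stem-final consonant changes."""
    # Labial + j: drop the j (e.g. "pj" becomes "p").
    if len(stem) >= 2 and stem[-1] == "j" and stem[-2] in "pbmv":
        return stem[:-1]
    # Single palatalized consonants map back to their plain form.
    simple = {"č": "c", "ļ": "l", "ņ": "n"}
    if stem and stem[-1] in simple:
        return stem[:-1] + simple[stem[-1]]
    return stem


def light_stem(word):
    """Strip at most one inflectional suffix, subject to length/vowel checks."""
    vowels = count_vowels(word)
    for suffix, threshold, palatalizes in SUFFIXES:
        if (
            vowels > threshold                  # enough vowels in the word
            and len(word) >= len(suffix) + 3    # stem keeps at least 3 letters
            and word.endswith(suffix)
        ):
            stem = word[: -len(suffix)]
            return undo_palatalization(stem) if palatalizes else stem
    return word  # no suffix matched, or the word is too short


print(light_stem("brāļiem"))  # brāl
print(light_stem("mājas"))    # māj
print(light_stem("es"))       # es (too short to stem)
```

In this sketch the first matching suffix wins, so longer suffixes are listed before shorter ones. The length and vowel checks keep short words such as "es" from being reduced to nothing.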

Usage

pip install LatvianStemmer
lvstemmer < input.txt > output.txt
# or
lvstemmer input1.txt input2.txt > output.txt

Download files

Source Distribution

LatvianStemmer-1.0.2.tar.gz (7.0 kB)

Built Distribution

LatvianStemmer-1.0.2-py3-none-any.whl (7.7 kB, Python 3)