Light stemmer for Latvian.
LatvianStemmer
The original Java code can be found at https://github.com/apache/lucene-solr.
Ported to Python by Rihards Krišlauks, with minor modifications.
This is a light version of the algorithm in Karlis Kreslin's PhD thesis "A stemming algorithm for Latvian", with the following modifications:
- Only explicitly stems noun and adjective morphology
- Stricter length/vowel checks for the resulting stems (suffix stripping for verbs and other parts of speech is removed)
- Removes only the primary inflectional suffixes: case and number for nouns; case, number, gender, and definiteness for adjectives.
- Palatalization is only handled when a declension II, V, or VI noun suffix is removed.
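The modifications listed above can be sketched in a few lines of Python. This is a minimal illustration, not the package's actual code: the suffix table `AFFIXES` is a small hypothetical subset, the vowel thresholds and the "stem must keep at least three characters" rule are assumptions chosen for the example, and `UNPALATALIZE` covers only three single-consonant cases. The function name `light_stem` is likewise invented here.

```python
# Illustrative sketch only: the table below is an assumed subset,
# not the suffix list shipped with LatvianStemmer.
VOWELS = set("aeiouāēīū")

# (suffix, vowel threshold, palatalizes), longer suffixes first.
AFFIXES = [
    ("iem", 2, True),
    ("ām", 1, False),
    ("as", 1, False),
    ("is", 1, False),
    ("u", 1, True),
    ("a", 1, True),
    ("i", 1, True),
    ("s", 0, False),
]

# Simplified reversal of palatalization for a stem-final consonant.
UNPALATALIZE = {"č": "c", "ļ": "l", "ņ": "n"}


def count_vowels(word):
    """Count the vowels in a word (assumes precomposed characters)."""
    return sum(ch in VOWELS for ch in word)


def light_stem(word):
    """Strip at most one inflectional suffix, subject to length/vowel checks."""
    vowels = count_vowels(word)
    for suffix, threshold, palatalizes in AFFIXES:
        if (
            vowels > threshold                   # vowel check
            and len(word) >= len(suffix) + 3     # length check on the stem
            and word.endswith(suffix)
        ):
            stem = word[: -len(suffix)]
            # Palatalization is undone only for suffixes flagged as palatalizing.
            if palatalizes and stem[-1] in UNPALATALIZE:
                stem = stem[:-1] + UNPALATALIZE[stem[-1]]
            return stem
    return word  # nothing matched: leave the word unchanged


print(light_stem("mājas"))    # māj
print(light_stem("brāļiem"))  # brāl
print(light_stem("es"))       # es (too short to stem)
```

Under these assumptions, inflected forms of the same noun ("brālis", "brāļiem") collapse to one stem, while short words are left alone by the length check.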
Usage
pip install LatvianStemmer
lvstemmer < input.txt > output.txt
# or
lvstemmer input1.txt input2.txt > output.txt
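Since the commands above show `lvstemmer` working as a stdin/stdout filter, one way to call it from a Python script is through `subprocess`. The sketch below relies only on that filter behaviour and does not assume any importable Python API. The helper name `stem_via_cli` and its `command` parameter are hypothetical, and UTF-8 input and output is an assumption.

```python
import subprocess


def stem_via_cli(text, command=("lvstemmer",)):
    """Pipe `text` through a stdin/stdout filter and return what it prints.

    `command` defaults to the `lvstemmer` executable, which is assumed
    to be installed and on PATH.
    """
    result = subprocess.run(
        list(command),
        input=text,
        capture_output=True,
        text=True,
        encoding="utf-8",  # assumed encoding for both directions
        check=True,        # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


# Example call (requires `pip install LatvianStemmer` first):
# stemmed = stem_via_cli("mājas brāļiem\n")
```

Keeping the command as a parameter makes the helper easy to exercise with any other filter program when `lvstemmer` is not installed.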
Download files
Source Distribution
LatvianStemmer-1.0.2.tar.gz (7.0 kB)
Built Distribution
LatvianStemmer-1.0.2-py3-none-any.whl
Hashes for LatvianStemmer-1.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 246a0ef308102b3e07f23d054084dcdcb2a4e32588f5c60b8da8418d391b8e1a
MD5 | 5fe71012b416964cf513f5a28b2f8393
BLAKE2b-256 | c958dc0b42c4520f249a8ac3064c887133cadcd394f777b39f2eb92e7edefbe7