A modified Porter stemmer with additional rules for verbs and other suffixes.
Project description
What is stemming?
Stemming is a technique in Natural Language Processing that reduces various inflected forms of a word to a single invariant root form. This root form, known as the stem, may or may not be identical to the word's morphological root.
What is it good for?
Stemming is useful in many applications, with query expansion in information retrieval being a prime example. For instance, if a user of a search engine searches for "cat," the search should also return documents containing the word "cats." This won't happen unless both the query and the document index undergo stemming. Essentially, stemming reduces the specificity of queries so that more relevant results are retrieved. The trade-off is a loss of precision: unrelated words that happen to share a stem are matched as well.
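The effect of stemming both the index and the query can be sketched with a toy inverted index. This is only an illustration: `toy_stem`, `build_index`, and `search` are hypothetical names invented here, and `toy_stem` is a crude stand-in that strips a single trailing "s", not the modifiedstemmer algorithm.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in stemmer: lowercase, then strip one trailing 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def build_index(docs, normalize):
    """Map each normalized token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


def search(index, query, normalize):
    """Normalize the query the same way the index was built, then look it up."""
    return set(index.get(normalize(query), set()))


docs = {1: "cats chase mice", 2: "the cat sleeps", 3: "dogs bark"}

stemmed_index = build_index(docs, toy_stem)   # index and query both stemmed
raw_index = build_index(docs, str.lower)      # no stemming, only lowercasing

print(search(stemmed_index, "cat", toy_stem))  # documents 1 and 2
print(search(raw_index, "cat", str.lower))     # document 2 only
```

The same sketch also shows the trade-off: `toy_stem("news")` returns `"new"`, so a query for "new" would match documents about "news" as well.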
What type of stemmer is this?
modifiedstemmer is a suffix-stripping stemmer, which means it transforms words into stems by applying a predetermined sequence of changes to the word's suffix. Other stemmers may function differently, such as by using a lookup table to map inflected forms to their roots or by employing clustering techniques to group various forms around a central form. Each approach comes with its own set of pros and cons. modifiedstemmer, specifically, is a modified version of the original Porter stemmer and includes more comprehensive rules for handling verbs and suffixes.
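To make the distinction concrete, here is a minimal sketch of the two approaches: an ordered list of suffix rules versus a lookup table. The rule list, the `min_stem` threshold, and the names `strip_suffix` and `lookup_stem` are assumptions made up for illustration. They are far simpler than the Porter algorithm and are not the rules modifiedstemmer actually applies.

```python
# Illustrative (suffix, replacement) rules, tried in order; first match wins.
RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ss", "ss"),   # keeps words like "class" from losing their final "s"
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]


def strip_suffix(word, rules=RULES, min_stem=3):
    """Apply the first rule whose suffix matches and leaves a long enough stem."""
    word = word.lower()
    for suffix, replacement in rules:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: len(word) - len(suffix)] + replacement
    return word


# A lookup table handles irregular forms that no suffix rule can reach.
LOOKUP = {"ran": "run", "mice": "mouse"}


def lookup_stem(word, table=LOOKUP):
    """Return the table entry for the word, or the lowercased word if absent."""
    word = word.lower()
    return table.get(word, word)


for w in ["caresses", "ponies", "walking", "jumped", "cats", "class", "sing"]:
    print(w, "->", strip_suffix(w))
```

Rule-based stripping needs no dictionary and generalizes to unseen words, but it cannot relate "ran" to "run"; a lookup table can, but only for the words it lists.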
How do I use it?
Using the modifiedstemmer is straightforward. Simply import the stemmer, create an instance, and use it to stem words:
from mod_stemmer import modifiedstemmer
my_stemmer = modifiedstemmer.stemmer()
print(my_stemmer.stem('consistency'))
This converts the word 'consistency' to its stem form and prints the result.
Project details
Download files
File details
Details for the file modifiedstemmer-0.0.11.tar.gz.
File metadata
- Download URL: modifiedstemmer-0.0.11.tar.gz
- Upload date:
- Size: 11.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.1
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 7f4157d8e6d19610fed28733ab2e3c366f13c4053eca16f7bf3ceb878811ce90 |
MD5 | 1067971f0ab6e546496bd4d25bae0b92 |
BLAKE2b-256 | a340f4782ef6640f0c7cb7f2aca5cbcb079baf59381737fde662458dd47e69e3 |
File details
Details for the file modifiedstemmer-0.0.11-py3-none-any.whl.
File metadata
- Download URL: modifiedstemmer-0.0.11-py3-none-any.whl
- Upload date:
- Size: 6.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.1
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 491e95890538e82ba894aed460f5f61f9402380d3c94598d0a931776959bd5bd |
MD5 | bd1cbecf640028117bd0a6e735046348 |
BLAKE2b-256 | 57ad2a1abfd87484a02a67ea3bb6bd94d8c49cb2ebc341b45b8bd852551b41da |