Skip to main content

A modified Porter stemmer for verbs and other additional rules.

Project description

What is stemming?

Stemming is a technique in Natural Language Processing that reduces various inflected forms of a word to a single invariant root form. This root form, known as the stem, may or may not be identical to the word's morphological root.

What is it good for?

Stemming is highly useful in various applications, with query expansion in information retrieval being a prime example. For instance, in a search engine, if a user searches for "cat," it would be beneficial for the search to return documents containing the word "cats" as well. This won't happen unless both the query and the document index undergo stemming. Essentially, stemming reduces the specificity of queries, enabling the retrieval of more relevant results, though this involves a trade-off.

What type of stemmer is this?

modifiedstemmer is a suffix-stripping stemmer, which means it transforms words into stems by applying a predetermined sequence of changes to the word's suffix. Other stemmers may function differently, such as by using a lookup table to map inflected forms to their roots or by employing clustering techniques to group various forms around a central form. Each approach comes with its own set of pros and cons. modifiedstemmer, specifically, is a modified version of the original Porter stemmer and includes more comprehensive rules for handling verbs and suffixes.

How do I use it?

Using the modifiedstemmer is straightforward. Simply import the stemmer, create an instance, and use it to stem words:

from mod_stemmer import modifiedstemmer
my_stemmer = modifiedstemmer.stemmer()
print(my_stemmer.stem('consistency'))

This process will convert the word 'consistent' to its stem form.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

modifiedstemmer-0.0.11.tar.gz (11.1 kB view hashes)

Uploaded Source

Built Distribution

modifiedstemmer-0.0.11-py3-none-any.whl (6.1 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page