A modified Porter stemmer with additional rules for verbs and other suffixes.

Project description

What is stemming?

Stemming is a technique in Natural Language Processing that reduces various inflected forms of a word to a single invariant root form. This root form, known as the stem, may or may not be identical to the word's morphological root.

What is it good for?

Stemming is useful in many applications, with query expansion in information retrieval being a prime example. For instance, if a user of a search engine searches for "cat," the search should also return documents containing the word "cats." This won't happen unless both the query and the document index undergo stemming. Essentially, stemming reduces the specificity of queries so that more relevant documents are retrieved. The trade-off is some loss of precision, because unrelated words can end up sharing a stem.
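As a rough illustration of why both the query and the index need stemming, the sketch below builds a tiny inverted index in plain Python. The `toy_stem` function is a hypothetical stand-in that only strips a trailing "s"; it is not the modifiedstemmer algorithm. The names `build_index` and `search` are likewise invented for this example.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in stemmer: lowercase and strip one trailing 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def build_index(docs, stem=toy_stem):
    """Map each stemmed token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[stem(token)].add(doc_id)
    return index


def search(index, query, stem=toy_stem):
    """Return the ids of documents that match every stemmed query term."""
    results = [index.get(stem(term), set()) for term in query.split()]
    return set.intersection(*results) if results else set()


docs = {1: "cats chase mice", 2: "a cat sleeps", 3: "dogs bark"}

stemmed_index = build_index(docs)             # index built with stemming
raw_index = build_index(docs, stem=str.lower)  # index built without stemming

print(search(stemmed_index, "cat"))             # matches documents 1 and 2
print(search(raw_index, "cat", stem=str.lower))  # matches only document 2
```

With stemming applied on both sides, the query "cat" also finds the document that only mentions "cats"; without it, that document is missed.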

What type of stemmer is this?

modifiedstemmer is a suffix-stripping stemmer, which means it transforms words into stems by applying a predetermined sequence of changes to the word's suffix. Other stemmers may function differently, such as by using a lookup table to map inflected forms to their roots or by employing clustering techniques to group various forms around a central form. Each approach comes with its own set of pros and cons. modifiedstemmer, specifically, is a modified version of the original Porter stemmer and includes more comprehensive rules for handling verbs and suffixes.
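To make the difference between the approaches concrete, here is a minimal sketch that contrasts a rule-based suffix stripper with a lookup table. The rules, the table entries, and the names `SUFFIX_RULES`, `LOOKUP_TABLE`, `strip_suffix`, and `lookup_stem` are all assumptions made for illustration; they are not the actual rule set of the Porter stemmer or of modifiedstemmer.

```python
# Ordered (suffix, replacement) rules; the first applicable rule wins.
# Illustrative only: not the real Porter or modifiedstemmer rules.
SUFFIX_RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]

# A lookup-table stemmer lists known inflected forms explicitly instead.
LOOKUP_TABLE = {"ran": "run", "mice": "mouse"}


def strip_suffix(word, rules=SUFFIX_RULES, min_stem_len=3):
    """Apply the first rule whose suffix matches and leaves a long enough stem."""
    for suffix, replacement in rules:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)] + replacement
            if len(stem) >= min_stem_len:
                return stem
    return word


def lookup_stem(word, table=LOOKUP_TABLE):
    """Return the table entry for a known form, or the word unchanged."""
    return table.get(word, word)


print(strip_suffix("jumping"))   # jump
print(strip_suffix("caresses"))  # caress
print(strip_suffix("sing"))      # sing (the stem "s" would be too short)
print(strip_suffix("ran"))       # ran (no suffix rule applies)
print(lookup_stem("ran"))        # run (irregular form covered by the table)
```

The rule-based version generalises to words it has never seen but cannot handle irregular forms, while the table handles irregular forms but only for the words it lists.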

How do I use it?

Using the modifiedstemmer is straightforward. Simply import the stemmer, create an instance, and use it to stem words:

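# Import the stemmer module from the mod_stemmer package
from mod_stemmer import modifiedstemmer
# Create a stemmer instance
my_stemmer = modifiedstemmer.stemmer()
# Stem a single word and print the result
print(my_stemmer.stem('consistency'))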

This prints the stem form of the word 'consistency'.

Download files

Source Distribution

modifiedstemmer-0.0.11.tar.gz (11.1 kB view details)

Uploaded Source

Built Distribution

modifiedstemmer-0.0.11-py3-none-any.whl (6.1 kB view details)

Uploaded Python 3

File details

Details for the file modifiedstemmer-0.0.11.tar.gz.

File metadata

  • Download URL: modifiedstemmer-0.0.11.tar.gz
  • Upload date:
  • Size: 11.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.1

File hashes

Hashes for modifiedstemmer-0.0.11.tar.gz
Algorithm Hash digest
SHA256 7f4157d8e6d19610fed28733ab2e3c366f13c4053eca16f7bf3ceb878811ce90
MD5 1067971f0ab6e546496bd4d25bae0b92
BLAKE2b-256 a340f4782ef6640f0c7cb7f2aca5cbcb079baf59381737fde662458dd47e69e3

File details

Details for the file modifiedstemmer-0.0.11-py3-none-any.whl.

File hashes

Hashes for modifiedstemmer-0.0.11-py3-none-any.whl
Algorithm Hash digest
SHA256 491e95890538e82ba894aed460f5f61f9402380d3c94598d0a931776959bd5bd
MD5 bd1cbecf640028117bd0a6e735046348
BLAKE2b-256 57ad2a1abfd87484a02a67ea3bb6bd94d8c49cb2ebc341b45b8bd852551b41da
