A modified Porter stemmer with additional rules for verbs and other suffixes.

Project description

What is stemming?

Stemming is a technique in Natural Language Processing that reduces various inflected forms of a word to a single invariant root form. This root form, known as the stem, may or may not be identical to the word's morphological root.

What is it good for?

Stemming is useful in many applications, with query expansion in information retrieval being a prime example. In a search engine, if a user searches for "cat", the search should also return documents containing the word "cats". With exact term matching, this happens only if both the query and the document index are stemmed. In effect, stemming makes queries less specific: more relevant documents are retrieved (higher recall), at the cost of occasionally matching unrelated words that happen to share a stem (lower precision).
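The effect on retrieval can be sketched with a tiny inverted index. Everything in this example is an illustrative assumption: `simple_stem` is a hypothetical stand-in that only strips a trailing "s", and `build_index` and `search` are made-up helper names, not part of modifiedstemmer.

```python
from collections import defaultdict


def simple_stem(word):
    """Hypothetical stand-in stemmer: only strips a plural-looking trailing 's'."""
    word = word.lower()
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def build_index(docs, normalize):
    """Map each normalized token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


def search(index, query, normalize):
    """Normalize the query the same way as the index, then look it up."""
    return sorted(index.get(normalize(query), set()))


DOCS = {1: "the cat sat", 2: "two cats played", 3: "a dog barked"}

exact_index = build_index(DOCS, str.lower)      # no stemming
stemmed_index = build_index(DOCS, simple_stem)  # stemming on both sides

print(search(exact_index, "cat", str.lower))      # [1]
print(search(stemmed_index, "cat", simple_stem))  # [1, 2]
```

The same normalization function is applied to the documents and to the query; stemming only one side would make matches less likely rather than more.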

What type of stemmer is this?

modifiedstemmer is a suffix-stripping stemmer, which means it transforms words into stems by applying a predetermined sequence of changes to the word's suffix. Other stemmers may function differently, such as by using a lookup table to map inflected forms to their roots or by employing clustering techniques to group various forms around a central form. Each approach comes with its own set of pros and cons. modifiedstemmer, specifically, is a modified version of the original Porter stemmer and includes more comprehensive rules for handling verbs and suffixes.
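To make the suffix-stripping idea concrete, here is a minimal sketch. The rule list, the minimum stem length, and the name `toy_stem` are assumptions chosen for illustration; they are not modifiedstemmer's actual rules and not the full Porter algorithm, which applies several rule phases with conditions on the remaining stem.

```python
# Toy suffix-stripping stemmer. RULES, MIN_STEM_LENGTH and toy_stem are
# hypothetical names for illustration; this is not modifiedstemmer's rule set.

# Ordered (suffix, replacement) pairs: the first usable match wins.
RULES = [
    ("sses", "ss"),  # "classes" -> "class"
    ("ies", "i"),    # "ponies"  -> "poni"
    ("ss", "ss"),    # "class"   -> "class" (protects words ending in "ss")
    ("ing", ""),     # "walking" -> "walk"
    ("ed", ""),      # "walked"  -> "walk"
    ("s", ""),       # "cats"    -> "cat"
]

MIN_STEM_LENGTH = 3  # skip a rule if too little of the word would remain


def toy_stem(word: str) -> str:
    """Apply the first rule whose suffix matches and leaves a long enough stem."""
    word = word.lower()
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if len(candidate) >= MIN_STEM_LENGTH:
                return candidate
    return word


if __name__ == "__main__":
    for w in ["cats", "walking", "walked", "classes", "ponies", "sing"]:
        print(w, "->", toy_stem(w))
```

Note that "ponies" becomes "poni", which is not a dictionary word. This illustrates the earlier point that a stem need not be identical to the word's morphological root. A lookup-table stemmer would instead store each inflected form explicitly, trading rule-writing effort for table size and coverage.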

How do I use it?

Using modifiedstemmer is straightforward. Import the stemmer, create an instance, and call its stem method on each word:

from modifiedstemmer import stemmer

# Give the instance its own name so the imported stemmer is not shadowed.
porter = stemmer()
print(porter.stem('consistent'))

This prints the stem form of the word 'consistent'.
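A stemmer works one word at a time, so running text has to be split into tokens first. The sketch below shows one possible way to do that. The helper `stem_text`, the lowercasing step, and the tokenizing regular expression are assumptions, not part of modifiedstemmer. The helper accepts any object with a `stem(word)` method; `IdentityStemmer` is a hypothetical stand-in so the example runs without the package installed.

```python
import re


def stem_text(stemmer_obj, text):
    """Lowercase the text, split it into word tokens, and stem each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [stemmer_obj.stem(token) for token in tokens]


class IdentityStemmer:
    """Hypothetical stand-in with the same stem(word) interface; returns words unchanged."""

    def stem(self, word):
        return word


print(stem_text(IdentityStemmer(), "Cats are running!"))
# ['cats', 'are', 'running']
```

To stem with modifiedstemmer instead, pass an instance created as in the usage snippet above in place of `IdentityStemmer()`.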

Download files

Source Distribution

modifiedstemmer-0.0.10.tar.gz (11.0 kB)

Uploaded Source

Built Distribution

modifiedstemmer-0.0.10-py3-none-any.whl (6.1 kB)

Uploaded Python 3

File details

Details for the file modifiedstemmer-0.0.10.tar.gz.

File metadata

  • Download URL: modifiedstemmer-0.0.10.tar.gz
  • Upload date:
  • Size: 11.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.1

File hashes

Hashes for modifiedstemmer-0.0.10.tar.gz
Algorithm    Hash digest
SHA256       1be469b8be99527963d43f32c7733310f538a60f424cd59286b9e96ce942dcb2
MD5          90e6fb3a3ff50610f7c3882ffcb831e0
BLAKE2b-256  12cc53c9da747d06563eecedf5fda488a18acc8d6f55f72a45a950e65130f210

File details

Details for the file modifiedstemmer-0.0.10-py3-none-any.whl.

File hashes

Hashes for modifiedstemmer-0.0.10-py3-none-any.whl
Algorithm    Hash digest
SHA256       ccf3589a925717140bedb6b80007794e1dcc2cf4de149151f91f22972dcec06b
MD5          66cff3a3f178f33a782687cb2d8cdcbd
BLAKE2b-256  71a3fc282f7dee87a50673eebe279ec73dfb13cbdfdb547965e845b6a8d42885
