A lightweight, fast English lemmatizer

LightLemma

LightLemma is a lightweight, fast English lemmatizer and stemmer. It focuses on high-performance normalization of English text while keeping a minimal footprint.

Introduction to Lemmatization

Lemmatization is the process of reducing words to their base or dictionary form (lemma). This process uses morphological analysis and dictionary lookups to transform words into their canonical forms. For example:

"running" → "run"
"better" → "good"
"studies" → "study"
"am", "are", "is" → "be"

Unlike stemming, lemmatization considers the context and part of speech of words to produce linguistically valid results. It uses a dictionary-based approach to ensure the output is always a real word.
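To make the dictionary-based idea concrete, here is a minimal toy sketch. It is not LightLemma's implementation: the names `IRREGULAR`, `VOCABULARY`, `SUFFIX_RULES`, and `toy_lemmatize` are hypothetical, and the word lists are deliberately tiny. It only illustrates the general pattern of looking up irregular forms first and accepting a rule-derived candidate only when it is a known word.

```python
# Toy illustration only; this is NOT LightLemma's implementation.
# Irregular forms are resolved by direct lookup.
IRREGULAR = {"am": "be", "are": "be", "is": "be", "better": "good"}

# A tiny set of known lemmas, used to validate rule output.
VOCABULARY = {"be", "good", "run", "study", "cat"}

# (suffix, replacement) rules, tried in order.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("s", "")]


def toy_lemmatize(word: str) -> str:
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word in VOCABULARY:
        return word
    for suffix, replacement in SUFFIX_RULES:
        if not word.endswith(suffix):
            continue
        candidate = word[: len(word) - len(suffix)] + replacement
        if candidate in VOCABULARY:
            return candidate
        # Undo consonant doubling, e.g. "runn" -> "run".
        if (
            len(candidate) >= 2
            and candidate[-1] == candidate[-2]
            and candidate[:-1] in VOCABULARY
        ):
            return candidate[:-1]
    # No validated candidate: return the word unchanged rather than a non-word.
    return word


for w in ["running", "better", "studies", "is", "glass"]:
    print(w, "->", toy_lemmatize(w))
```

Because every rule-derived candidate is checked against the vocabulary, an unknown word such as "glass" is returned unchanged instead of being cut down to "glas".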

The Difference Between Lemmatization and Stemming

While both lemmatization and stemming aim to reduce words to their base form, they work differently:

Lemmatization:

Produces linguistically valid words
Uses dictionary lookup and morphological analysis
Considers word context and part of speech
More accurate but typically slower
Example: "studies" → "study"

Stemming:

Uses rule-based algorithms to strip affixes
Faster but can produce non-words
Doesn't consider word context
More aggressive reduction
Example: "studies" → "studi"

Choose lemmatization when you need linguistically accurate results, and stemming when you need fast, approximate word normalization.
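The contrast can be sketched with a deliberately crude suffix stripper next to a lookup table. This is an illustration under stated assumptions, not the Porter algorithm and not LightLemma's code: `crude_stem` and `LEMMAS` are hypothetical names, and the rule list is far smaller than any real stemmer's.

```python
# Toy illustration only; NOT the Porter algorithm and NOT LightLemma's code.
# Each rule is (suffix, replacement); the first match wins, with no dictionary check.
STEM_RULES = (("ies", "i"), ("al", ""), ("ing", ""), ("s", ""))

# A lookup table standing in for a lemmatizer's dictionary.
LEMMAS = {"studies": "study", "universal": "universal"}


def crude_stem(word: str) -> str:
    word = word.lower()
    for suffix, replacement in STEM_RULES:
        # Keep at least three characters of the original word.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ("studies", "universal"):
    print(f"{w}: stem={crude_stem(w)} lemma={LEMMAS.get(w, w)}")
```

The stripper never checks whether its result is a word, which is why it is cheap and why it can return fragments such as "studi" or "univers".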

Features

Fast and lightweight English lemmatization
Porter Stemmer implementation
Simple, easy-to-use API
No external dependencies
Optimized for performance
Planned integration with contraction_fix and emoticon_fix

Installation

pip install lightlemma

Usage

from lightlemma import lemmatize, stem

# Lemmatize a single word
word = "running"
lemma = lemmatize(word)
print(lemma)  # Output: run

# Lemmatize several words
words = ["cats", "running", "better", "studies"]
lemmas = [lemmatize(word) for word in words]
print(lemmas)  # Output: ['cat', 'run', 'good', 'study']

# Stem a single word with the Porter Stemmer
word = "running"
stemmed = stem(word)
print(stemmed)  # Output: run

# Compare lemmatization and stemming on the same words
words = ["studies", "universal", "maximum"]
lemmas = [lemmatize(word) for word in words]
stems = [stem(word) for word in words]
print(lemmas)  # Output: ['study', 'universal', 'maximum']
print(stems)   # Output: ['studi', 'univers', 'maxim']
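The examples above work on single words. For running text, one possible approach is a small wrapper that tokenizes first and then applies a per-word function. The sketch below is an assumption, not part of LightLemma: `TOKEN_RE`, `normalize_text`, and `identity` are hypothetical names, and the regex tokenizer is intentionally simple. A stand-in normalizer is used so the snippet runs without the package installed.

```python
import re
from typing import Callable, List

# Simple tokenizer: runs of letters, optionally with one internal apostrophe part.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def normalize_text(text: str, normalizer: Callable[[str], str]) -> List[str]:
    """Lowercase the text, split it into word tokens, and normalize each token."""
    return [normalizer(token) for token in TOKEN_RE.findall(text.lower())]


def identity(word: str) -> str:
    """Stand-in normalizer that returns the word unchanged."""
    return word


print(normalize_text("The cats are running!", identity))
# ['the', 'cats', 'are', 'running']
```

With the package installed, `lemmatize` or `stem` from `lightlemma` could be passed in place of `identity`, since both are shown above taking one word and returning one string.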

Performance

LightLemma is designed to be faster and more memory-efficient than existing solutions while maintaining high accuracy for English text.
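One way to check the speed claim on your own data is a small throughput measurement with the standard library's `timeit`. This is a sketch under stated assumptions: `words_per_second` is a hypothetical helper, and `str.lower` is only a stand-in so the snippet runs without the package installed. Results depend heavily on hardware and input.

```python
import timeit
from typing import Callable, Sequence


def words_per_second(
    func: Callable[[str], str], words: Sequence[str], repeat: int = 5
) -> float:
    """Return the best observed throughput of func over words, in words per second."""

    def run() -> None:
        for w in words:
            func(w)

    # Take the fastest of several runs to reduce noise from other processes.
    best = min(timeit.repeat(run, number=1, repeat=repeat))
    return len(words) / best if best > 0 else float("inf")


sample = ["cats", "running", "better", "studies"] * 1000
rate = words_per_second(str.lower, sample)  # stand-in for a real normalizer
print(f"{rate:,.0f} words/second")
```

Passing `lemmatize` or `stem` from `lightlemma`, and then the equivalent function from another library, would give a like-for-like comparison on the same word list.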

Future Features

Integration with contraction_fix for handling contractions
Integration with emoticon_fix for emoticon normalization
Support for additional text normalization features

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Release history

0.1.6: Aug 7, 2025
0.1.5: Jul 17, 2025
0.1.4: Jul 17, 2025
0.1.3: Jul 17, 2025
0.1.2: Apr 14, 2025 (this version)
0.1.1: Apr 13, 2025

Download files

Source Distribution

lightlemma-0.1.2.tar.gz (12.8 kB view details)

Uploaded Apr 14, 2025 Source

Built Distribution

lightlemma-0.1.2-py3-none-any.whl (10.7 kB view details)

Uploaded Apr 14, 2025 Python 3

File details

Details for the file lightlemma-0.1.2.tar.gz.

File metadata

Download URL: lightlemma-0.1.2.tar.gz
Upload date: Apr 14, 2025
Size: 12.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for lightlemma-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`b4f3548bc7f723ccb42f4743ff003ad15b42bafbf7ae7dbe1128ada006744042`
MD5	`8f0a1ec4853d052cc6876bd84fe443aa`
BLAKE2b-256	`8b4121f9b5ad872c3745eeebd268ced68cb7eeb8fb0f347a89ce3f562c680bc5`

File details

Details for the file lightlemma-0.1.2-py3-none-any.whl.

File metadata

Download URL: lightlemma-0.1.2-py3-none-any.whl
Upload date: Apr 14, 2025
Size: 10.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for lightlemma-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`131ac9153e32178e7b9d5798ae43f59f7d4d6a0fddb5e71ae6b4e05315db0e2b`
MD5	`5526a1488ed116788f914d16c255a412`
BLAKE2b-256	`3da1f9f62548a89ef7835c48f457dcb2e088a0218d7766a924ba36b663e1dfed`
