A tiny, zero-dependency collection of English stemmers (Porter, Snowball, Lancaster).
stemlite
A tiny, zero-dependency collection of English stemmers that uses only the
standard library: no nltk, no numpy, no other third-party packages. It provides
pure-Python implementations of the classic Porter, Snowball (Porter2), and
Lancaster stemming algorithms, plus a small registry so you can pick one by name.
It exists to be a single, self-contained, stable dependency shared across
minsearch, zerosearch, and sqlitesearch, so those search libraries can normalize
words the same way without each carrying its own copy or pulling in a
heavyweight NLP stack.
Designed to run anywhere Python runs, including constrained environments like
Cloudflare Python Workers (Pyodide).
Note: these are pragmatic, simplified implementations tuned for search-time normalization, not bit-for-bit reference implementations of the published algorithms.
Install
pip install stemlite
Usage
from stemlite import get_stemmer, porter_stemmer, snowball_stemmer, lancaster_stemmer, STEMMERS
porter_stemmer("running") # -> "run"
snowball_stemmer("running") # -> "run"
lancaster_stemmer("running") # -> "run"
# Pick a stemmer by name (case-insensitive).
stem = get_stemmer("porter")
stem("running") # -> "run"
# None (or an unknown name) returns a no-op stemmer that only lowercases.
noop = get_stemmer(None)
noop("Running") # -> "running"
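A stemmer is most useful when the same function normalizes both the indexed text and the query. The sketch below illustrates that with a tiny inverted index. It does not import stemlite: `toy_stem` is a hypothetical stand-in for a callable such as `get_stemmer("porter")`, and `tokenize`, `build_index`, and `search` are illustrative names, not part of the library.

```python
import re
from typing import Callable, Dict, List, Set

Stemmer = Callable[[str], str]


def toy_stem(word: str) -> str:
    """Hypothetical stand-in stemmer: lowercase, then strip one common suffix."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str, stem: Stemmer) -> List[str]:
    """Split text into alphabetic tokens and stem each one."""
    return [stem(token) for token in re.findall(r"[A-Za-z]+", text)]


def build_index(docs: Dict[int, str], stem: Stemmer) -> Dict[str, Set[int]]:
    """Map each stemmed term to the ids of the documents containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for term in tokenize(text, stem):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str, stem: Stemmer) -> Set[int]:
    """Return ids of documents that contain every stemmed query term."""
    terms = tokenize(query, stem)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Walking dogs in the park",
    2: "The dog walked home",
    3: "Cats sleep all day",
}
index = build_index(docs, toy_stem)
print(sorted(search(index, "dogs walking", toy_stem)))  # -> [1, 2]
```

Because the functions only require a `Callable[[str], str]`, a stemmer returned by `get_stemmer` should be usable in place of `toy_stem`.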
get_stemmer(name)
def get_stemmer(name: Optional[str] = None) -> Callable[[str], str]: ...
Accepts "porter", "snowball", "lancaster", "none", or None, and
returns a Callable[[str], str]. An unknown name (or None) falls back to the
no-op stemmer, which just lowercases the input. Names are matched
case-insensitively.
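The lookup contract described above (case-insensitive names, with None or an unknown name falling back to a lowercasing no-op) can be sketched in a few lines. This is a minimal illustration of the behavior, not stemlite's actual source: `REGISTRY`, `lookup_stemmer`, and the toy `_strip_s` stemmer are all hypothetical names.

```python
from typing import Callable, Dict, Optional

Stemmer = Callable[[str], str]


def _lowercase_only(word: str) -> str:
    """No-op stemmer: only lowercases the input."""
    return word.lower()


def _strip_s(word: str) -> str:
    """Hypothetical toy stemmer: lowercase and drop one trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


# Hypothetical registry with the same shape as a name -> stemmer mapping.
REGISTRY: Dict[str, Stemmer] = {"toy": _strip_s, "none": _lowercase_only}


def lookup_stemmer(name: Optional[str] = None) -> Stemmer:
    """Resolve a name case-insensitively; fall back to the no-op stemmer."""
    if name is None:
        return _lowercase_only
    return REGISTRY.get(name.lower(), _lowercase_only)


print(lookup_stemmer("TOY")("Cats"))             # -> "cat"
print(lookup_stemmer(None)("Running"))           # -> "running"
print(lookup_stemmer("no-such-name")("Running"))  # -> "running"
```

Falling back instead of raising means a misspelled name silently disables stemming, so callers that accept the name from configuration may want to validate it against the known keys first.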
STEMMERS
The registry backing get_stemmer, a Dict[str, Callable[[str], str]] keyed by
"porter", "snowball", "lancaster", and "none".
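Because the registry is a plain dict of callables, it can be iterated to compare stemmers side by side. The helper below is a sketch under that assumption. `compare_stems` and `demo_registry` are hypothetical names, and the demo entries are toy functions rather than the real algorithms.

```python
from typing import Callable, Dict


def compare_stems(word: str, registry: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Run one word through every stemmer in a name -> callable mapping."""
    return {name: stem(word) for name, stem in sorted(registry.items())}


def _strip_ing(word: str) -> str:
    """Hypothetical toy stemmer: lowercase and drop a trailing 'ing'."""
    word = word.lower()
    return word[:-3] if word.endswith("ing") else word


# Stand-in with the same shape as a Dict[str, Callable[[str], str]] registry.
demo_registry: Dict[str, Callable[[str], str]] = {
    "none": str.lower,
    "strip_ing": _strip_ing,
}

print(compare_stems("Running", demo_registry))
# -> {'none': 'running', 'strip_ing': 'runn'}
```

Passing the imported `STEMMERS` dict as `registry` should work the same way, given the type stated above.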
Choosing a stemmer
- Porter: the classic, conservative choice. Good default.
- Snowball (Porter2): Porter's own revision of the original algorithm; it handles some edge cases slightly differently.
- Lancaster: the most aggressive. It cuts words down further, which raises recall but also makes more unrelated words collide on the same stem.
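The recall-versus-collisions trade-off can be made concrete by grouping a word list by stem and counting the groups. The sketch below uses two toy suffix strippers, `light_stem` and `heavy_stem`, as hypothetical stand-ins for a conservative and an aggressive stemmer; they are not the Porter or Lancaster algorithms.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def light_stem(word: str) -> str:
    """Conservative toy stemmer: lowercase and drop one trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def heavy_stem(word: str) -> str:
    """Aggressive toy stemmer: strip the first matching suffix, longest first."""
    word = word.lower()
    for suffix in ("ized", "ize", "ic", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words: Iterable[str], stem: Callable[[str], str]) -> Dict[str, List[str]]:
    """Collect the words that share each stem."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        groups[stem(word)].append(word)
    return dict(groups)


words = ["organ", "organs", "organic", "organize", "organized"]

print(len(group_by_stem(words, light_stem)))  # -> 4
print(len(group_by_stem(words, heavy_stem)))  # -> 1
```

With the aggressive variant, a query for "organized" also matches "organize" (better recall), but it matches "organ" and "organic" too (a collision). Running the same grouping over a sample of your own vocabulary with each real stemmer is one way to choose among them.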
Development
make setup # uv sync --dev
make test # run the test suite
make coverage # run tests with coverage
License
WTFPL.