Fast and parallel snowball stemmer

py-rust-stemmers

py-rust-stemmers is a high-performance Python wrapper around the rust-stemmers library, which implements the Snowball stemming algorithms. It stems words efficiently and supports parallel processing of batches, which makes it well suited to text processing tasks. The package is built with maturin, which compiles the Rust code into a Python package.

Features

  • Snowball Stemmer: Uses the well-known Snowball stemming algorithms for efficient word stemming in multiple languages.
  • Parallelism Support: Offers parallel processing for batch stemming, providing significant speedup for larger text sequences.
  • Rust Performance: Leverages the performance of Rust for fast, reliable text processing.

Installation

You can install py-rust-stemmers via pip:

pip install py-rust-stemmers
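
After installing, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, and the module name `py_rust_stemmers` is taken from the import shown in the usage example.

```python
import importlib.util


def is_installed(module_name: str = "py_rust_stemmers") -> bool:
    """Return True if the module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("py_rust_stemmers available:", is_installed())
```

If this prints `False`, pip most likely installed the package into a different Python environment than the one running the script.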

Usage

Here's a simple example showing how to use py-rust-stemmers to stem words using the Snowball algorithm:

from py_rust_stemmers import SnowballStemmer

# Initialize the stemmer for the English language
s = SnowballStemmer('english')

# Input text
text = """This stem form is often a word itself, but this is not always the case as this is not a requirement for text search systems, which are the intended field of use. We also aim to conflate words with the same meaning, rather than all words with a common linguistic root (so awe and awful don't have the same stem), and over-stemming is more problematic than under-stemming so we tend not to stem in cases that are hard to resolve. If you want to always reduce words to a root form and/or get a root form which is itself a word then Snowball's stemming algorithms likely aren't the right answer."""
words = text.split()

# Stem a single word
stemmed = s.stem_word(words[0])
print(f"Stemmed word: {stemmed}")

# Stem a list of words
stemmed_words = s.stem_words(words)
print(f"Stemmed words: {stemmed_words}")

# Stem words in parallel
stemmed_words_parallel = s.stem_words_parallel(words)
print(f"Stemmed words (parallel): {stemmed_words_parallel}")

Methods

stem_word(word: str) -> str

This method stems a single word. It is best used for small or isolated stemming tasks.

Example:

s.stem_word("running")  # Output: "run"

stem_words(words: List[str]) -> List[str]

This method stems a list of words sequentially. It is ideal for processing short to moderately sized text sequences.

Example:

s.stem_words(["running", "jumps", "easily"])  # Output: ["run", "jump", "easili"]
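
The example above suggests that `stem_words` returns one stem per input word, in the same order, so the output can be zipped back against the input. The sketch below relies on that to group surface forms under a shared stem, which is the kind of conflation a text search index depends on. `group_by_stem` is a hypothetical helper, not part of the library, and the positional correspondence is an assumption drawn from the examples rather than a documented guarantee.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_stem(stemmer, words: List[str]) -> Dict[str, List[str]]:
    """Map each stem to the distinct input words that reduce to it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    # Assumes stem_words returns one stem per word, in input order.
    for word, stem in zip(words, stemmer.stem_words(words)):
        if word not in groups[stem]:
            groups[stem].append(word)
    return dict(groups)
```

With the stemmer from the usage section, this would be called as `group_by_stem(s, words)`.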

stem_words_parallel(words: List[str]) -> List[str]

This method stems a list of words in parallel. It provides significant speedup for longer text sequences (e.g., sequences longer than 512 tokens) by utilizing parallel processing. It is ideal for batch processing of large datasets.

Example:

s.stem_words_parallel(["running", "jumps", "easily"])  # Output: ["run", "jump", "easili"]
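
The guidance above (sequential for short inputs, parallel for longer ones) can be folded into a small dispatch helper. The sketch below is an illustration only: `stem_batch`, `BatchStemmer`, and `parallel_threshold` are hypothetical names that do not exist in py-rust-stemmers, and the default of 512 simply reuses the example figure quoted above. It accepts any object that exposes `stem_words` and `stem_words_parallel`.

```python
from typing import List, Protocol


class BatchStemmer(Protocol):
    """Anything exposing the two batch methods described above."""

    def stem_words(self, words: List[str]) -> List[str]: ...
    def stem_words_parallel(self, words: List[str]) -> List[str]: ...


def stem_batch(stemmer: BatchStemmer, words: List[str],
               parallel_threshold: int = 512) -> List[str]:
    """Stem sequentially for short inputs, in parallel for longer ones.

    `parallel_threshold` is a hypothetical tuning knob; 512 mirrors the
    example figure in the text and should be benchmarked on real data.
    """
    if len(words) > parallel_threshold:
        return stemmer.stem_words_parallel(words)
    return stemmer.stem_words(words)
```

With the stemmer from the usage section, this would be called as `stem_batch(s, words)`. The point at which parallel stemming starts to pay off depends on the machine and the input, so the threshold is best treated as something to measure rather than a fixed rule.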

Build from source

  • Install maturin (for example, pip install maturin)
  • Go to the project directory
  • Build the wheel, then install it with pip:

maturin build --release
pip install target/wheels/py_rust_stemmers-<your os/architecture/etc>.whl

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Download files

Download the file for your platform.

Source Distribution

  • py_rust_stemmers-0.1.1.tar.gz (7.9 kB): Source

Built Distributions

  • py_rust_stemmers-0.1.1-cp312-none-win_amd64.whl (209.0 kB): CPython 3.12, Windows x86-64
  • py_rust_stemmers-0.1.1-cp312-cp312-manylinux_2_34_x86_64.whl (323.4 kB): CPython 3.12, manylinux: glibc 2.34+ x86-64
  • py_rust_stemmers-0.1.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (314.1 kB): CPython 3.12, manylinux: glibc 2.17+ ARM64
  • py_rust_stemmers-0.1.1-cp312-cp312-macosx_11_0_arm64.whl (274.2 kB): CPython 3.12, macOS 11.0+ ARM64
  • py_rust_stemmers-0.1.1-cp311-none-win_amd64.whl (209.0 kB): CPython 3.11, Windows x86-64
  • py_rust_stemmers-0.1.1-cp311-cp311-manylinux_2_34_x86_64.whl (323.4 kB): CPython 3.11, manylinux: glibc 2.34+ x86-64
  • py_rust_stemmers-0.1.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (314.1 kB): CPython 3.11, manylinux: glibc 2.17+ ARM64
  • py_rust_stemmers-0.1.1-cp311-cp311-macosx_11_0_arm64.whl (274.2 kB): CPython 3.11, macOS 11.0+ ARM64
  • py_rust_stemmers-0.1.1-cp310-none-win_amd64.whl (209.0 kB): CPython 3.10, Windows x86-64
  • py_rust_stemmers-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl (323.4 kB): CPython 3.10, manylinux: glibc 2.34+ x86-64
  • py_rust_stemmers-0.1.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (314.1 kB): CPython 3.10, manylinux: glibc 2.17+ ARM64
  • py_rust_stemmers-0.1.1-cp310-cp310-macosx_11_0_arm64.whl (274.2 kB): CPython 3.10, macOS 11.0+ ARM64
  • py_rust_stemmers-0.1.1-cp39-none-win_amd64.whl (209.0 kB): CPython 3.9, Windows x86-64
  • py_rust_stemmers-0.1.1-cp39-cp39-manylinux_2_34_x86_64.whl (323.4 kB): CPython 3.9, manylinux: glibc 2.34+ x86-64
  • py_rust_stemmers-0.1.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (314.1 kB): CPython 3.9, manylinux: glibc 2.17+ ARM64
  • py_rust_stemmers-0.1.1-cp39-cp39-macosx_11_0_arm64.whl (274.2 kB): CPython 3.9, macOS 11.0+ ARM64
  • py_rust_stemmers-0.1.1-cp38-none-win_amd64.whl (209.2 kB): CPython 3.8, Windows x86-64
  • py_rust_stemmers-0.1.1-cp38-cp38-manylinux_2_34_x86_64.whl (323.6 kB): CPython 3.8, manylinux: glibc 2.34+ x86-64
  • py_rust_stemmers-0.1.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (314.3 kB): CPython 3.8, manylinux: glibc 2.17+ ARM64
  • py_rust_stemmers-0.1.1-cp38-cp38-macosx_11_0_arm64.whl (274.5 kB): CPython 3.8, macOS 11.0+ ARM64
