
A high-performance Serbian stemming library supporting both Cyrillic and Latin scripts (Ekavica).

Project description

Serb-Stem 🇷🇸⚡


Serb-Stem is a lightning-fast algorithmic stemmer for the Serbian language, written in Rust. Designed for maximum performance in NLP tasks, search, and text analysis, it fully supports both scripts (Cyrillic and Latin) and includes ekavization, the normalization of Ijekavian forms to Ekavian.

✨ Key Features

  • 🚀 Extreme performance: Written in Rust, it processes over 100,000 words per millisecond.
  • 🔡 Dual-script support: Automatically detects and processes both Cyrillic and Latin.
  • 🌍 Ekavization: Normalizes Ijekavian forms to Ekavian for more precise search (e.g. mlijeko -> mlek).
  • 🏗️ Cross-platform:
    • Rust: Native performance as a library.
    • Python: Simple integration via PyO3 bindings.
    • WebAssembly: Runs directly in the browser (a full web portal is included).
  • 🛡️ Type-safe: Maximum memory safety without sacrificing speed.
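
To make the dual-script and ekavization ideas concrete, here is a minimal Python sketch. It is a toy illustration and not Serb-Stem's implementation: the helper names (`is_cyrillic`, `to_latin`, `naive_ekavize`) are hypothetical, and the single "ije" -> "e" rule is a deliberate oversimplification that mangles words such as "pijem". The usage examples in this README show Cyrillic input producing a Cyrillic stem, so the transliteration step is only relevant if your own index needs a single script.

```python
# Toy illustration only; NOT how Serb-Stem works internally.
# All names here are hypothetical helpers, not part of the serb_stem API.

# Serbian Cyrillic -> Latin, one entry per letter of the 30-letter alphabet.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def is_cyrillic(word: str) -> bool:
    """Return True if any character falls in the Unicode Cyrillic block."""
    return any("\u0400" <= ch <= "\u04ff" for ch in word)


def to_latin(word: str) -> str:
    """Lowercase the word and transliterate Cyrillic letters to Latin."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in word.lower())


def naive_ekavize(word: str) -> str:
    """Over-simplified Ijekavian -> Ekavian rule: replace 'ije' with 'e'."""
    return word.replace("ije", "e")


print(is_cyrillic("књигама"))    # True
print(to_latin("књигама"))       # knjigama
print(naive_ekavize("mlijeko"))  # mleko
```

A real stemmer needs context-sensitive rules (and handling of the short "je" reflex) rather than a blanket string replacement.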

📊 Performance and Accuracy

Based on testing against a validated corpus of 182 words:

  • Accuracy: 98.35%
  • Speed: < 1 µs per word (extremely low latency)
  • Size: The WASM binary is smaller than 120 KB.
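
The figures above can be reproduced in spirit with a small harness. The sketch below assumes a gold list of (word, expected stem) pairs and takes the stemmer as a plain callable, so it makes no claims about the project's actual test setup; `evaluate` and the toy data are hypothetical. As a sanity check on the arithmetic, 98.35% of 182 words is consistent with 179 correct stems (179 / 182 ≈ 0.9835).

```python
import time
from typing import Callable, Iterable, Tuple


def evaluate(
    stem: Callable[[str], str],
    gold: Iterable[Tuple[str, str]],
) -> Tuple[float, float]:
    """Return (accuracy in percent, mean microseconds per word)."""
    pairs = list(gold)
    if not pairs:
        raise ValueError("gold list is empty")

    # Time only the stemming calls, not the comparison afterwards.
    start = time.perf_counter()
    predictions = [stem(word) for word, _ in pairs]
    elapsed = time.perf_counter() - start

    correct = sum(
        pred == expected for pred, (_, expected) in zip(predictions, pairs)
    )
    accuracy = 100.0 * correct / len(pairs)
    micros_per_word = 1e6 * elapsed / len(pairs)
    return accuracy, micros_per_word


# Toy stand-in stemmer and gold data, for illustration only.
toy_stem = lambda w: w[:-1]
toy_gold = [("abc", "ab"), ("xyz", "xy"), ("foo", "fx"), ("bar", "ba")]
acc, us = evaluate(toy_stem, toy_gold)
print(f"accuracy: {acc:.2f}%")  # accuracy: 75.00%
```

With the package installed, passing `serb_stem.stem_py` as `stem` should work the same way. Timings measured from Python include call overhead, so they will likely be higher than native Rust numbers.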

🛠️ Installation and Usage

🐍 Python

pip install serb-stem

import serb_stem

# Latin-script input
print(serb_stem.stem_py("knjigama"))  # Output: "knjig"

# Cyrillic input
print(serb_stem.stem_py("књигама"))  # Output: "књиг"

# Ekavization (mlijeko -> mleko -> mlek)
print(serb_stem.stem_py("mlijeka"))   # Output: "mlek"
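
Since `stem_py` works on single words, running text has to be tokenized first. The following is a minimal sketch of that step for a search use case; `stem_text` and `matches` are hypothetical helpers, not part of the package, and the stemmer is passed in as a callable so the snippet runs without `serb_stem` installed.

```python
import re
from typing import Callable, List

# In Python 3, \w on str patterns matches Unicode letters (Latin with
# diacritics and Cyrillic alike), plus digits and underscores.
TOKEN_RE = re.compile(r"\w+")


def stem_text(text: str, stem: Callable[[str], str]) -> List[str]:
    """Lowercase the text, split it into word tokens, and stem each one."""
    return [stem(token) for token in TOKEN_RE.findall(text.lower())]


def matches(query: str, document: str, stem: Callable[[str], str]) -> bool:
    """Return True if the query and document share at least one stem."""
    return bool(set(stem_text(query, stem)) & set(stem_text(document, stem)))


# Toy stand-in stemmer (first four characters), for illustration only.
toy_stem = lambda word: word[:4]
print(stem_text("Knjigama, књигама!", toy_stem))  # ['knji', 'књиг']
```

With the package installed, `stem_text(text, serb_stem.stem_py)` should be a drop-in replacement for the toy stemmer. Whether the library lowercases input itself is not stated here, so the sketch lowercases defensively.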

🦀 Rust

# Cargo.toml
[dependencies]
serb_stem = "0.1.0"

// src/main.rs
use serb_stem::stem;

fn main() {
    let result = stem("učenici");
    assert_eq!(result, "učenik");
}

🌐 Interactive Demo

The project also includes /portal (React + Vite + WASM), which lets you test the stemmer directly in your browser, with a visual display of the results and processing time.

📜 License

This project is licensed under the AGPL-3.0 license.


Developed with ❤️ by Ja1Denis & Antigravity AI

Download files

Source distribution: serb_stem-0.1.1.tar.gz (248.7 kB)

Built distribution: serb_stem-0.1.1-cp311-cp311-win_amd64.whl (105.3 kB; CPython 3.11, Windows x86-64)

