A high-performance Serbian stemming library supporting both Cyrillic and Latin scripts (Ekavica).
Project description
Serb-Stem 🇷🇸⚡
Serb-Stem is a lightning-fast algorithmic stemmer for the Serbian language, written in Rust. Designed for maximum performance in NLP tasks, search, and text analysis, Serb-Stem fully supports both scripts (Cyrillic and Latin) and includes advanced ekavization, that is, normalization of Ijekavian forms to Ekavian.
✨ Key Features
- 🚀 Extreme performance: Written in Rust, it processes over 100,000 words per millisecond.
- 🔡 Dual-script support: Automatically detects and processes both Cyrillic and Latin.
- 🌍 Ekavization: Normalizes Ijekavian forms to Ekavian for more precise search (e.g. mlijeko -> mlek).
- 🏗️ Multi-platform:
  - Rust: Native performance as a library.
  - Python: Simple integration via PyO3 bindings.
  - WebAssembly: Runs directly in the browser (a full-fledged web portal is included).
- 🛡️ Type-safe: Maximum memory safety without sacrificing speed.
📊 Performance and Accuracy
Based on testing against a validated corpus of 182 words:
- Accuracy: 98.35%
- Speed: < 1 µs per word (extremely low latency)
- Size: The WASM binary is smaller than 120 KB.
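The evaluation method behind these figures is not described here, so the following is only a sketch of how such numbers could be reproduced. It assumes a gold list of (word, expected stem) pairs; the helper names `accuracy` and `mean_latency_us` are hypothetical, and `stem_fn` stands for any stemming callable (for example `serb_stem.stem_py`).

```python
import time


def accuracy(stem_fn, gold_pairs):
    """Fraction of (word, expected_stem) pairs that stem_fn reproduces exactly."""
    correct = sum(1 for word, expected in gold_pairs if stem_fn(word) == expected)
    return correct / len(gold_pairs)


def mean_latency_us(stem_fn, words, repeats=1000):
    """Average wall-clock time per stem_fn call, in microseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for word in words:
            stem_fn(word)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(words)) * 1e6


# Arithmetic check: 98.35% on a 182-word corpus corresponds to
# 179 correct stems and 3 misses.
print(round(179 / 182 * 100, 2))  # 98.35
```

When measured from Python, the latency includes interpreter and binding overhead on every call, so the result may differ from a native Rust measurement.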
🛠️ Installation and Usage
🐍 Python
pip install serb-stem
import serb_stem
# Latin input
print(serb_stem.stem_py("knjigama")) # Output: "knjig"
# Cyrillic input
print(serb_stem.stem_py("књигама")) # Output: "књиг"
# Ekavization (mlijeka -> mleka -> mlek)
print(serb_stem.stem_py("mlijeka")) # Output: "mlek"
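Since the stemmer targets search, one common way to use stems is as keys in an inverted index, so that different inflections of a word match the same documents. The sketch below is an illustration under assumptions: `build_index`, `search`, and `toy_stem` are hypothetical names, and `toy_stem` is a crude stand-in used only to keep the example self-contained. In real use you would pass `serb_stem.stem_py` as `stem_fn`.

```python
from collections import defaultdict


def build_index(docs, stem_fn):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenization; punctuation is not handled here.
        for token in text.lower().split():
            index[stem_fn(token)].add(doc_id)
    return index


def search(index, query, stem_fn):
    """Return the ids of documents sharing a stem with the one-word query."""
    return index.get(stem_fn(query.lower()), set())


def toy_stem(word):
    """Stand-in stemmer that strips a few suffixes. NOT the Serb-Stem algorithm."""
    for suffix in ("ama", "a", "e", "o"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


docs = {1: "knjiga", 2: "knjigama", 3: "mleko"}
index = build_index(docs, toy_stem)
print(sorted(search(index, "knjige", toy_stem)))  # [1, 2]
```

Applying the same stemming function at index time and at query time is what makes the lookup consistent.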
🦀 Rust
[dependencies]
serb_stem = "0.1.0"
use serb_stem::stem;

fn main() {
    let result = stem("učenici");
    assert_eq!(result, "učenik");
}
🌐 Interactive Demo
The project also includes /portal (React + Vite + WASM), which lets you test the stemmer directly in your browser, with a visual display of the results and processing time.
📜 License
This project is licensed under the AGPL-3.0 license.
Developed with ❤️ by Ja1Denis & Antigravity AI
File details
Details for the file serb_stem-0.1.1.tar.gz.
File metadata
- Download URL: serb_stem-0.1.1.tar.gz
- Upload date:
- Size: 248.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a22d69b99680aa6a4af59ff9e7a65a4e10697ef2292a9b3f245f83a5e0aebc00 |
| MD5 | 4f800b25907b4e3e76574abf5b54267a |
| BLAKE2b-256 | f67e4e377c20303f6b5087afe4b3951fbc47c633f2f4c216a43116e28ac285fe |
File details
Details for the file serb_stem-0.1.1-cp311-cp311-win_amd64.whl.
File metadata
- Download URL: serb_stem-0.1.1-cp311-cp311-win_amd64.whl
- Upload date:
- Size: 105.3 kB
- Tags: CPython 3.11, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1b6fd3c65ed05ec8634490af6489db0f1d12b2664a0f8c5bdef2fedf99bcb4b6 |
| MD5 | 75da6fa48d829330615306d34d8daf89 |
| BLAKE2b-256 | 2d2de3b476dcbcb056e37f21b73364f66be2b1980c64e9e017de47c6f7583ca8 |