Uzbek Morphological Analyzer using Complete Set of Ending (CSE)
Project description
uzmorph is a morphological analyzer for the Uzbek language that combines a large lexicon (~122k stems) with CSE (Complete Set of Endings) morphological rules. It strips suffixes by matching whole endings and disambiguates words whose stem belongs to more than one part of speech.
You can try the live demo here: Hugging Face Space
Key Features
- High Accuracy: 93% word coverage accuracy on a sample of 20K unique Uzbek words.
- Massive Lexicon: Built with over 122K unique stem-POS pairs.
- Rule-Based CSE Engine: Implements the Complete Set of Endings paradigm for agglutinative suffix analysis.
- Multi-POS Support: Handles ambiguous words (e.g., ot as both Noun "horse" and Verb "throw") by validating suffix rules against lexicon POS.
- Rich Morphological Tagging: Extracts detailed features including part-of-speech (POS), tense, person, possession, case, and voice.
- Flat JSON Output: Returns analysis results in a developer-friendly, flattened JSON-compatible format.
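The idea behind the CSE engine and the multi-POS check can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two-entry lexicon, the small endings table, the feature values, and the function name toy_analyze are invented here and are not uzmorph's data or implementation.

```python
# Toy data, invented for illustration; not taken from uzmorph's data files.
LEXICON = {
    "maktab": {"NOUN"},
    "ot": {"NOUN", "VERB"},
}

# Each complete ending maps to the POS it attaches to and the features it carries.
ENDINGS = {
    "imda": [("NOUN", {"possession": "1", "cases": "Locative"})],
    "da": [("NOUN", {"cases": "Locative"})],
    "di": [("VERB", {"tense": "Past", "person": "3"})],
    "": [("NOUN", {}), ("VERB", {})],
}


def toy_analyze(word):
    """Try every stem + ending split and keep those the lexicon confirms."""
    results = []
    for cut in range(len(word), 0, -1):
        stem, ending = word[:cut], word[cut:]
        if stem not in LEXICON or ending not in ENDINGS:
            continue
        for pos, features in ENDINGS[ending]:
            # Keep the split only if the lexicon lists the stem under this POS.
            if pos in LEXICON[stem]:
                results.append(
                    {"word": word, "stem": stem, "cse": ending, "pos": pos, **features}
                )
    return results
```

In this sketch the bare stem "ot" yields two analyses, while "otda" and "otdi" each yield one, because the ending itself restricts which POS entry of the stem is acceptable.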
Installation
pip install uzmorph
Quick Start
from uzmorph import UzMorph
# Initialize the analyzer
analyzer = UzMorph()
# Analyze a word
results = analyzer.analyze("maktabimda")
# Formatted console print
analyzer.print_result(results)
JSON Result Sample
Each analysis result is a dictionary containing the following structure:
[
  {
    "word": "maktabimda",
    "stem": "maktab",
    "lemma": "maktab",
    "cse": "imda",
    "cse_formula": "(i)mda",
    "pos": "NOUN",
    "possession": "1",
    "cases": "Locative",
    "singular": "1",
    "syntactical_affixes": "(i)m da",
    "note": null,
    "ball": 308
  }
]
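Because the output is flat, it can be consumed with the standard library alone. The following sketch loads the sample above from a JSON string and reads a few fields; it does not call uzmorph. Treating "syntactical_affixes" as a space-separated list is an assumption based on this one sample.

```python
import json

# The sample result shown above, copied as a JSON string.
SAMPLE = """
[
  {
    "word": "maktabimda",
    "stem": "maktab",
    "lemma": "maktab",
    "cse": "imda",
    "cse_formula": "(i)mda",
    "pos": "NOUN",
    "possession": "1",
    "cases": "Locative",
    "singular": "1",
    "syntactical_affixes": "(i)m da",
    "note": null,
    "ball": 308
  }
]
"""

results = json.loads(SAMPLE)
first = results[0]

# In this sample, stem plus surface ending reproduces the input word.
rebuilt = first["stem"] + first["cse"]

# Assumption: affixes are separated by single spaces, as in this sample.
affixes = first["syntactical_affixes"].split()

# JSON null becomes None in Python; keep only the features that are set.
set_features = {key: value for key, value in first.items() if value is not None}
```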
API Reference
UzMorph Class
- analyze(word, pos_filter=None): Performs morphological analysis and returns a list of results.
- print_result(results): Prints formatted output to the console.
- get_pos_list(): Returns a formatted string of all available POS tags.
- get_features_list(): Returns a list of all possible property keys in the result.
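A small helper built on analyze() might look like the sketch below. It rests on two assumptions that this README does not confirm: that "ball" is a ranking score where higher is better, and that a word with no analysis yields an empty list or None. The helper name best_analyses is hypothetical and is not part of uzmorph.

```python
def best_analyses(analyzer, words, pos_filter=None):
    """Map each word to its highest-"ball" analysis, or None if it has none.

    `analyzer` is any object with an analyze(word, pos_filter=None) method,
    such as an UzMorph instance.
    """
    best = {}
    for word in words:
        # Assumption: a missing analysis comes back as [] or None.
        results = analyzer.analyze(word, pos_filter=pos_filter) or []
        # Assumption: a larger "ball" value marks a better analysis.
        best[word] = max(results, key=lambda r: r["ball"], default=None)
    return best
```

It could be called as best_analyses(UzMorph(), ["maktabimda"]). The values accepted by pos_filter are not listed here; get_pos_list() reports the available POS tags.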
Project Structure (for PyPI)
For a clean PyPI contribution, only the following files are included in the package:
- uzmorph/ - Core package directory containing implementation and data files (root.csv, cse.csv, etc.)
- pyproject.toml - Package metadata and build configuration.
- LICENSE - MIT License file.
- README.md - Documentation.
- MANIFEST.in - File inclusion rules for the source distribution.
License
MIT
File details
Details for the file uzmorph-1.2.2.tar.gz.
File metadata
- Download URL: uzmorph-1.2.2.tar.gz
- Upload date:
- Size: 474.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 90267fa0c1c7f671d7dbf5166354e630301d8cf5dbcdce732835e234662d74f1 |
| MD5 | 5885b6c38d555366e47458250026e7b7 |
| BLAKE2b-256 | 57206767d3f8e21b9aa3aa120ffcbf584d58daec4683be8a2b3997113d01714c |
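A downloaded archive can be checked against the published SHA256 digest with Python's hashlib. This is a minimal sketch; the function names are illustrative, and the path passed in is whatever location the file was saved to.

```python
import hashlib

# SHA256 digest published above for uzmorph-1.2.2.tar.gz.
EXPECTED_SHA256 = "90267fa0c1c7f671d7dbf5166354e630301d8cf5dbcdce732835e234662d74f1"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file at `path` has the expected SHA256 digest."""
    return sha256_of(path) == expected
```

For example, matches_published_hash("uzmorph-1.2.2.tar.gz") should return True for an intact download of the source distribution. The same functions work for the wheel when its digest is passed as expected.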
File details
Details for the file uzmorph-1.2.2-py3-none-any.whl.
File metadata
- Download URL: uzmorph-1.2.2-py3-none-any.whl
- Upload date:
- Size: 491.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fb8e9a8ea186c3ffbf4ed7fe4fd5ecabe590a83f22697ea99217292ffc752d08 |
| MD5 | b48729aaffc098d0def5bc06d2c6c2fe |
| BLAKE2b-256 | c21bb1a2a5f8fbf1fd4a1e92dd528411bd8ecc9c081c05742c909e381b790360 |