Uzbek Morphological Analyzer using Complete Set of Ending (CSE)

Project Description

uzmorph is a morphological analyzer for the Uzbek language. It combines a lexicon of roughly 122K stem-POS pairs with CSE (Complete Set of Endings) morphological rules, and it supports suffix stripping and multi-POS disambiguation.

A live demo is available as a Hugging Face Space.

Key Features

  • High Accuracy: Achieved 93% Word Coverage Accuracy on a sample of 20K unique Uzbek words.
  • Massive Lexicon: Built with over 122K unique stem-POS pairs.
  • Rule-Based CSE Engine: Implements the Complete Set of Endings paradigm for agglutinative suffix analysis.
  • Multi-POS Support: Handles ambiguous words (e.g., ot as both Noun "horse" and Verb "throw") by checking that each candidate ending is valid for a part of speech the lexicon lists for the stem.
  • Rich Morphological Tagging: Extracts detailed features including part-of-speech (POS), tense, person, possession, case, and voice.
  • Flat JSON Output: Returns each analysis as a flat, JSON-compatible dictionary.
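
The interplay between ending rules and lexicon POS described above can be illustrated with a toy sketch. This is not the uzmorph implementation: the function `toy_analyze`, the two-entry lexicon, the ending table, and the feature values (such as "Past") are all hypothetical and chosen only to show how a split is accepted when the ending's POS matches a POS listed for the stem.

```python
# Toy illustration only: a hand-made lexicon and ending table,
# not the data or the algorithm shipped with uzmorph.
LEXICON = {
    "maktab": {"NOUN"},
    "ot": {"NOUN", "VERB"},  # "horse" (noun) and "throw" (verb)
}

# Each ending lists the POS it may attach to and the features it contributes.
ENDINGS = {
    "": [("NOUN", {}), ("VERB", {})],
    "imda": [("NOUN", {"possession": "1", "cases": "Locative"})],
    "da": [("NOUN", {"cases": "Locative"})],
    "di": [("VERB", {"tense": "Past"})],
}


def toy_analyze(word):
    """Try every stem + ending split and keep those where the ending's POS
    is one of the POS values the lexicon lists for the stem."""
    results = []
    for cut in range(1, len(word) + 1):
        stem, ending = word[:cut], word[cut:]
        for pos, features in ENDINGS.get(ending, []):
            if pos in LEXICON.get(stem, set()):
                results.append(
                    {"word": word, "stem": stem, "cse": ending, "pos": pos, **features}
                )
    return results


if __name__ == "__main__":
    for w in ("ot", "otdi", "otda", "maktabimda"):
        print(w, [(r["stem"], r["pos"]) for r in toy_analyze(w)])
```

In this sketch the bare form ot stays ambiguous, while otdi keeps only the verb reading and otda only the noun reading, because each ending is licensed for a single POS.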

Installation

pip install uzmorph

Quick Start

from uzmorph import UzMorph

# Initialize the analyzer
analyzer = UzMorph()

# Analyze a word
results = analyzer.analyze("maktabimda")

# Formatted console print
analyzer.print_result(results)

JSON Result Sample

analyze returns a list of dictionaries, one per candidate analysis. Each dictionary has the following structure:

[
    {
        "word": "maktabimda",
        "stem": "maktab",
        "lemma": "maktab",
        "cse": "imda",
        "cse_formula": "(i)mda",
        "pos": "NOUN",
        "possession": "1",
        "cases": "Locative",
        "singular": "1",
        "syntactical_affixes": "(i)m da",
        "note": null,
        "ball": 308
    }
]
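
A result list in this shape can be post-processed with plain Python. The sketch below loads the sample above, drops fields whose value is null, and picks one analysis by its "ball" value. The helper names `compact` and `top_result` are hypothetical, and treating "ball" as a score where higher is better is an assumption; the source does not define that field.

```python
import json

# The sample result shown above, embedded as JSON text.
SAMPLE_JSON = """
[
    {
        "word": "maktabimda",
        "stem": "maktab",
        "lemma": "maktab",
        "cse": "imda",
        "cse_formula": "(i)mda",
        "pos": "NOUN",
        "possession": "1",
        "cases": "Locative",
        "singular": "1",
        "syntactical_affixes": "(i)m da",
        "note": null,
        "ball": 308
    }
]
"""

sample = json.loads(SAMPLE_JSON)  # JSON null becomes Python None


def compact(result):
    """Return a copy of one analysis without the fields that are None."""
    return {key: value for key, value in result.items() if value is not None}


def top_result(results):
    """Pick the analysis with the largest "ball" value (assumed to be a
    ranking score); return None for an empty list."""
    if not results:
        return None
    return max(results, key=lambda r: r.get("ball", 0))


if __name__ == "__main__":
    print(compact(top_result(sample)))
```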

API Reference

UzMorph Class

  • analyze(word, pos_filter=None): Performs morphological analysis and returns a list of results.
  • print_result(results): Prints formatted output to the console.
  • get_pos_list(): Returns a formatted string of all available POS tags.
  • get_features_list(): Returns a list of all possible property keys in the result.
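
As a usage sketch for analyze, the helper below runs a word list through any object that exposes analyze(word, pos_filter=None) and tallies the "pos" field of every returned analysis. The helper `pos_counts` and the stand-in class `FakeAnalyzer` are hypothetical and exist only so the example runs without the package installed; the accepted values of pos_filter are not documented here, so it is passed through unchanged.

```python
from collections import Counter


def pos_counts(analyzer, words, pos_filter=None):
    """Tally POS tags over all analyses and collect words with no analysis.

    `analyzer` is any object with analyze(word, pos_filter=None) that
    returns a list of dictionaries containing a "pos" key.
    """
    counts = Counter()
    unanalyzed = []
    for word in words:
        results = analyzer.analyze(word, pos_filter=pos_filter)
        if not results:
            unanalyzed.append(word)
        for result in results:
            counts[result["pos"]] += 1
    return counts, unanalyzed


class FakeAnalyzer:
    """Stand-in with canned answers, used only to exercise pos_counts."""

    _CANNED = {
        "maktabimda": [{"pos": "NOUN"}],
        "ot": [{"pos": "NOUN"}, {"pos": "VERB"}],
    }

    def analyze(self, word, pos_filter=None):
        return list(self._CANNED.get(word, []))


if __name__ == "__main__":
    # With the real package, pass UzMorph() instead of FakeAnalyzer():
    #   from uzmorph import UzMorph
    #   counts, missing = pos_counts(UzMorph(), ["maktabimda", "ot"])
    counts, missing = pos_counts(FakeAnalyzer(), ["maktabimda", "ot", "qwerty"])
    print(dict(counts), missing)
```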

Project Structure (for PyPI)

To keep the PyPI distribution clean, only the following files are included in the package:

  • uzmorph/ - Core package directory containing implementation and data files (root.csv, cse.csv, etc.)
  • pyproject.toml - Package metadata and build configuration.
  • LICENSE - MIT License file.
  • README.md - Documentation.
  • MANIFEST.in - File inclusion rules for the source distribution.

License

MIT

Download files

Source Distribution

uzmorph-1.2.2.tar.gz (474.1 kB)

Uploaded Source

Built Distribution

uzmorph-1.2.2-py3-none-any.whl (491.5 kB)

Uploaded Python 3

File details

Details for the file uzmorph-1.2.2.tar.gz.

File metadata

  • Download URL: uzmorph-1.2.2.tar.gz
  • Size: 474.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.0

File hashes

Hashes for uzmorph-1.2.2.tar.gz

  • SHA256: 90267fa0c1c7f671d7dbf5166354e630301d8cf5dbcdce732835e234662d74f1
  • MD5: 5885b6c38d555366e47458250026e7b7
  • BLAKE2b-256: 57206767d3f8e21b9aa3aa120ffcbf584d58daec4683be8a2b3997113d01714c

File details

Details for the file uzmorph-1.2.2-py3-none-any.whl.

File metadata

  • Download URL: uzmorph-1.2.2-py3-none-any.whl
  • Size: 491.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.0

File hashes

Hashes for uzmorph-1.2.2-py3-none-any.whl

  • SHA256: fb8e9a8ea186c3ffbf4ed7fe4fd5ecabe590a83f22697ea99217292ffc752d08
  • MD5: b48729aaffc098d0def5bc06d2c6c2fe
  • BLAKE2b-256: c21bb1a2a5f8fbf1fd4a1e92dd528411bd8ecc9c081c05742c909e381b790360
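
A downloaded file can be checked against the SHA256 digests listed for the two distributions using only the standard library. This is a minimal sketch: the helper names `sha256_of` and `verify` are hypothetical, and it assumes the file has already been downloaded to a local path.

```python
import hashlib

# SHA256 digests as published for the two 1.2.2 distribution files.
EXPECTED = {
    "uzmorph-1.2.2.tar.gz": "90267fa0c1c7f671d7dbf5166354e630301d8cf5dbcdce732835e234662d74f1",
    "uzmorph-1.2.2-py3-none-any.whl": "fb8e9a8ea186c3ffbf4ed7fe4fd5ecabe590a83f22697ea99217292ffc752d08",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.lower()


if __name__ == "__main__":
    name = "uzmorph-1.2.2.tar.gz"  # assumed to be in the current directory
    try:
        print(name, "OK" if verify(name, EXPECTED[name]) else "MISMATCH")
    except FileNotFoundError:
        print(name, "not found; download it first")
```

pip can perform the same check at install time when a requirements file pins digests with --hash entries and is installed with --require-hashes.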
