
Neural Network Uzbek Morphological Analyzer (BiLSTM, CSE)


uzmorph-nn: Uzbek Neural Morphological Analyzer

uzmorph-nn is a high-accuracy word-level morphological analyzer for the Uzbek language based on a character-level Bidirectional LSTM (BiLSTM) architecture.

Role & Performance

This package is intended as a building block for Uzbek NLP pipelines. It is designed for:

  • Agglutinative Processing: Efficiently decomposes long chains of suffixes.
  • Phonological Awareness: Handles stem changes (allomorphy) and vowel/consonant harmony correctly.
  • Rule-Augmented Deep Learning: Trained on data derived from 100,000+ rules of the Common Stem Expansion (CSE) framework.

Installation

pip install uzmorph-nn
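
To confirm that the install succeeded, the standard library's importlib.metadata can report the installed version. This is a minimal sketch; the helper name installed_version is hypothetical and not part of uzmorph-nn.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Hypothetical check; prints a fallback message when the package is missing.
found = installed_version("uzmorph-nn")
print(found if found else "uzmorph-nn is not installed in this environment")
```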

Quick Start & Usage Examples

1. Basic Analysis (String Output)

Printing the result gives a readable one-line summary, which is handy for quick debugging and logs.

from uzmorph_nn import uzmorph_nn

# Initialize the analyzer
analyzer = uzmorph_nn()

# Analyze a word
result = analyzer.analyze("maktabimizda")
print(result)

# Output:
# Result: 'maktabimizda' -> Stem: maktab | POS: NOUN | Tags: [possession=1, cases=Locative, plural=1]
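
The documented entry point analyzes one word at a time, so running text has to be split into words first. The sketch below is one possible wrapper, not part of uzmorph-nn: tokenize and analyze_text are hypothetical helpers, the regular expression is a simplification that keeps apostrophe-like marks (as in o' and g') inside a word, and repeated words are analyzed only once. The analyze argument is any callable that takes a word, for example analyzer.analyze.

```python
import re
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")

# Runs of letters, optionally joined or followed by an apostrophe-like mark.
# Simplification: a closing quotation mark directly after a word is kept too.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['ʻʼ’][^\W\d_]*)*")


def tokenize(text: str) -> List[str]:
    """Split text into word tokens, dropping punctuation and digits."""
    return TOKEN_RE.findall(text)


def analyze_text(analyze: Callable[[str], T], text: str) -> List[T]:
    """Apply a word-level analyzer to every token, caching repeated words."""
    cache: Dict[str, T] = {}
    results: List[T] = []
    for token in tokenize(text):
        if token not in cache:
            cache[token] = analyze(token)
        results.append(cache[token])
    return results


# Stand-in analyzer so the sketch runs without the package installed.
print(analyze_text(str.upper, "Men tog'da, maktabimizda!"))
```

With the real package, the call would presumably look like analyze_text(analyzer.analyze, sentence).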

2. Structured Data (Dictionary)

Ideal for integrating into other projects or data processing pipelines.

# Reuses the analyzer created in example 1
result = analyzer.analyze("kitobim")
data = result.to_dict()
print(data)

# Output:
# {
#   "word": "kitobim",
#   "stem": "kitob",
#   "pos": "NOUN",
#   "possession": "1"
# }
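
Because different words yield different feature keys, a pipeline that collects these dictionaries into a table needs the union of all keys. The following sketch writes such records to CSV with the standard library. It assumes to_dict() returns a flat dictionary of strings as shown above; records_to_csv is a hypothetical helper, and the sample records are copied from the outputs in this document.

```python
import csv
import io
from typing import Dict, Iterable, List


def records_to_csv(records: Iterable[Dict[str, str]]) -> str:
    """Serialize flat analysis dictionaries to CSV, one row per word."""
    rows = list(records)
    # Collect column names in first-seen order across all records.
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)  # Missing features become empty cells.
    return buffer.getvalue()


sample = [
    {"word": "kitobim", "stem": "kitob", "pos": "NOUN", "possession": "1"},
    {"word": "yozayapmiz", "stem": "yoz", "pos": "VERB",
     "aspect": "Progressive", "person": "1", "number": "Plural"},
]
print(records_to_csv(sample))
```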

3. API Integration (JSON Output)

Useful for web services or for exchanging data with programs written in other languages.

# Reuses the analyzer created in example 1
result = analyzer.analyze("yozayapmiz")
print(result.to_json())

# Output:
# {
#   "word": "yozayapmiz",
#   "stem": "yoz",
#   "pos": "VERB",
#   "aspect": "Progressive",
#   "person": "1",
#   "number": "Plural"
# }
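
On the receiving side, the JSON string can be parsed back into a typed object. This sketch assumes the payload is a flat JSON object with word, stem, and pos keys plus arbitrary feature keys, as in the output above. The Analysis dataclass and parse_analysis function are hypothetical names, not part of uzmorph-nn.

```python
import json
from dataclasses import dataclass, field
from typing import Dict

CORE_KEYS = ("word", "stem", "pos")


@dataclass
class Analysis:
    word: str
    stem: str
    pos: str
    features: Dict[str, str] = field(default_factory=dict)


def parse_analysis(payload: str) -> Analysis:
    """Parse a flat JSON analysis; any non-core key is treated as a feature."""
    data = json.loads(payload)
    missing = [key for key in CORE_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    features = {k: str(v) for k, v in data.items() if k not in CORE_KEYS}
    return Analysis(data["word"], data["stem"], data["pos"], features)


# Payload copied from the example output above.
payload = (
    '{"word": "yozayapmiz", "stem": "yoz", "pos": "VERB", '
    '"aspect": "Progressive", "person": "1", "number": "Plural"}'
)
print(parse_analysis(payload))
```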

4. Direct Attribute Access

Access specific parts of the analysis directly.

# Reuses the analyzer created in example 1
result = analyzer.analyze("olmalar")

print(f"Stem: {result.stem}")       # Stem: olma
print(f"POS: {result.pos}")         # POS: NOUN
print(f"Features: {result.features}") # Features: ['plural=1']
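
If features is a list of "key=value" strings, as the output above suggests, it can be turned into a dictionary for lookups. This is a sketch under that assumption; features_to_dict is a hypothetical helper, and entries without "=" are kept with an empty value rather than dropped.

```python
from typing import Dict, Iterable


def features_to_dict(features: Iterable[str]) -> Dict[str, str]:
    """Convert ['plural=1', ...] into {'plural': '1', ...}."""
    parsed: Dict[str, str] = {}
    for item in features:
        # partition splits on the first "=" only; value is "" if none is found.
        key, _, value = item.partition("=")
        parsed[key.strip()] = value.strip()
    return parsed


# Feature strings copied from the example outputs in this document.
print(features_to_dict(["possession=1", "cases=Locative", "plural=1"]))
```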

Architecture Details

  • Input: Character sequence (UTF-8).
  • Architecture: 2-layer Bidirectional LSTM.
  • Tagging Strategy: BIO (Beginning, Inside, Outside) sequence labeling.
  • Weights: Pre-trained on a comprehensive CSE rule-engine dataset.
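
To illustrate the BIO tagging strategy in general terms, the sketch below decodes one tag per character into labeled segments. The tag names (B-STEM, I-STEM, B-PL, I-PL) and the decode_bio function are assumptions made for illustration; the actual tag inventory and decoding used by uzmorph-nn are not documented here.

```python
from typing import List, Optional, Tuple


def decode_bio(chars: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Group characters into (label, text) spans from per-character BIO tags."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    buffer: List[str] = []
    for ch, tag in zip(chars, tags):
        prefix, _, name = tag.partition("-")
        if prefix == "B" or (prefix == "I" and name != label):
            # A B- tag, or an I- tag that does not continue the open span,
            # closes the current span and starts a new one.
            if buffer:
                spans.append((label, "".join(buffer)))
            label, buffer = name, [ch]
        elif prefix == "I":
            buffer.append(ch)
        else:
            # "O": the character belongs to no span.
            if buffer:
                spans.append((label, "".join(buffer)))
            label, buffer = None, []
    if buffer:
        spans.append((label, "".join(buffer)))
    return spans


# Hypothetical tagging of "olmalar" (stem "olma" + plural suffix "lar").
tags = ["B-STEM", "I-STEM", "I-STEM", "I-STEM", "B-PL", "I-PL", "I-PL"]
print(decode_bio("olmalar", tags))
```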

License

MIT License. Free for academic and commercial use.


