Neural Network Uzbek Morphological Analyzer (BiLSTM, CSE)
uzmorph-nn: Uzbek Neural Morphological Analyzer
uzmorph-nn is a high-accuracy word-level morphological analyzer for the Uzbek language based on a character-level Bidirectional LSTM (BiLSTM) architecture.
Role & Performance
This package is intended as a base layer for Uzbek NLP pipelines. It is designed for:
- Agglutinative processing: decomposes long chains of suffixes.
- Phonological awareness: handles stem alternations (allomorphy) and vowel/consonant harmony.
- Rule-augmented deep learning: built on 100,000+ rules from the Common Stem Expansion (CSE) framework.
Installation
pip install uzmorph-nn
Quick Start & Usage Examples
1. Basic Analysis (String Output)
Great for quick debugging or readable logs.
from uzmorph_nn import uzmorph_nn
# Initialize the analyzer
analyzer = uzmorph_nn()
# Analyze a word
result = analyzer.analyze("maktabimizda")
print(result)
# Output:
# Result: 'maktabimizda' -> Stem: maktab | POS: NOUN | Tags: [possession=1, cases=Locative, plural=1]
2. Structured Data (Dictionary)
Ideal for integrating into other projects or data processing pipelines.
# `analyzer` is the instance created in example 1
result = analyzer.analyze("kitobim")
data = result.to_dict()
print(data)
# Output:
# {
# "word": "kitobim",
# "stem": "kitob",
# "pos": "NOUN",
# "possession": "1"
# }
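Because different words carry different feature keys, a pipeline that collects many analyses usually needs to align them into a common set of columns. The sketch below is one possible way to do that with the standard library only. It does not call uzmorph-nn; the `records` list is hard-coded to mirror the dictionary-shaped outputs shown in this README, and `records_to_csv` is a hypothetical helper name.

```python
import csv
import io

# Records shaped like the analysis outputs shown in this README
# (hard-coded here so the sketch runs without the package installed).
records = [
    {"word": "kitobim", "stem": "kitob", "pos": "NOUN", "possession": "1"},
    {"word": "yozayapmiz", "stem": "yoz", "pos": "VERB",
     "aspect": "Progressive", "person": "1", "number": "Plural"},
]


def records_to_csv(rows):
    """Serialize analysis dictionaries to CSV, leaving absent features blank."""
    # Collect every key in first-seen order so the column order is stable.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = records_to_csv(records)
print(csv_text)
```

In real use, the hard-coded list would presumably be replaced by the `to_dict()` results of the words being analyzed.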
3. API Integration (JSON Output)
Useful for web services or sending data between different languages.
# `analyzer` is the instance created in example 1
result = analyzer.analyze("yozayapmiz")
print(result.to_json())
# Output:
# {
# "word": "yozayapmiz",
# "stem": "yoz",
# "pos": "VERB",
# "aspect": "Progressive",
# "person": "1",
# "number": "Plural"
# }
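On the receiving side of a web service, the JSON string can be decoded with any standard JSON parser. The following is a minimal sketch of a consumer in Python, using the payload shown above as a literal. Treating every key other than `word`, `stem`, and `pos` as a grammatical feature is an assumption made for this illustration, and `summarize` is a hypothetical helper name.

```python
import json

# Payload copied from the JSON output shown above.
payload = """{
  "word": "yozayapmiz",
  "stem": "yoz",
  "pos": "VERB",
  "aspect": "Progressive",
  "person": "1",
  "number": "Plural"
}"""


def summarize(payload_text):
    """Split a decoded analysis into stem, POS, and remaining features."""
    record = json.loads(payload_text)
    # Assumption: keys other than word/stem/pos are grammatical features.
    core_keys = ("word", "stem", "pos")
    features = {k: v for k, v in record.items() if k not in core_keys}
    return record["stem"], record["pos"], features


stem, pos, features = summarize(payload)
print(stem, pos, features)
```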
4. Direct Attribute Access
Access specific parts of the analysis directly.
# `analyzer` is the instance created in example 1
result = analyzer.analyze("olmalar")
print(f"Stem: {result.stem}") # Stem: olma
print(f"POS: {result.pos}") # POS: NOUN
print(f"Features: {result.features}") # Features: ['plural=1']
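To make the relationship between the attributes and the dictionary/JSON outputs concrete, here is a hypothetical stand-in for the result object. It is not the package's implementation: the class name `AnalysisSketch` and the rule that each `key=value` feature string becomes a dictionary entry are assumptions inferred from the example outputs in this README.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AnalysisSketch:
    """Hypothetical stand-in for the object returned by analyze()."""
    word: str
    stem: str
    pos: str
    features: list = field(default_factory=list)

    def to_dict(self):
        data = {"word": self.word, "stem": self.stem, "pos": self.pos}
        # Assumption: each feature string has the form "key=value".
        for item in self.features:
            key, _, value = item.partition("=")
            data[key] = value
        return data

    def to_json(self):
        # ensure_ascii=False keeps any non-ASCII characters readable.
        return json.dumps(self.to_dict(), ensure_ascii=False, indent=2)


result = AnalysisSketch("olmalar", "olma", "NOUN", ["plural=1"])
print(result.to_dict())
print(result.to_json())
```

A stand-in like this can also be handy in unit tests for downstream code that should not load the model weights.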
Architecture Details
- Input: Character sequence (UTF-8).
- Architecture: 2-layer Bidirectional LSTM.
- Tagging Strategy: BIO (Beginning, Inside, Outside) sequence labeling.
- Weights: Pre-trained on a comprehensive CSE rule-engine dataset.
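The BIO tagging strategy listed above can be illustrated with a small decoder that turns per-character tags into labelled spans. This is a sketch of generic BIO decoding, not the package's internal code: the label names `STEM` and `PLURAL` and the function `decode_bio` are hypothetical, and the tag sequence is written by hand rather than predicted by the model.

```python
def decode_bio(chars, tags):
    """Group characters into (label, text) spans from BIO tags."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")

    spans = []
    current_label, current_chars = None, []
    for ch, tag in zip(chars, tags):
        prefix, _, label = tag.partition("-")
        # A B- tag, or an I- tag with a different label, opens a new span.
        starts_new = prefix == "B" or (prefix == "I" and label != current_label)
        if prefix == "O" or starts_new:
            if current_chars:
                spans.append((current_label, "".join(current_chars)))
            current_label, current_chars = None, []
        if prefix in ("B", "I"):
            current_label = label
            current_chars.append(ch)
    # Flush the final open span, if any.
    if current_chars:
        spans.append((current_label, "".join(current_chars)))
    return spans


# Hand-written tags for "olmalar" (stem "olma" + plural suffix "lar").
tags = ["B-STEM", "I-STEM", "I-STEM", "I-STEM",
        "B-PLURAL", "I-PLURAL", "I-PLURAL"]
spans = decode_bio("olmalar", tags)
print(spans)
```

Characters tagged `O` are dropped from the output, which is the usual convention for material that belongs to no labelled span.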
License
MIT License. Free for academic and commercial use.
File details
Details for the file uzmorph_nn-0.1.0.tar.gz.
File metadata
- Download URL: uzmorph_nn-0.1.0.tar.gz
- Upload date:
- Size: 1.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c5a5f0196527726ddcbc33efe7abb38d18ce86e7b463157fea2708fdfdf4b54e |
| MD5 | 271db68d3abc98e49d0b0e0115f64c07 |
| BLAKE2b-256 | fe213b71f046950afc5e90970062232554363145c8b5572f18dedec30189b726 |
File details
Details for the file uzmorph_nn-0.1.0-py3-none-any.whl.
File metadata
- Download URL: uzmorph_nn-0.1.0-py3-none-any.whl
- Upload date:
- Size: 1.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cce2474cd810f91e9311ab50a214db7700a53c84d704b289ff6c9b8d1e67b3b3 |
| MD5 | 584fd198cdf1dd23ecbb6a78fc6c28ba |
| BLAKE2b-256 | 85803ecaeb91b9c349506c901a7084ede16d4b74c22af4276c51087fb5a6a4b6 |