Permissive, rule-based Malayalam morphological synthesizer (noun inflection generation).
mlinflect
A permissive, rule-based Malayalam morphological synthesizer. It does forward morphological generation: given a root and grammatical features, it produces the inflected surface form (the counterpart to morphological analysis/segmentation).
from mlinflect import synthesize_noun, Case, Number
synthesize_noun("മരം", Case.LOCATIVE).surface # 'മരത്തിൽ'
synthesize_noun("മരം", Case.GENITIVE).surface # 'മരത്തിന്റെ'
synthesize_noun("കുട്ടി", Case.GENITIVE).surface # 'കുട്ടിയുടെ'
synthesize_noun("മരം", Case.NOMINATIVE, number=Number.PLURAL).surface # 'മരങ്ങൾ'
Why this exists
Existing Malayalam morphology tools are either copyleft (Apertium and libindic are
GPL/AGPL) or, like mlmorph (MIT), the one permissively licensed generator, built
on a GPL FST runtime. No existing Malayalam synthesizer is at once permissive,
dependency-clean, and rule-based. mlinflect aims to fill that gap with a small
pure-Python rule engine and no copyleft dependencies.
Design
- Declarative, provenance-tagged rules (mlinflect/rules.py): each rule cites the source it was drawn from and carries a verified flag that is True only when the form has been ratified by a native reviewer. Adding or correcting a paradigm is a data edit, not a code change.
- Inspectable results: every synthesize_noun(...) returns a SynthResult with the surface form, the morphemes that compose it, the stem_class, the provenance key, and verified. Feature combinations that are not yet encoded raise rather than return a silently wrong form.
- Akshara-correct joins: suffixes are represented matra-initial so concatenation produces correct conjuncts/vowel signs; the genitive uses the canonical nta form (NA + virama + RRA).
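The three design points above can be illustrated with a small standalone sketch. It does not import mlinflect and is not its implementation: Rule, ToyResult, RULES, toy_synthesize, the "example-source" provenance key, and the way each form is split into morphemes are all hypothetical, and the real schema in mlinflect/rules.py may look different. Only the surface forms are taken from the examples at the top of this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record; the real rules.py schema may differ."""
    strip: str          # ending removed from the root before suffixation
    morphemes: tuple    # pieces appended to the remaining stem, in order
    provenance: str     # key of the source the rule was drawn from
    verified: bool      # True only once a native reviewer has ratified it


@dataclass(frozen=True)
class ToyResult:
    """Hypothetical stand-in mirroring the fields listed for SynthResult."""
    surface: str
    morphemes: tuple
    stem_class: str
    provenance: str
    verified: bool


# Canonical nta sequence used in the genitive: NA + virama + RRA.
NTA = "\u0d28\u0d4d\u0d31"

# Rules are plain data keyed by (stem_class, case). Suffixes such as "ിൽ"
# begin with a vowel sign (matra), so plain concatenation onto a consonant
# yields the right akshara without a stray virama.
# "example-source" is a placeholder key, and verified is False because
# nothing in this sketch has been reviewed.
RULES = {
    ("am_neuter", "locative"): Rule("ം", ("ത്ത", "ിൽ"), "example-source", False),
    ("am_neuter", "genitive"): Rule("ം", ("ത്ത", "ി" + NTA + "െ"), "example-source", False),
    ("i_vowel", "genitive"): Rule("", ("യുടെ",), "example-source", False),
}

# Toy ending table covering two endings only; it does not tell
# vowel_anuswara apart from am_neuter.
ENDINGS = (("ം", "am_neuter"), ("ി", "i_vowel"))


def toy_synthesize(root: str, case: str) -> ToyResult:
    for ending, stem_class in ENDINGS:
        if root.endswith(ending):
            break
    else:
        raise ValueError(f"no stem class for root {root!r}")

    rule = RULES.get((stem_class, case))
    if rule is None:
        # Unencoded feature combinations raise instead of guessing a form.
        raise NotImplementedError(f"{stem_class}/{case} is not encoded")

    stem = root[: len(root) - len(rule.strip)]
    surface = stem + "".join(rule.morphemes)
    return ToyResult(surface, (stem,) + rule.morphemes, stem_class,
                     rule.provenance, rule.verified)


result = toy_synthesize("മരം", "locative")
print(result.surface, result.morphemes, result.verified)
```

In this sketch, adding a paradigm cell means adding one entry to RULES, which is the sense in which a correction is a data edit rather than a code change.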
Status
Alpha. Five ending-conditioned noun classes: am_neuter (മരം) and i_vowel (കുട്ടി)
are complete across 11 cases (singular and plural); vowel_anuswara (കലാം), u_vowel
(പശു), and ṭ_geminate (വീട്) are partial. Includes differential object marking and a
synthetic/colloquial register for the instrumental. Most forms are native-ratified
(verified=True); a few are SMC-sourced pending sign-off (verified=False). See
LIMITATIONS.md for exactly what is unsupported. Remaining noun
classes (a/e-stems, chillu), verbs, and pronouns are future work.
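As a rough picture of what "ending-conditioned" could mean, the sketch below assigns a class from the last code points of the root. The conditions are guesses inferred only from the five example words above, and guess_stem_class is a hypothetical helper, not part of mlinflect; the library's actual classification logic may differ.

```python
# Vowel signs (matras) in the Malayalam block: U+0D3E through U+0D4C.
VOWEL_SIGNS = {chr(cp) for cp in range(0x0D3E, 0x0D4D)}
ANUSVARA = "\u0d02"   # ം
VIRAMA = "\u0d4d"     # ്
SIGN_I = "\u0d3f"     # ി
SIGN_U = "\u0d41"     # ു
TTA = "\u0d1f"        # ട


def guess_stem_class(root: str) -> str:
    """Guess a noun class from the root's ending (assumed conditions)."""
    if root.endswith(ANUSVARA):
        # Anusvara directly after a vowel sign (as in കലാം) versus
        # anusvara after a bare consonant (as in മരം).
        if len(root) >= 2 and root[-2] in VOWEL_SIGNS:
            return "vowel_anuswara"
        return "am_neuter"
    if root.endswith(SIGN_I):
        return "i_vowel"
    if root.endswith(SIGN_U):
        return "u_vowel"
    if root.endswith(TTA + VIRAMA):
        return "ṭ_geminate"
    # Endings outside the five classes are rejected rather than guessed.
    raise ValueError(f"unsupported ending in {root!r}")


for word in ["മരം", "കുട്ടി", "കലാം", "പശു", "വീട്"]:
    print(word, guess_stem_class(word))
```

Checking the anusvara branch first keeps കലാം out of am_neuter, and any other ending raises instead of falling into a default class.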
Install
pip install mlinflect # once published
# from source:
pip install -e ".[dev]"
License
Apache-2.0. See LICENSE and NOTICE. Contributions are accepted under Apache-2.0
§5 (inbound = outbound); no separate CLA is required.
The implemented linguistic rules are facts restated in our own code; no source's
text, tables, code, or data is reproduced. Sources are credited in REFERENCES.md as
scholarship; that implies no endorsement and creates no license obligation.