Rule-based Malayalam morphological synthesizer (noun and verb inflection generation).
mlinflect
A rule-based Malayalam morphological synthesizer. It does forward morphological generation: given a root and grammatical features, it produces the inflected surface form (the counterpart to morphological analysis/segmentation).
```python
from mlinflect import synthesize_noun, synthesize_verb, with_clitic, Case, Clitic, Number, VerbForm

synthesize_noun("മരം", Case.LOCATIVE).surface                          # 'മരത്തിൽ'
synthesize_noun("മരം", Case.GENITIVE).surface                          # 'മരത്തിന്റെ'
synthesize_noun("കുട്ടി", Case.GENITIVE).surface                       # 'കുട്ടിയുടെ'
synthesize_noun("മരം", Case.NOMINATIVE, number=Number.PLURAL).surface  # 'മരങ്ങൾ'
with_clitic(synthesize_noun("കുട്ടി", Case.ACCUSATIVE), Clitic.UM).surface  # 'കുട്ടിയെയും'

synthesize_verb("ഓടുക", VerbForm.PAST).surface              # 'ഓടി'
synthesize_verb("കൊടുക്കുക", VerbForm.PAST).surface         # 'കൊടുത്തു'
synthesize_verb("ഓടുക", VerbForm.PRESENT_NEGATIVE).surface  # 'ഓടുന്നില്ല'
```
Why this exists
Existing Malayalam morphology tools are either copyleft (Apertium and libindic are
GPL/AGPL) or, in the case of the one permissively licensed generator (mlmorph, MIT),
built on a GPL FST runtime. That leaves no permissive, dependency-clean, rule-based
Malayalam synthesizer. mlinflect aims to fill that gap with a small pure-Python rule
engine and no copyleft dependencies.
Design
- Declarative, provenance-tagged rules (`mlinflect/rules.py`): each rule cites the source it was drawn from and carries a `verified` flag that is `True` only when the form has been ratified by a native reviewer. Adding or correcting a paradigm is a data edit, not a code change.
- Inspectable results: every `synthesize_noun(...)` returns a `SynthResult` with the `surface` form, the `morphemes` that compose it, the `stem_class`, the `provenance` key, and `verified`. Feature combinations that are not yet encoded raise rather than return a silently wrong form.
- Akshara-correct joins: suffixes are represented matra-initial so concatenation produces correct conjuncts/vowel signs; the genitive uses the canonical nta form (NA + virama + RRA).
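The design points above can be illustrated with a small standalone sketch. This is not the code in `mlinflect/rules.py`: the `Rule` layout, the `RULES` table, the `synthesize` function, the `UnsupportedFeatureError` name, and the `"example-grammar"` provenance key are all assumptions made for illustration, and the local `Case` and `SynthResult` only mimic the fields named above. The point is that a paradigm lives in a data table, every result carries its provenance, and a missing entry raises instead of guessing.

```python
from dataclasses import dataclass
from enum import Enum

# Canonical "nta" sequence named above: NA (U+0D28) + virama (U+0D4D) + RRA (U+0D31).
NTA = "\u0d28\u0d4d\u0d31"


class Case(Enum):  # local stand-in, not mlinflect.Case
    LOCATIVE = "locative"
    GENITIVE = "genitive"
    DATIVE = "dative"


@dataclass(frozen=True)
class Rule:  # hypothetical rule layout
    strip: str       # ending removed from the lemma before suffixation
    suffix: str      # suffix appended to the remaining stem
    provenance: str  # key of the source the rule was drawn from
    verified: bool   # True only after native review


@dataclass(frozen=True)
class SynthResult:  # mirrors the fields listed above; the real class may differ
    surface: str
    morphemes: tuple[str, ...]
    stem_class: str
    provenance: str
    verified: bool


class UnsupportedFeatureError(LookupError):
    """Hypothetical error for feature combinations with no encoded rule."""


# Rules are plain data: fixing a paradigm means editing this table.
# verified=False because these illustrative entries were not reviewed.
RULES = {
    ("am_neuter", Case.LOCATIVE): Rule("ം", "ത്തിൽ", "example-grammar", False),
    ("am_neuter", Case.GENITIVE): Rule("ം", "ത്തി" + NTA + "െ", "example-grammar", False),
    ("i_vowel", Case.GENITIVE): Rule("", "യുടെ", "example-grammar", False),
}


def synthesize(lemma: str, stem_class: str, case: Case) -> SynthResult:
    rule = RULES.get((stem_class, case))
    if rule is None:
        # Raise rather than return a silently wrong form.
        raise UnsupportedFeatureError(f"no rule for {stem_class} + {case.name}")
    if not lemma.endswith(rule.strip):
        raise UnsupportedFeatureError(f"{lemma!r} does not fit class {stem_class}")
    stem = lemma[: len(lemma) - len(rule.strip)]
    return SynthResult(
        surface=stem + rule.suffix,
        morphemes=(stem, rule.suffix),
        stem_class=stem_class,
        provenance=rule.provenance,
        verified=rule.verified,
    )


result = synthesize("മരം", "am_neuter", Case.GENITIVE)
print(result.surface, result.morphemes, result.provenance, result.verified)
```

Keeping `verified` on the rule and copying it onto the result lets a caller filter out unreviewed forms without knowing anything about the rule table itself.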
Status
Alpha.

- Nouns: eleven ending-conditioned noun classes across 11 cases, covering the major Malayalam noun shapes, with every encoded form native-ratified (`verified=True`); shapes outside the supported classes raise rather than guess.
- Plurals: five classes (`am_neuter` മരം, `vowel_anuswara` കലാം, `i_vowel` കുട്ടി/സ്ത്രീ, `u_vowel` പശു, `ṭ_geminate` വീട്) are complete in singular and plural. `a_stem` (അമ്മ) and the chillu classes (അവൻ, മകൾ, കാർ, കാൽ, തൂൺ) are singular-complete; their plurals are animacy-conditioned across the full paradigm (inanimate -കൾ/-ഉകൾ, human -മാർ/-ന്മാർ/-കാർ, animate -കൾ).
- Pronouns: suppletive personal pronouns (ഞാൻ, നീ, അവർ, നാം, താൻ, ഇവൻ) are handled through an exception table rather than the rule engine.
- Derivation: a `derive_feminine` helper builds a feminine lemma from a masculine base (എഴുത്തുകാരൻ → എഴുത്തുകാരി) before inflection.
- Case marking: includes differential object marking and a synthetic/colloquial register for the instrumental.
- Clitics: -ഉം, -ഓ, and -തന്നെ attach via `with_clitic`.
- Verbs: `synthesize_verb` covers the finite forms: present, future, past (allomorphy by ending plus an irregular lexicon), negation, imperative, and a few moods.

See LIMITATIONS.md for the precise gaps. Postpositions, stylistic variants, and verb aspects/participles/voice are future work.
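As a rough illustration of the feminine derivation mentioned above, the sketch below handles only the agentive ending shown in the example (-കാരൻ → -കാരി). It is not the library's `derive_feminine`; the function name `derive_feminine_sketch` and its error behaviour are assumptions, and the real helper may cover other patterns. It replaces the final chillu ൻ (U+0D7B) with the vowel sign ി (U+0D3F) and raises for anything else, in keeping with the raise-rather-than-guess policy.

```python
CHILLU_N = "\u0d7b"      # ൻ, atomic chillu n
VOWEL_SIGN_I = "\u0d3f"  # ി, dependent vowel sign i
AGENTIVE_MASC = "കാര" + CHILLU_N  # the -കാരൻ ending from the example above


def derive_feminine_sketch(masculine: str) -> str:
    """Build a feminine lemma from a masculine -കാരൻ base (illustrative only)."""
    if not masculine.endswith(AGENTIVE_MASC):
        # Other masculine shapes need their own rules; do not guess.
        raise ValueError(f"unsupported masculine shape: {masculine!r}")
    # Drop the final chillu and attach the vowel sign to the bare consonant.
    return masculine[:-1] + VOWEL_SIGN_I


print(derive_feminine_sketch("എഴുത്തുകാര" + CHILLU_N))
```

The derived lemma would then be passed to noun inflection like any other lemma ending in ി.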
Install
```shell
pip install mlinflect
# from source:
pip install -e ".[dev]"
```
License
Apache-2.0. See LICENSE and NOTICE. Contributions are accepted under Apache-2.0
§5 (inbound = outbound); no separate CLA is required.
Linguistic sources are credited in REFERENCES.md.