Rule-based Malayalam morphological synthesizer (noun and verb inflection generation).

mlinflect

A rule-based Malayalam morphological synthesizer. It does forward morphological generation: given a root and grammatical features, it produces the inflected surface form (the counterpart to morphological analysis/segmentation).

from mlinflect import synthesize_noun, synthesize_verb, with_clitic, Case, Clitic, Number, VerbForm

synthesize_noun("മരം", Case.LOCATIVE).surface          # 'മരത്തിൽ'
synthesize_noun("മരം", Case.GENITIVE).surface          # 'മരത്തിന്റെ'
synthesize_noun("കുട്ടി", Case.GENITIVE).surface        # 'കുട്ടിയുടെ'
synthesize_noun("മരം", Case.NOMINATIVE, number=Number.PLURAL).surface  # 'മരങ്ങൾ'

with_clitic(synthesize_noun("കുട്ടി", Case.ACCUSATIVE), Clitic.UM).surface  # 'കുട്ടിയെയും'

synthesize_verb("ഓടുക", VerbForm.PAST).surface          # 'ഓടി'
synthesize_verb("കൊടുക്കുക", VerbForm.PAST).surface      # 'കൊടുത്തു'
synthesize_verb("ഓടുക", VerbForm.PRESENT_NEGATIVE).surface  # 'ഓടുന്നില്ല'

Why this exists

Existing Malayalam morphology tools are either copyleft (Apertium and libindic are GPL/AGPL) or, like mlmorph, the one permissively licensed (MIT) generator, built on a GPL FST runtime. That leaves no permissive, dependency-clean, rule-based Malayalam synthesizer. mlinflect aims to fill that gap with a small pure-Python rule engine and no copyleft dependencies.

Design

  • Declarative, provenance-tagged rules (mlinflect/rules.py): each rule cites the source it was drawn from and carries a verified flag that is True only when the form has been ratified by a native reviewer. Adding or correcting a paradigm is a data edit, not a code change.
  • Inspectable results: every synthesize_noun(...) call returns a SynthResult with the surface form, the morphemes that compose it, the stem_class, the provenance key, and verified. Feature combinations that are not yet encoded raise an error rather than return a silently wrong form.
  • Akshara-correct joins: suffixes are represented matra-initial so concatenation produces correct conjuncts/vowel signs; the genitive uses the canonical nta form (NA + virama + RRA).
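The three design points above can be illustrated with a small, self-contained sketch. It is not mlinflect's actual code: the Rule class, the RULES table, the synthesize function, the string case names, and the "toy-source" provenance key are all hypothetical, and only the SynthResult field names come from the description above. Suffixes are written as escape sequences so that the leading vowel sign (matra) and the NA + virama + RRA sequence are visible.

```python
from dataclasses import dataclass
import unicodedata

# Hypothetical toy rule table; the real mlinflect/rules.py may look different.
NTA = "\u0D28\u0D4D\u0D31"         # NA + virama + RRA
OBLIQUE_TT = "\u0D24\u0D4D\u0D24"  # TA + virama + TA, replaces the final anusvara


@dataclass(frozen=True)
class Rule:
    strip: str       # ending removed from the root before suffixation
    suffix: str      # material appended to the trimmed root (matra-initial where needed)
    provenance: str  # key of the source the rule was drawn from (placeholder here)
    verified: bool   # True only once a native reviewer has ratified the form


@dataclass(frozen=True)
class SynthResult:
    surface: str
    morphemes: tuple
    stem_class: str
    provenance: str
    verified: bool


RULES = {
    # anusvara-final neuter: drop the anusvara, add the oblique, then vowel sign I + ending
    ("am_neuter", "LOCATIVE"): Rule("\u0D02", OBLIQUE_TT + "\u0D3F\u0D7D", "toy-source", False),
    ("am_neuter", "GENITIVE"): Rule("\u0D02", OBLIQUE_TT + "\u0D3F" + NTA + "\u0D46", "toy-source", False),
    # i-final noun: glide YA + vowel sign U + TTA + vowel sign E
    ("i_vowel", "GENITIVE"): Rule("", "\u0D2F\u0D41\u0D1F\u0D46", "toy-source", False),
}


def synthesize(root, stem_class, case):
    """Look up a rule and join it to the root; raise if the combination is not encoded."""
    rule = RULES.get((stem_class, case))
    if rule is None:
        raise LookupError(f"no rule encoded for {stem_class}/{case}")
    if rule.strip and not root.endswith(rule.strip):
        raise ValueError(f"root does not end in the expected {rule.strip!r}")
    stem = root[: len(root) - len(rule.strip)] if rule.strip else root
    return SynthResult(stem + rule.suffix, (stem, rule.suffix),
                       stem_class, rule.provenance, rule.verified)


if __name__ == "__main__":
    result = synthesize("\u0D2E\u0D30\u0D02", "am_neuter", "GENITIVE")
    print(result.surface, result.morphemes, result.verified)
    print([unicodedata.name(ch) for ch in NTA])
```

Because the rules live in a plain dictionary, correcting a paradigm in this sketch means editing one entry, and an unencoded combination fails loudly with LookupError instead of producing a guessed form.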

Status

Alpha.

Eleven ending-conditioned noun classes across eleven cases, covering the major Malayalam noun shapes, with every encoded form native-ratified (verified=True); shapes outside the supported classes raise rather than guess. Five classes (am_neuter മരം, vowel_anuswara കലാം, i_vowel കുട്ടി/സ്ത്രീ, u_vowel പശു, ṭ_geminate വീട്) are complete in singular and plural. The a_stem class (അമ്മ) and the chillu classes (അവൻ, മകൾ, കാർ, കാൽ, തൂൺ) are singular-complete; their plurals are animacy-conditioned across the full paradigm (inanimate -കൾ/-ഉകൾ, human -മാർ/-ന്മാർ/-കാർ, animate -കൾ). Also included are differential object marking and a synthetic/colloquial register for the instrumental.

Suppletive personal pronouns (ഞാൻ, നീ, അവർ, നാം, താൻ, ഇവൻ) are handled through an exception table rather than the rule engine. A derive_feminine helper builds a feminine lemma from a masculine base (എഴുത്തുകാരൻ → എഴുത്തുകാരി) before inflection. Clitics (-ഉം, -ഓ, -തന്നെ) attach via with_clitic.

Verbs (synthesize_verb) cover the finite forms: present, future, past (allomorphy by ending plus an irregular lexicon), negation, imperative, and a few moods.

See LIMITATIONS.md for the precise gaps. Postpositions, stylistic variants, and verb aspects/participles/voice are future work.
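As a rough illustration of what "ending-conditioned" classification with a hard failure on unsupported shapes could look like, here is a minimal sketch. The classify function, its helper names, and the single "chillu" label are assumptions made for this example; only the class names am_neuter, vowel_anuswara, i_vowel, u_vowel, ṭ_geminate, and a_stem come from the text above, and the library's real criteria may well differ.

```python
# Hypothetical classifier keyed on the final code points of the root.
ANUSVARA = "\u0D02"
VIRAMA = "\u0D4D"
TTA = "\u0D1F"
# Atomic chillu letters NN, N, RR, L, LL
CHILLUS = {"\u0D7A", "\u0D7B", "\u0D7C", "\u0D7D", "\u0D7E"}


def is_consonant(ch):
    """True for Malayalam consonant letters KA..HA."""
    return "\u0D15" <= ch <= "\u0D39"


def is_vowel_sign(ch):
    """True for dependent vowel signs AA..AU."""
    return "\u0D3E" <= ch <= "\u0D4C"


def classify(root):
    """Return a stem-class label from the root's ending, or raise ValueError."""
    if not root:
        raise ValueError("empty root")
    last = root[-1]
    prev = root[-2] if len(root) > 1 else ""
    if last == ANUSVARA:
        if prev and is_consonant(prev):
            return "am_neuter"        # consonant (inherent a) + anusvara
        if prev and is_vowel_sign(prev):
            return "vowel_anuswara"   # explicit vowel sign + anusvara
    elif last in ("\u0D3F", "\u0D40"):
        return "i_vowel"              # vowel sign I or II
    elif last == "\u0D41":
        return "u_vowel"              # vowel sign U
    elif last in CHILLUS:
        return "chillu"
    elif last == VIRAMA and prev == TTA:
        return "ṭ_geminate"           # TTA + virama
    elif is_consonant(last):
        return "a_stem"               # bare consonant, inherent a
    raise ValueError(f"unsupported noun shape: {root!r}")


if __name__ == "__main__":
    for noun in ("\u0D2E\u0D30\u0D02", "\u0D2A\u0D36\u0D41", "\u0D05\u0D2E\u0D4D\u0D2E"):
        print(noun, classify(noun))
```

Raising on anything that falls through keeps the behavior in line with the stated policy: a shape outside the supported classes produces an error, never a guessed paradigm.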

Install

pip install mlinflect
# from source:
pip install -e ".[dev]"

License

Apache-2.0. See LICENSE and NOTICE. Contributions are accepted under Apache-2.0 §5 (inbound = outbound); no separate CLA is required.

Linguistic sources are credited in REFERENCES.md.

Download files

Source Distribution

mlinflect-0.1.1.tar.gz (29.9 kB)

Uploaded Source

Built Distribution

mlinflect-0.1.1-py3-none-any.whl (25.5 kB)

Uploaded Python 3

File details

Details for the file mlinflect-0.1.1.tar.gz.

File metadata

  • Download URL: mlinflect-0.1.1.tar.gz
  • Upload date:
  • Size: 29.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.16

File hashes

Hashes for mlinflect-0.1.1.tar.gz
Algorithm Hash digest
SHA256 3f4aa665d729e3289f3158ea3be74489a8fba501e89952800d9887b4e21d2d19
MD5 f3679aa3a0b0bde65e4a69a144c4ead0
BLAKE2b-256 e457dc60604464c5a6eeb1bf8f5408e5007984966b5b3872fee5af5584b9e22b
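
A downloaded archive can be checked against the SHA256 digest listed above with a few lines of standard-library Python. This is a minimal sketch: the helper names sha256_of and matches_published are illustrative and are not part of mlinflect or of any PyPI tooling, and the file path in the usage note is assumed to be wherever the archive was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's digest with a published hex digest, ignoring case and padding."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, calling matches_published("mlinflect-0.1.1.tar.gz", "3f4aa665d729e3289f3158ea3be74489a8fba501e89952800d9887b4e21d2d19") should return True for an intact download.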

File details

Details for the file mlinflect-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: mlinflect-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 25.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.16

File hashes

Hashes for mlinflect-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f41c683609ae036e388f5887075d90c8886b43a9473b1acb4f75203faf416fe3
MD5 cf4216d2ba701d15a20a8e85455fe179
BLAKE2b-256 1c7348cf31a4441a929e19afc9011c87838b1a0614ee9f06cdd00463547a1f3c
