Permissive, rule-based Malayalam morphological synthesizer (noun inflection generation).

mlinflect

A permissive, rule-based Malayalam morphological synthesizer. It does forward morphological generation: given a root and grammatical features, it produces the inflected surface form (the counterpart to morphological analysis/segmentation).

from mlinflect import synthesize_noun, Case, Number

synthesize_noun("മരം", Case.LOCATIVE).surface          # 'മരത്തിൽ'
synthesize_noun("മരം", Case.GENITIVE).surface          # 'മരത്തിന്റെ'
synthesize_noun("കുട്ടി", Case.GENITIVE).surface        # 'കുട്ടിയുടെ'
synthesize_noun("മരം", Case.NOMINATIVE, number=Number.PLURAL).surface  # 'മരങ്ങൾ'

Why this exists

Existing Malayalam morphology tools are either copyleft (Apertium and libindic are GPL/AGPL) or, like mlmorph, the one permissively licensed (MIT) generator, built on a GPL FST runtime. That leaves no permissive, dependency-clean, rule-based Malayalam synthesizer. mlinflect aims to fill that gap with a small pure-Python rule engine and no copyleft dependencies.

Design

  • Declarative, provenance-tagged rules (mlinflect/rules.py): each rule cites the source it was drawn from and carries a verified flag that is True only when the form has been ratified by a native reviewer. Adding or correcting a paradigm is a data edit, not a code change.
  • Inspectable results: every synthesize_noun(...) call returns a SynthResult with the surface form, the morphemes that compose it, the stem_class, the provenance key, and the verified flag. Feature combinations that are not yet encoded raise an exception rather than return a silently wrong form.
  • Akshara-correct joins: suffixes are represented matra-initial so concatenation produces correct conjuncts/vowel signs; the genitive uses the canonical nta form (NA + virama + RRA).
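
The three design points above can be illustrated with a minimal, self-contained sketch. It is not the code in mlinflect/rules.py: the Rule dataclass, the RULES table, the synthesize function, the "example-source" provenance key, the use of NotImplementedError, and the treatment of the oblique augment as a separate morpheme are all assumptions made for illustration. Only the SynthResult field names and the two surface forms for മരം come from this README. Code points are written as escapes so the joins are unambiguous.

```python
from dataclasses import dataclass
from enum import Enum


class Case(Enum):
    LOCATIVE = "locative"
    GENITIVE = "genitive"
    DATIVE = "dative"  # deliberately left unencoded in this sketch


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record: one suffix plus its provenance."""
    suffix: str       # matra-initial: starts with a dependent vowel sign
    provenance: str   # key of the source the rule was drawn from
    verified: bool    # True only after native-speaker review


@dataclass(frozen=True)
class SynthResult:
    surface: str
    morphemes: tuple
    stem_class: str
    provenance: str
    verified: bool


ANUSVARA = "\u0d02"
# Oblique augment TA + VIRAMA + TA. It ends in a bare consonant so that a
# following vowel sign attaches to it directly.
OBLIQUE_TT = "\u0d24\u0d4d\u0d24"

RULES = {
    # VOWEL SIGN I + CHILLU L
    ("am_neuter", Case.LOCATIVE): Rule("\u0d3f\u0d7d", "example-source", False),
    # VOWEL SIGN I + NA + VIRAMA + RRA + VOWEL SIGN E (the "nta" sequence)
    ("am_neuter", Case.GENITIVE): Rule(
        "\u0d3f\u0d28\u0d4d\u0d31\u0d46", "example-source", False
    ),
}


def synthesize(root: str, case: Case) -> SynthResult:
    """Inflect an anusvara-final noun; raise for anything not encoded."""
    if not root.endswith(ANUSVARA):
        raise NotImplementedError("only the am_neuter class is sketched")
    rule = RULES.get(("am_neuter", case))
    if rule is None:
        raise NotImplementedError(f"no rule for am_neuter + {case.name}")
    base = root[:-1]  # drop the anusvara
    morphemes = (base, OBLIQUE_TT, rule.suffix)
    return SynthResult(
        surface="".join(morphemes),
        morphemes=morphemes,
        stem_class="am_neuter",
        provenance=rule.provenance,
        verified=rule.verified,
    )


maram = "\u0d2e\u0d30\u0d02"  # മരം
print(synthesize(maram, Case.LOCATIVE).surface)
print(synthesize(maram, Case.GENITIVE).surface)
```

Because each suffix starts with a vowel sign and the augment ends in a bare consonant, plain string concatenation yields well-formed aksharas with no virama cleanup step. Adding a case in this sketch means adding one entry to RULES, which mirrors the "data edit, not a code change" goal.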

Status

Alpha. Five ending-conditioned noun classes: am_neuter (മരം) and i_vowel (കുട്ടി) are complete across 11 cases (singular and plural); vowel_anuswara (കലാം), u_vowel (പശു), and ṭ_geminate (വീട്) are partial. Includes differential object marking and a synthetic/colloquial register for the instrumental. Most forms are native-ratified (verified=True); a few are SMC-sourced pending sign-off (verified=False). See LIMITATIONS.md for exactly what is unsupported. Remaining noun classes (a/e-stems, chillu), verbs, and pronouns are future work.
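
Since some forms are still pending sign-off and some feature combinations raise, callers may want to accept only ratified output. The helper below is a hypothetical wrapper, not part of mlinflect. It takes the synthesizer as a parameter and relies only on the surface and verified attributes described above. The README does not name the exception type raised for unsupported combinations, so this sketch catches Exception broadly; narrow it once the actual type is known.

```python
from typing import Callable, Optional


def surface_if_verified(synth: Callable, root: str, *features, **options) -> Optional[str]:
    """Return the surface form only when it is native-ratified.

    `synth` is any callable that returns an object with `.surface` and
    `.verified`, for example mlinflect's synthesize_noun. Unsupported
    feature combinations (which raise) and unverified forms both give None.
    """
    try:
        result = synth(root, *features, **options)
    except Exception:  # assumption: the concrete exception type is unknown
        return None
    return result.surface if result.verified else None


# With mlinflect installed, usage would look like:
#   from mlinflect import synthesize_noun, Case
#   surface_if_verified(synthesize_noun, "മരം", Case.LOCATIVE)
```

Returning None for both failure modes keeps the call site simple; code that needs to tell "unsupported" apart from "unverified" should inspect the SynthResult directly instead.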

Install

pip install mlinflect        # once published
# from source:
pip install -e ".[dev]"

License

Apache-2.0. See LICENSE and NOTICE. Contributions are accepted under Apache-2.0 §5 (inbound = outbound); no separate CLA is required.

The implemented linguistic rules are facts restated in our own code; no source's text, tables, code, or data is reproduced. Sources are credited in REFERENCES.md as scholarship; that implies no endorsement and creates no license obligation.

Download files

Source Distribution

mlinflect-0.0.1.tar.gz (15.9 kB)

Uploaded Source

Built Distribution

mlinflect-0.0.1-py3-none-any.whl (14.7 kB)

Uploaded Python 3

File details

Details for the file mlinflect-0.0.1.tar.gz.

File metadata

  • Download URL: mlinflect-0.0.1.tar.gz
  • Upload date:
  • Size: 15.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.16

File hashes

Hashes for mlinflect-0.0.1.tar.gz
Algorithm Hash digest
SHA256 af3852d2b3671837d530a4f635cd95939dd46111b373e6caa9a2a38070677eff
MD5 dd86ef67521752d96a5079ea4551cec0
BLAKE2b-256 34abdc17d3d18381145bb4569c62331468de1bf4c9ce57d6fecba225cf4fbfb6

File details

Details for the file mlinflect-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: mlinflect-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 14.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.16

File hashes

Hashes for mlinflect-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7b40cfafc799b53efc51b16249ba4e0c65e7273c8213ffb36ab7f2aeeb52e890
MD5 5c92bbebf4d984c0ba6001d99dc52bea
BLAKE2b-256 2087fa4b03e8fe50a549246ecc24624ce1e5ff7ea4f3fbce144d8d4401bce168
