An example-based approach to separating suffixes in Malayalam text.

Project description

malayalam_morpheme_splitter

An example-based approach to separating suffixes from Malayalam words. Malayalam is rich in morphological variation and is highly agglutinative.

System Description

malayalam_morpheme_splitter is a Python package that splits suffixes from Malayalam words using an example-based approach. It ships with a set of Malayalam root words and suffix-splitting rules (examples). If you notice incorrect output, you can add more root words and rules to improve the system's performance.

Installation

To install malayalam_morpheme_splitter, you can use pip:

pip install malayalam-morpheme-splitter

Usage

import malayalam_morpheme_splitter as mms

word_list = mms.morph_analysis('കരുതലിൻ്റെ') # ['കരുതൽ', 'ഇൻ്റെ']
word_list1 = mms.morph_analysis('ആനയെ കാണാൻ വനത്തിലേക്ക് പോവുക') # [['ആന', 'എ'], ['കാണാൻ'], ['വനം', 'ഇൽ', 'ഏക്ക്'], ['പോവുക']]

mms.read_all_examples() # returns all the examples in the database

mms.db_entry({'കരുതലിൻ്റെ':['കരുതൽ', 'ഇൻ്റെ']}) # add a new entry to DB

mms.root_word_entry('നികൃഷ്ടം') # add a new root word to DB

Functions

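  • morph_analysis(sentence) : This function takes a string (a single word or a sentence) as input and returns a list containing the segmentations. In the Usage examples, a single word yields a flat list of morphemes, while a sentence yields one list per word.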
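Because the two Usage examples show different return shapes, downstream code may want a single shape to iterate over. The following is a minimal sketch that assumes only what those two examples show; `as_token_lists` is a hypothetical helper, not part of the package, and the sample values are copied from the Usage section rather than produced by calling the library.

```python
from typing import List, Union

# Shapes seen in the Usage section: a flat list for one word,
# or a list of lists for a sentence.
Segmentation = Union[List[str], List[List[str]]]


def as_token_lists(result: Segmentation) -> List[List[str]]:
    """Return one morpheme list per word, whichever shape was passed in.

    Hypothetical helper, not part of malayalam_morpheme_splitter.
    """
    # A non-empty list made only of strings is treated as a single word.
    if result and all(isinstance(item, str) for item in result):
        return [list(result)]
    # Otherwise assume it is already one list per word (or empty).
    return [list(item) for item in result]


# Sample values copied from the Usage section above.
single = ['കരുതൽ', 'ഇൻ്റെ']
sentence = [['ആന', 'എ'], ['കാണാൻ'], ['വനം', 'ഇൽ', 'ഏക്ക്'], ['പോവുക']]

for morphemes in as_token_lists(single) + as_token_lists(sentence):
    print(' + '.join(morphemes))
```

With the real package installed, the result of `mms.morph_analysis(...)` could be passed to the helper in place of the sample values.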

Users can control or change the behaviour of the morpheme splitter. If you notice that a certain kind of word is not split correctly, or that a word that should not be split is being split, you can fix this yourself by adding data to the system:

  • read_all_examples() : Reads all the examples from the DB and returns them as a dictionary. This can be used to examine the current rules.

  • db_entry(inp) : This function takes a dictionary as input and adds it to the DB. Adding a new example lets the system learn that pattern and split similar words the way the given example is split.

  • root_word_entry(word) : This function takes a string as input and adds it to the DB. A word that you think should not be split can be added here.
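To make the roles of examples and root words more concrete, here is a toy sketch of how an example-based splitter could generalize from one stored example. It is an illustration under assumptions, not the package's actual algorithm: `rule_from_example` and `split_word` are hypothetical names, and the rule format (surface ending, root ending, suffixes) is invented for this sketch.

```python
from typing import List, Set, Tuple

# (surface ending, root ending, suffixes); a made-up rule format for this sketch.
Rule = Tuple[str, str, List[str]]


def rule_from_example(word: str, parts: List[str]) -> Rule:
    """Derive a rewrite rule from one worked example (hypothetical helper)."""
    root, suffixes = parts[0], parts[1:]
    # Length of the longest common prefix of the surface word and its root.
    n = 0
    while n < min(len(word), len(root)) and word[n] == root[n]:
        n += 1
    # Whatever follows the shared prefix is what the rule rewrites.
    return word[n:], root[n:], suffixes


def split_word(word: str, rules: List[Rule], roots: Set[str]) -> List[str]:
    """Split `word` using the rule with the longest matching surface ending."""
    # Known root words are never split.
    if word in roots:
        return [word]
    for surface_end, root_end, suffixes in sorted(
        rules, key=lambda rule: len(rule[0]), reverse=True
    ):
        if surface_end and word.endswith(surface_end) and len(word) > len(surface_end):
            stem = word[: len(word) - len(surface_end)]
            return [stem + root_end] + suffixes
    # No rule applies: leave the word unsplit.
    return [word]


# The example and root word are the ones used in the Usage section.
examples = {'കരുതലിൻ്റെ': ['കരുതൽ', 'ഇൻ്റെ']}
rules = [rule_from_example(word, parts) for word, parts in examples.items()]
roots = {'നികൃഷ്ടം'}

print(split_word('വിരലിൻ്റെ', rules, roots))  # a different word with the same ending
print(split_word('നികൃഷ്ടം', rules, roots))   # listed as a root, so left unsplit
```

In this sketch, one stored example is enough to split other words that share its ending, while the root list acts as an override for words that must stay whole. This mirrors the purposes described for db_entry and root_word_entry above, but the real package may match and rank examples differently.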


Download files

Source Distribution

malayalam_morpheme_splitter-1.0.0b1.tar.gz (301.0 kB view details)

Uploaded Source

Built Distribution

File details

Details for the file malayalam_morpheme_splitter-1.0.0b1.tar.gz.

File hashes

Hashes for malayalam_morpheme_splitter-1.0.0b1.tar.gz
Algorithm Hash digest
SHA256 cbce1926daeaf378a18fcd997c3789d74275d941aa484c01ef2f72cd6cb3ec7c
MD5 a6c3bdb6f028d6e9089f17c1d4b20e8a
BLAKE2b-256 da7e964e590aa7fbca53de2cd8165ea7e963b375f86f3dfff4d30eb51ad1fb16

File details

Details for the file malayalam_morpheme_splitter-1.0.0b1-py3-none-any.whl.

File hashes

Hashes for malayalam_morpheme_splitter-1.0.0b1-py3-none-any.whl
Algorithm Hash digest
SHA256 314b69f1893ae063b5952fb8968e149a55ca77c0022dc7d7be7e3ad014fdbfba
MD5 62811f18d135fd4fe212d91893c7df0a
BLAKE2b-256 9a8ad3d3f4e84d0dc13a3bc472931abc1dae7e2c8516b6338b9e1f09dd9e02b5
