An example-based approach to separating suffixes in Malayalam text.

Project description

malayalam_morpheme_splitter

An example-based approach to separating suffixes from Malayalam words. Malayalam is highly agglutinative and rich in morphological variation.

System Description

malayalam_morpheme_splitter is a Python package that splits suffixes from Malayalam words using an example-based approach. The system ships with a set of Malayalam root words and rules (examples) for suffix splitting. Users who notice incorrect outputs can add more root words and rules to improve the system's performance.

Installation

To install malayalam_morpheme_splitter, you can use pip:

pip install malayalam-morpheme-splitter

Usage

import malayalam_morpheme_splitter as mms

word_list = mms.morph_analysis('കരുതലിൻ്റെ') # ['കരുതൽ', 'ഇൻ്റെ']
word_list1 = mms.morph_analysis('ആനയെ കാണാൻ വനത്തിലേക്ക് പോവുക') # [['ആന', 'എ'], ['കാണാൻ'], ['വനം', 'ഇൽ', 'ഏക്ക്'], ['പോവുക']]

mms.read_all_examples() # returns all the examples in the database

mms.db_entry({'കരുതലിൻ്റെ':['കരുതൽ', 'ഇൻ്റെ']}) # add a new entry to DB

mms.root_word_entry('നികൃഷ്ടം') # add a new root word to DB

Functions

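  • morph_analysis(sentence) : Takes a string (a single word or a sentence) as input and returns its segmentation. As the usage examples show, a single word gives a list of morphemes, while a sentence gives a list of such lists, one per word.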
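Because the usage examples show a flat list for a single word and a nested list for a sentence, it can help to normalise both shapes before further processing. The following is a minimal sketch, not part of the package: the helper name as_word_list is hypothetical, and the sample values are copied from the usage examples rather than produced by calling the library. It also assumes, based on those examples, that the first item of each segmentation is the root and the rest are suffixes.

```python
from typing import List, Union

Segmentation = List[str]


def as_word_list(
    result: Union[Segmentation, List[Segmentation]]
) -> List[Segmentation]:
    """Return a morph_analysis-style result as a list of per-word lists.

    Hypothetical helper: a flat list of strings (single-word input) is
    wrapped in an outer list; a nested list (sentence input) is copied.
    """
    if result and all(isinstance(item, str) for item in result):
        return [list(result)]
    return [list(item) for item in result]


# Sample outputs copied from the usage examples (not computed here).
single = ['കരുതൽ', 'ഇൻ്റെ']
sentence = [['ആന', 'എ'], ['കാണാൻ'], ['വനം', 'ഇൽ', 'ഏക്ക്'], ['പോവുക']]

for segments in as_word_list(single) + as_word_list(sentence):
    # Assumption: root first, then zero or more suffixes.
    root, suffixes = segments[0], segments[1:]
    print(root, suffixes)
```

With the installed package, the same helper could be applied directly to the value returned by mms.morph_analysis.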

Users can change the behaviour of the morpheme splitter. If you notice that a certain kind of word is not split correctly, or that a word that should not be split is being split, you can fix this yourself by adding data to the system:

  • read_all_examples() : Reads all the examples from the DB and returns them as a dictionary. This can be used to examine the current rules.

  • db_entry(inp) : This function takes a dictionary as input and adds it to the DB. Adding a new example lets the system learn that pattern and split similar words the way the word is split in the given example.

  • root_word_entry(word) : This function takes a string as input and adds it to the DB. A word that you think should not be split can be added here.
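One possible way to use these functions together is to keep a small table of words with the segmentation you expect, compare it against the splitter's output, and register only the entries that differ. The sketch below is an illustration under stated assumptions: find_wrong_splits and apply_corrections are hypothetical helpers, the analyser and the two "add" functions are passed in as callables, and stub_analyse with its outputs is made up for the demonstration so the code runs without the package. It assumes that a single-word call returns a flat list, as in the usage examples, and it passes one single-key dictionary per call to mirror the documented db_entry example.

```python
from typing import Callable, Dict, List


def find_wrong_splits(
    analyse: Callable[[str], List[str]],
    expected: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Return the expected entries whose analysis differs from the output."""
    return {
        word: segments
        for word, segments in expected.items()
        if analyse(word) != segments
    }


def apply_corrections(
    wrong: Dict[str, List[str]],
    add_example: Callable[[Dict[str, List[str]]], object],
    add_root_word: Callable[[str], object],
) -> None:
    """Register each correction as a root word or as a split example."""
    for word, segments in wrong.items():
        if segments == [word]:
            # The word should stay whole: register it as a root word.
            add_root_word(word)
        else:
            # The word should be split: register the split as an example.
            add_example({word: segments})


# Words taken from the usage examples, with the segmentation we expect.
SPLIT_WORD = 'കരുതലിൻ്റെ'
WHOLE_WORD = 'നികൃഷ്ടം'
expected = {
    SPLIT_WORD: ['കരുതൽ', 'ഇൻ്റെ'],
    WHOLE_WORD: [WHOLE_WORD],
}


def stub_analyse(word: str) -> List[str]:
    """Made-up stand-in for mms.morph_analysis that gets both words wrong."""
    if word == WHOLE_WORD:
        return [word[:-1], 'x']  # pretend the word was split incorrectly
    return [word]                # pretend the word was left unsplit


added_examples: List[Dict[str, List[str]]] = []
added_roots: List[str] = []

wrong = find_wrong_splits(stub_analyse, expected)
apply_corrections(wrong, added_examples.append, added_roots.append)
print(added_examples)
print(added_roots)
```

With the package installed, the stub and the two recording lists would be replaced by mms.morph_analysis, mms.db_entry and mms.root_word_entry; read_all_examples() can then be used to inspect the stored rules.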
