An example-based approach to separating suffixes in Malayalam text.
Project description
malayalam_morpheme_splitter
An example-based approach to separating suffixes from Malayalam words. Malayalam is highly agglutinative and rich in morphological variation.
System Description
malayalam_morpheme_splitter is a Python package that splits suffixes from Malayalam words using an example-based approach. It ships with a set of Malayalam root words and a set of suffix-splitting rules (examples). If you notice incorrect output, you can add more root words and examples to improve the results.
Installation
To install malayalam_morpheme_splitter, you can use pip:
pip install malayalam-morpheme-splitter
Usage
import malayalam_morpheme_splitter as mms
word_list = mms.morph_analysis('കരുതലിൻ്റെ') # ['കരുതൽ', 'ഇൻ്റെ']
word_list1 = mms.morph_analysis('ആനയെ കാണാൻ വനത്തിലേക്ക് പോവുക') # [['ആന', 'എ'], ['കാണാൻ'], ['വനം', 'ഇൽ', 'ഏക്ക്'], ['പോവുക']]
mms.read_all_examples() # returns all the examples in the database
mms.db_entry({'കരുതലിൻ്റെ':['കരുതൽ', 'ഇൻ്റെ']}) # add a new entry to DB
mms.root_word_entry('നികൃഷ്ടം') # add a new root word to DB
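Judging by the outputs shown in the comments above, morph_analysis returns a flat list of morphemes for a single word and one list per word for a sentence. The small helper below is a sketch that normalises both shapes to "one list per word" so downstream code can loop over words uniformly. The name as_word_segments is hypothetical and not part of the package, and the return shapes are inferred only from the two examples above.

```python
def as_word_segments(result):
    """Return a list holding one morpheme list per word.

    Hypothetical helper, not part of malayalam_morpheme_splitter.
    Assumes `result` has one of the two shapes shown in the usage
    comments: a flat list of strings (single word) or a list of
    lists of strings (sentence).
    """
    # A non-empty flat list of strings is a single-word result: wrap it.
    if result and all(isinstance(item, str) for item in result):
        return [result]
    # Otherwise it is already one list per word (or empty).
    return result


# The two result shapes copied from the usage comments above.
single = ['കരുതൽ', 'ഇൻ്റെ']
sentence = [['ആന', 'എ'], ['കാണാൻ'], ['വനം', 'ഇൽ', 'ഏക്ക്'], ['പോവുക']]

for morphemes in as_word_segments(single) + as_word_segments(sentence):
    # The first item is the root; any remaining items are suffixes.
    root, *suffixes = morphemes
    print(root, suffixes)
```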
Functions
- morph_analysis(sentence) : Takes a word or a sentence as a string and returns its segmentation. For a single word the result is a list of morphemes; for a sentence it is a list with one such list per word.
Users can control or change the behaviour of the morpheme splitter. If you notice that a certain kind of word is not split correctly, or that a word which should not be split is being split, you can fix it yourself by adding data to the system:
- read_all_examples() : Reads all the examples from the DB and returns them as a dictionary. This can be used to examine the current rules.
- db_entry(inp) : Takes a dictionary as input and adds it to the DB. Adding a new example lets the system learn that pattern and treat similar words the way the given example is split.
- root_word_entry(word) : Takes a string as input and adds it to the DB. A word that you think should not be split can be added here.
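To make the idea behind db_entry and root_word_entry concrete, here is a toy sketch of example-based splitting. It is not the package's actual algorithm, which this page does not describe; build_rules and split_by_example are hypothetical names, and the matching strategy (longest shared word ending wins) is an assumption chosen for illustration. Each example is turned into a rule of the form "a word ending in X has a root ending in Y, plus these suffixes", and words listed as root words are returned unsplit.

```python
from os.path import commonprefix


def build_rules(examples):
    """Turn {surface_word: [root, suffix, ...]} into rewrite rules.

    Hypothetical illustration only; the real package may work differently.
    """
    rules = []
    for word, (root, *suffixes) in examples.items():
        # Characters shared by the surface word and its root.
        shared = len(commonprefix([word, root]))
        word_tail, root_tail = word[shared:], root[shared:]
        if word_tail and suffixes:
            rules.append((word_tail, root_tail, suffixes))
    # Try the longest (most specific) word ending first.
    rules.sort(key=lambda rule: len(rule[0]), reverse=True)
    return rules


def split_by_example(word, examples, root_words=frozenset()):
    """Split `word` by analogy with the closest-matching example."""
    if word in root_words:
        return [word]  # known root words are never split
    for word_tail, root_tail, suffixes in build_rules(examples):
        if word.endswith(word_tail) and len(word) > len(word_tail):
            stem = word[: len(word) - len(word_tail)]
            # Restore the root ending that the example says was altered.
            return [stem + root_tail, *suffixes]
    return [word]  # no example applies


# The example from the usage section acts as the only rule.
examples = {'കരുതലിൻ്റെ': ['കരുതൽ', 'ഇൻ്റെ']}
print(split_by_example('കരുതലിൻ്റെ', examples))
```

In this sketch, adding an entry to examples plays the role of db_entry, and adding a word to root_words plays the role of root_word_entry: the first teaches a new ending, the second protects a word that merely looks like it carries that ending.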
File details
Details for the file malayalam_morpheme_splitter-1.0.0b1.tar.gz.
File metadata
- Download URL: malayalam_morpheme_splitter-1.0.0b1.tar.gz
- Upload date:
- Size: 301.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | cbce1926daeaf378a18fcd997c3789d74275d941aa484c01ef2f72cd6cb3ec7c
MD5 | a6c3bdb6f028d6e9089f17c1d4b20e8a
BLAKE2b-256 | da7e964e590aa7fbca53de2cd8165ea7e963b375f86f3dfff4d30eb51ad1fb16
File details
Details for the file malayalam_morpheme_splitter-1.0.0b1-py3-none-any.whl.
File metadata
- Download URL: malayalam_morpheme_splitter-1.0.0b1-py3-none-any.whl
- Upload date:
- Size: 305.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 314b69f1893ae063b5952fb8968e149a55ca77c0022dc7d7be7e3ad014fdbfba
MD5 | 62811f18d135fd4fe212d91893c7df0a
BLAKE2b-256 | 9a8ad3d3f4e84d0dc13a3bc472931abc1dae7e2c8516b6338b9e1f09dd9e02b5