Grapheme Parser for Indic languages
indicparser
Installation
pip install indicparser
Usage
- initializing the parser
from indicparser import graphemeParser
gp=graphemeParser("bangla")
- extracting graphemes
text="  শাটিকাপ   মার"
graphemes=gp.process(text)
print("Graphemes:",graphemes)
Graphemes: [' ', ' ', 'শা', 'টি', 'কা', 'প', ' ', ' ', ' ', 'মা', 'র']
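To give a feel for what grapheme extraction involves, here is a minimal sketch that groups each base character with the combining marks that follow it, and keeps consonants joined by a hasanta in one cluster. It is not the indicparser implementation; the function name `split_clusters` and the reliance on Unicode general categories are assumptions made for illustration, and it only uses the standard library.

```python
import unicodedata

VIRAMA = "\u09cd"  # Bangla hasanta, which joins consonants into a conjunct


def split_clusters(text):
    """Group each base character with the marks that follow it (illustrative only)."""
    clusters = []
    for ch in text:
        can_attach = bool(clusters) and not clusters[-1].isspace()
        # Combining marks (vowel signs, hasanta, etc.) have a category starting with "M".
        is_mark = unicodedata.category(ch).startswith("M")
        # A non-space character right after a hasanta continues the conjunct.
        continues_conjunct = (
            bool(clusters) and clusters[-1].endswith(VIRAMA) and not ch.isspace()
        )
        if can_attach and (is_mark or continues_conjunct):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


print(split_clusters("শাটিকাপ মার"))
```

Spaces are kept as separate single-character items here, so repeated spaces in the input produce repeated items in the result.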
- extracting grapheme root, consonant diacritics and vowel diacritics
comps=gp.process(text,return_graphemes=False)
print("Components:",comps)
Components: [' ', ' ', 'শ', 'া', 'ট', 'ি', 'ক', 'া', 'প', ' ', ' ', ' ', 'ম', 'া', 'র']
- extracting graphemes while merging consecutive spaces and stripping leading and trailing spaces
graphemes=gp.process(text,merge_spaces=True)
print("Graphemes (space corrected):",graphemes)
Graphemes (space corrected): ['শা', 'টি', 'কা', 'প', ' ', 'মা', 'র']
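The effect of space merging can be reproduced on a plain list of graphemes. The sketch below is an assumption about the behaviour, written from the two outputs shown above, not the library's own code; `merge_spaces` here is a hypothetical standalone helper, unrelated to the `merge_spaces` keyword argument beyond the name.

```python
def merge_spaces(tokens):
    """Collapse runs of space tokens and drop leading and trailing ones."""
    merged = []
    for tok in tokens:
        if tok == " " and (not merged or merged[-1] == " "):
            continue  # skip a leading space or a repeat of the previous space
        merged.append(tok)
    if merged and merged[-1] == " ":
        merged.pop()  # drop the single trailing space that may remain
    return merged


raw = [' ', ' ', 'শা', 'টি', 'কা', 'প', ' ', ' ', ' ', 'মা', 'র']
print(merge_spaces(raw))
```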
- numbers, punctuation, and English text are also handled by default
text="এটাকি 2441139 ? না ভাই wrong number"
graphemes=gp.process(text,merge_spaces=True)
print("Graphemes:",graphemes)
Graphemes: ['এ', 'টা', 'কি', ' ', '2', '4', '4', '1', '1', '3', '9', ' ', '?', ' ', 'না', ' ', 'ভা', 'ই', ' ', 'w', 'r', 'o', 'n', 'g', ' ', 'n', 'u', 'm', 'b', 'e', 'r']
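Because mixed text comes back as one flat list, it can be handy to label each item afterwards. The following is a small sketch using only the standard library; the function `classify` and its label names are hypothetical and are not part of indicparser.

```python
import unicodedata
from collections import Counter


def classify(token):
    """Return a coarse label for a non-empty token, based on its first character."""
    first = token[0]
    if first.isspace():
        return "space"
    if first.isdigit():
        return "digit"
    if unicodedata.category(first).startswith("P"):
        return "punctuation"
    if first.isascii() and first.isalpha():
        return "latin"
    if "\u0980" <= first <= "\u09ff":  # Unicode Bengali block
        return "bangla"
    return "other"


tokens = ['এ', 'টা', 'কি', ' ', '2', '4', '?', ' ', 'না', ' ', 'w', 'r']
print(Counter(classify(t) for t in tokens))
```

The check on the Bengali block is specific to Bangla; other scripts would need their own ranges.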
- available languages
from indicparser import languages
languages.keys()
dict_keys(['bangla', 'malyalam', 'tamil', 'gujrati', 'panjabi', 'odiya', 'hindi'])
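Note that several keys use spellings that differ from the common English names (for example 'malyalam' and 'panjabi'). A caller may want to map user input onto these keys before constructing a parser. Below is a sketch of one way to do that; `resolve_language` is a hypothetical helper, and `SUPPORTED` is copied by hand from the output above rather than read from the package.

```python
from difflib import get_close_matches

# Keys copied from the `languages.keys()` output shown above (assumed current).
SUPPORTED = ("bangla", "malyalam", "tamil", "gujrati", "panjabi", "odiya", "hindi")


def resolve_language(name):
    """Map a user-supplied language name to the closest supported key."""
    key = name.strip().lower()
    if key in SUPPORTED:
        return key
    # Fall back to fuzzy matching for common alternative spellings.
    close = get_close_matches(key, SUPPORTED, n=1)
    if close:
        return close[0]
    raise ValueError(f"unsupported language: {name!r}")


print(resolve_language("Malayalam"))
```

The returned key could then be passed to `graphemeParser`, as in the initialization example.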
ABOUT
- Authors: Bengali.AI
- Cite Bengali.AI multipurpose grapheme dataset paper
@inproceedings{alam2021large,
title={A large multi-target dataset of common bengali handwritten graphemes},
author={Alam, Samiul and Reasat, Tahsin and Sushmit, Asif Shahriyar and Siddique, Sadi Mohammad and Rahman, Fuad and Hasan, Mahady and Humayun, Ahmed Imtiaz},
booktitle={International Conference on Document Analysis and Recognition},
pages={383--398},
year={2021},
organization={Springer}
}
Change Log
0.0.1 (12/02/2022)
- First Release
0.0.2 (12/02/2022)
- Basic Documentation
- Modifier removal
- space correction
- text mode parser