Persian Part-of-Speech tagger framework
Persian Parts-of-Speech tagger
This repository contains a Persian part-of-speech (POS) tagger based on Conditional Random Fields (CRF), together with a native text normalizer.
TO-DO:
- CRF tagger commit#64
- Wapiti tagger commit#56
- Native Normalizer pull#4
- UnitTesting commit#127
- CI/CD pull#5
- Scrutinize Coverage issue#8
- Documentation pull#9
- Improve Coverage pull#9
- Smooth Installation
- Excel code quality pull#11
Installation:
$ git clone https://github.com/MohammadForouhesh/crf-pos-persian
$ cd crf-pos-persian
$ python setup.py install
On Google Colab:
! pip install git+https://github.com/MohammadForouhesh/crf-pos-persian.git
Usage
from crf_pos.pos_tagger.wapiti import WapitiPosTagger
pos_tagger = WapitiPosTagger()
tokens = 'او رئیسجمهور حجتالاسلاموالمسلمین ابرهیم رئیسی رئیس جمهور می باشد'.split()
pos_tagger[tokens]
[1]:
[('ابراهیم', 'N'),
('رپیسی', 'N'),
('ریپس', 'ADJ'),
('جمهور', 'N'),
('جمهوری', 'N'),
('اسلامی', 'ADJ'),
('ایران', 'N'),
('میباشد', 'V')]
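The result shown above is a list of `(token, tag)` pairs, so it can be post-processed with ordinary Python. The following is a minimal sketch of grouping tokens by tag. It does not call the library: the `tagged` list is a hand-copied sample of the output above, and `group_by_tag` is a hypothetical helper, not part of `crf_pos`.

```python
from collections import defaultdict

# Sample copied from the output above; in a real run this would be
# the value returned by pos_tagger[tokens].
tagged = [('جمهوری', 'N'), ('اسلامی', 'ADJ'), ('ایران', 'N'), ('میباشد', 'V')]


def group_by_tag(pairs):
    """Map each tag to the tokens that received it, preserving order."""
    groups = defaultdict(list)
    for token, tag in pairs:
        groups[tag].append(token)
    return dict(groups)


groups = group_by_tag(tagged)
nouns = groups.get('N', [])  # tokens tagged as nouns, in sentence order
print(nouns)
```

Using `groups.get(tag, [])` avoids a `KeyError` for tags that do not occur in a given sentence.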
Evaluation
Part-of-Speech | precision | recall | f1-score | support |
---|---|---|---|---|
N | 0.985 | 0.970 | 0.977 | 186585 |
P | 0.998 | 0.998 | 0.998 | 89450 |
V | 0.999 | 0.999 | 0.999 | 87762 |
ADV | 0.976 | 0.972 | 0.974 | 15983 |
ADVe | 0.988 | 0.978 | 0.983 | 1053 |
RES | 0.989 | 0.992 | 0.991 | 2784 |
RESe | 1.000 | 0.989 | 0.994 | 174 |
DET | 0.973 | 0.977 | 0.975 | 19786 |
DETe | 0.960 | 0.970 | 0.965 | 2156 |
AJ | 0.978 | 0.975 | 0.977 | 61526 |
AJe | 0.949 | 0.964 | 0.957 | 19919 |
CL | 0.932 | 0.918 | 0.925 | 1892 |
INT | 1.000 | 1.000 | 1.000 | 73 |
CONJ | 0.996 | 0.997 | 0.997 | 74796 |
CONJe | 1.000 | 1.000 | 1.000 | 82 |
POSTP | 1.000 | 1.000 | 1.000 | 13174 |
PRO | 0.973 | 0.974 | 0.973 | 23094 |
PROe | 0.878 | 0.579 | 0.698 | 273 |
NUM | 0.988 | 0.992 | 0.990 | 24864 |
NUMe | 0.932 | 0.918 | 0.925 | 2519 |
PUNC | 1.000 | 1.000 | 1.000 | 84088 |
Ne | 0.970 | 0.985 | 0.977 | 163760 |
Pe | 0.986 | 0.992 | 0.989 | 10004 |
avg/total | 0.985 | 0.985 | 0.985 | 885797 |
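To read the table, it may help to recall how the f1-score column relates to the other two. Assuming the table follows the standard definition (f1 as the harmonic mean of precision and recall, which the source does not state explicitly), the sketch below recomputes f1 for a few rows copied from the table. The helper name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) copied from the table above.
rows = {
    'N': (0.985, 0.970, 0.977),
    'ADV': (0.976, 0.972, 0.974),
    'CL': (0.932, 0.918, 0.925),
    'PROe': (0.878, 0.579, 0.698),
}

recomputed = {tag: round(f1_score(p, r), 3) for tag, (p, r, _) in rows.items()}
for tag, (_, _, reported) in rows.items():
    print(tag, recomputed[tag], reported)
```

Because the table's precision and recall are already rounded to three decimals, recomputed values can differ from the reported ones in the last digit. The PROe row illustrates how a low recall (0.579) pulls f1 well below precision.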