Persian Part-of-Speech tagger framework
Persian Parts-of-Speech tagger
This repository contains a Persian part-of-speech (POS) tagger based on Conditional Random Fields (CRF), together with a native text normalizer.
TO-DO:
- CRF tagger commit#64
- Wapiti tagger commit#56
- Native Normalizer pull#4
- UnitTesting commit#127
- CI/CD pull#5
- Scrutinize Coverage issue#8
- Documentation pull#9
- Improve Coverage pull#9
- Smooth Installation issue#12 pull#13
- Excel code quality pull#11
- Add documentation and a flowchart of the code.
- CircleCI CI/CD Pipeline Config issue#14
Installation:
Using Pip
$ pip install crf_pos
From Source
$ git clone https://github.com/MohammadForouhesh/crf-pos-persian
$ cd crf-pos-persian
$ python setup.py install
On CoLab
! pip install git+https://github.com/MohammadForouhesh/crf-pos-persian.git
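To confirm that one of the installation routes above worked, the installed version can be queried from Python. This is a minimal sketch using only the standard library; it assumes the distribution is registered under the name `crf_pos` (the name used in the pip command above), and `installed_version` is a hypothetical helper, not part of the package.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distribution name matches the one passed to pip above.
version = installed_version("crf_pos")
print(version if version is not None else "crf_pos is not installed")
```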
Usage
from crf_pos.pos_tagger.wapiti import WapitiPosTagger
pos_tagger = WapitiPosTagger()
tokens = 'او رئیسجمهور حجتالاسلاموالمسلمین ابرهیم رئیسی رئیس جمهور ایران اسلامی می باشد'
pos_tagger[tokens]
Output:
[('او', 'PRO'),
('رئیس\u200cجمهور', 'N'),
('حجت\u200cالاسلام\u200cوالمسلمین', 'N'),
('ابرهیم', 'N'),
('رئیسی', 'N'),
('رئیس\u200cجمهور', 'N'),
('ایران', 'N'),
('اسلامی', 'ADJ'),
('می\u200cباشد', 'V')]
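The result is a plain list of `(token, tag)` pairs, so it can be processed with ordinary Python. The sketch below copies the output shown above into a literal so that it runs without the tagger installed; `tokens_with_tag` is a hypothetical helper introduced here for illustration, not part of `crf_pos`. The `\u200c` in some tokens is the zero-width non-joiner (U+200C) used in Persian orthography.

```python
from collections import Counter

# Tagged output copied from the example above.
tagged = [
    ('او', 'PRO'),
    ('رئیس\u200cجمهور', 'N'),
    ('حجت\u200cالاسلام\u200cوالمسلمین', 'N'),
    ('ابرهیم', 'N'),
    ('رئیسی', 'N'),
    ('رئیس\u200cجمهور', 'N'),
    ('ایران', 'N'),
    ('اسلامی', 'ADJ'),
    ('می\u200cباشد', 'V'),
]


def tokens_with_tag(pairs, tag):
    """Return the tokens whose POS tag equals `tag`, in sentence order."""
    return [token for token, token_tag in pairs if token_tag == tag]


nouns = tokens_with_tag(tagged, 'N')
tag_counts = Counter(tag for _, tag in tagged)

print(len(nouns))           # 6
print(tag_counts['ADJ'])    # 1
```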
Implementation Details
Evaluation
Training and testing are performed on Mojgan Seraji's Uppsala Persian Corpus.
Part-of-Speech | Description | precision | recall | f1-score | support |
---|---|---|---|---|---|
N | Noun | 0.985 | 0.970 | 0.977 | 186585 |
P | Preposition | 0.998 | 0.998 | 0.998 | 89450 |
V | Verb | 0.999 | 0.999 | 0.999 | 87762 |
ADV | Adverb | 0.976 | 0.972 | 0.974 | 15983 |
FW | Foreign Word | 0.989 | 0.992 | 0.991 | 2784 |
DET | Determiner | 0.973 | 0.977 | 0.975 | 19786 |
ADJ | Adjective | 0.978 | 0.975 | 0.977 | 61526 |
INT | Interjection | 1.000 | 1.000 | 1.000 | 73 |
CONJ | Conjunction | 0.996 | 0.997 | 0.997 | 74796 |
PRO | Pronoun | 0.973 | 0.974 | 0.973 | 23094 |
NUM | Numeral | 0.988 | 0.992 | 0.990 | 24864 |
avg/total | - | 0.985 | 0.985 | 0.985 | 586703 |
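As a reading aid for the table above, the f1-score column is the harmonic mean of precision and recall, and the total support is the sum of the per-tag supports. The sketch below checks both on the reported figures; `f1_score` is a helper defined here, not a function from this package. Because the table shows values rounded to three decimals, an f1 recomputed from the rounded precision and recall can differ from the printed one in the last digit for some rows (ADJ, for example).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Support column copied from the table above.
support = {
    'N': 186585, 'P': 89450, 'V': 87762, 'ADV': 15983, 'FW': 2784,
    'DET': 19786, 'ADJ': 61526, 'INT': 73, 'CONJ': 74796,
    'PRO': 23094, 'NUM': 24864,
}

noun_f1 = f1_score(0.985, 0.970)      # precision and recall of the N row
total_support = sum(support.values())

print(round(noun_f1, 3))   # 0.977
print(total_support)       # 586703
```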
How To Contribute