Persian Part-of-Speech tagger framework

Persian Parts-of-Speech tagger

This repository contains a Persian part-of-speech tagger based on Conditional Random Fields (CRF) and a native text normalizer.

Table of Contents

  1. TO-DO
  2. Installation
    1. Using Pip
    2. From Source
    3. On CoLab
  3. Usage
  4. Implementation Details
  5. Evaluation
  6. How To Contribute

TO-DO:

Installation:

Using Pip

$ pip install crf_pos

From Source

$ git clone https://github.com/MohammadForouhesh/crf-pos-persian 
$ cd crf-pos-persian
$ python setup.py install

On CoLab

! pip install git+https://github.com/MohammadForouhesh/crf-pos-persian.git

Usage

from crf_pos.pos_tagger.wapiti import WapitiPosTagger

pos_tagger = WapitiPosTagger()
# The input is a raw sentence; the built-in normalizer corrects half-spaces before tagging.
text = 'او رئیس‌جمهور حجتالاسلاموالمسلمین ابرهیم رئیسی رئیس جمهور ایران اسلامی می باشد'
pos_tagger[text]

Output:
[('او', 'PRO'),
('رئیس\u200cجمهور', 'N'),
('حجت\u200cالاسلام\u200cوالمسلمین', 'N'),
('ابرهیم', 'N'),
('رئیسی', 'N'),
('رئیس\u200cجمهور', 'N'),
('ایران', 'N'),
('اسلامی', 'ADJ'),
('می\u200cباشد', 'V')]
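
The tagger's result is a plain list of (token, tag) pairs, so it can be post-processed with ordinary Python. The following is a minimal sketch that groups tokens by tag. The `tagged` list is copied from the output above so the snippet runs without the package installed, and `group_by_tag` is a hypothetical helper, not part of crf_pos.

```python
from collections import defaultdict

# Copied from the example output above. In practice this would be the
# value returned by pos_tagger[text].
tagged = [
    ('او', 'PRO'),
    ('رئیس\u200cجمهور', 'N'),
    ('حجت\u200cالاسلام\u200cوالمسلمین', 'N'),
    ('ابرهیم', 'N'),
    ('رئیسی', 'N'),
    ('رئیس\u200cجمهور', 'N'),
    ('ایران', 'N'),
    ('اسلامی', 'ADJ'),
    ('می\u200cباشد', 'V'),
]


def group_by_tag(pairs):
    """Collect tokens under their tag, preserving sentence order.

    Hypothetical helper for illustration; not provided by crf_pos.
    """
    groups = defaultdict(list)
    for token, tag in pairs:
        groups[tag].append(token)
    return dict(groups)


groups = group_by_tag(tagged)
print(groups['V'])       # the single verb token
print(len(groups['N']))  # number of tokens tagged as nouns
```

Grouping by tag keeps the original token order within each group, which is convenient for tasks such as extracting all nouns from a sentence.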

Implementation Details

Evaluation

Training and testing were performed on Mojgan Seraji's Uppsala Persian Corpus.

Part-of-Speech  Description    precision  recall  f1-score  support
N               Noun           0.985      0.970   0.977     186585
P               Preposition    0.998      0.998   0.998     89450
V               Verb           0.999      0.999   0.999     87762
ADV             Adverb         0.976      0.972   0.974     15983
FW              Foreign Word   0.989      0.992   0.991     2784
DET             Determiner     0.973      0.977   0.975     19786
ADJ             Adjective      0.978      0.975   0.977     61526
INT             Interjection   1.000      1.000   1.000     73
CONJ            Conjunction    0.996      0.997   0.997     74796
PRO             Pronoun        0.973      0.974   0.973     23094
NUM             Numeral        0.988      0.992   0.990     24864
avg/total       -              0.985      0.985   0.985     586703

How To Contribute

  1. Report any errors you encounter through [BUG]
  2. Report any half-space corrections the Normalizer misses through [ZWNJ]
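
The half-space (zero-width non-joiner, U+200C) is invisible in most editors, which makes a missed correction hard to describe in a report. The sketch below makes it visible and countable using only the standard library. `show_zwnj` and `count_zwnj` are hypothetical helpers written for this illustration and are not part of crf_pos.

```python
ZWNJ = '\u200c'  # zero-width non-joiner, the Persian "half-space"


def show_zwnj(text, marker='<ZWNJ>'):
    """Return text with every ZWNJ replaced by a visible marker."""
    return text.replace(ZWNJ, marker)


def count_zwnj(text):
    """Count the ZWNJ characters in text."""
    return text.count(ZWNJ)


raw = 'می باشد'                 # written with an ordinary space
normalized = 'می\u200cباشد'     # the form shown in the usage output

print(show_zwnj(normalized))
print(count_zwnj(raw), count_zwnj(normalized))
```

Including both the raw input and the marked-up output in a [ZWNJ] report shows exactly where a half-space was expected but not inserted.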

