Persian Part-of-Speech tagger framework

Persian Parts-of-Speech tagger

This repository contains a Persian part-of-speech tagger based on Conditional Random Fields (CRF), together with a native text normalizer.

Table of Contents

  1. TO-DO
  2. Installation
    1. Using Pip
    2. From Source
    3. On CoLab
  3. Usage
  4. Implementation Details
  5. Evaluation
  6. How To Contribute

TO-DO:

Installation:

Using Pip

$ pip install crf_pos

From Source

$ git clone https://github.com/MohammadForouhesh/crf-pos-persian 
$ cd crf-pos-persian
$ python setup.py install

On CoLab

! pip install git+https://github.com/MohammadForouhesh/crf-pos-persian.git

Usage

from crf_pos.pos_tagger.wapiti import WapitiPosTagger
pos_tagger = WapitiPosTagger()
tokens = 'او رئیس‌جمهور حجتالاسلاموالمسلمین ابرهیم رئیسی رئیس جمهور ایران اسلامی می باشد'
# Indexing the tagger with a raw sentence returns a list of (token, tag) pairs
pos_tagger[tokens]

Output:
[('او', 'PRO'),
('رئیس\u200cجمهور', 'N'),
('حجت\u200cالاسلام\u200cوالمسلمین', 'N'),
('ابرهیم', 'N'),
('رئیسی', 'N'),
('رئیس\u200cجمهور', 'N'),
('ایران', 'N'),
('اسلامی', 'ADJ'),
('می\u200cباشد', 'V')]
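
The output above is a plain list of (token, tag) pairs, so it can be post-processed with ordinary Python. The following is a minimal sketch that groups tokens by tag. It hard-codes the example output instead of calling the tagger, and `group_by_tag` is a hypothetical helper, not part of crf_pos.

```python
from collections import defaultdict

# Tagged output copied from the example above.
# "\u200c" is the zero-width non-joiner (ZWNJ, the Persian half-space).
tagged = [
    ('او', 'PRO'),
    ('رئیس\u200cجمهور', 'N'),
    ('حجت\u200cالاسلام\u200cوالمسلمین', 'N'),
    ('ابرهیم', 'N'),
    ('رئیسی', 'N'),
    ('رئیس\u200cجمهور', 'N'),
    ('ایران', 'N'),
    ('اسلامی', 'ADJ'),
    ('می\u200cباشد', 'V'),
]


def group_by_tag(pairs):
    """Map each tag to the tokens that received it, in sentence order.

    Hypothetical helper for illustration; not provided by crf_pos.
    """
    groups = defaultdict(list)
    for token, tag in pairs:
        groups[tag].append(token)
    return dict(groups)


groups = group_by_tag(tagged)
nouns = groups['N']   # the six tokens tagged as nouns
verbs = groups['V']   # the single verb, joined with a ZWNJ by the normalizer
```

Note that the input sentence contained 'رئیس جمهور' and 'می باشد' with ordinary spaces, while the tagged tokens carry a ZWNJ, which is the normalizer's half-space correction at work.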

Implementation Details

Evaluation

Training and testing are performed on Mojgan Seraji's Uppsala Persian Corpus.

Part-of-Speech  Description   Precision  Recall  F1-score  Support
N               Noun          0.985      0.970   0.977     186585
P               Preposition   0.998      0.998   0.998     89450
V               Verb          0.999      0.999   0.999     87762
ADV             Adverb        0.976      0.972   0.974     15983
FW              Foreign Word  0.989      0.992   0.991     2784
DET             Determiner    0.973      0.977   0.975     19786
ADJ             Adjective     0.978      0.975   0.977     61526
INT             Interjection  1.000      1.000   1.000     73
CONJ            Conjunction   0.996      0.997   0.997     74796
PRO             Pronoun       0.973      0.974   0.973     23094
NUM             Numeral       0.988      0.992   0.990     24864
avg/total       -             0.985      0.985   0.985     586703
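
For readers unfamiliar with the columns, the F1-score is the harmonic mean of precision and recall, and support is the number of tokens carrying that tag in the test data. The sketch below recomputes the F1-score of the N row and the total support from the figures in the table. It assumes the standard F1 definition, and `f1_score` is a hypothetical helper written for this illustration. Because the table values are rounded to three decimals, recomputing other rows this way can differ in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) copied from the table above.
rows = {
    "N":    (0.985, 0.970, 186585),
    "P":    (0.998, 0.998, 89450),
    "V":    (0.999, 0.999, 87762),
    "ADV":  (0.976, 0.972, 15983),
    "FW":   (0.989, 0.992, 2784),
    "DET":  (0.973, 0.977, 19786),
    "ADJ":  (0.978, 0.975, 61526),
    "INT":  (1.000, 1.000, 73),
    "CONJ": (0.996, 0.997, 74796),
    "PRO":  (0.973, 0.974, 23094),
    "NUM":  (0.988, 0.992, 24864),
}

n_precision, n_recall, _ = rows["N"]
n_f1 = round(f1_score(n_precision, n_recall), 3)  # 0.977, as in the N row

# The per-tag supports add up to the total reported in the last row.
total_support = sum(support for _, _, support in rows.values())  # 586703
```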

How To Contribute

  1. Report any error you encounter through [BUG]
  2. Report any half-space (ZWNJ) correction the Normalizer misses through [ZWNJ]
