Python Vietnamese Toolkit
Functionality
Tokenize
POS tag
Remove accents
Add accents
Algorithm: Conditional Random Field
Vietnamese tokenizer F1 score = 0.978637686
Vietnamese POS tagging F1 score = 0.92520656
POS TAGS:
A - Adjective
C - Coordinating conjunction
E - Preposition
I - Interjection
L - Determiner
M - Numeral
N - Common noun
Nc - Noun classifier
Ny - Noun abbreviation
Np - Proper noun
Nu - Unit noun
P - Pronoun
R - Adverb
S - Subordinating conjunction
T - Auxiliary, modal words
V - Verb
X - Unknown
F - Filtered out (punctuation)
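The tag set above is convenient to keep as a lookup table when reading tagger output. The following is a minimal sketch, not part of pyvi: the names `POS_TAG_DESCRIPTIONS` and `describe_tag` are hypothetical, and the descriptions are copied from the list above. Falling back to the "X" description for unrecognized codes is a design choice of this sketch.

```python
# Hypothetical helper, not part of pyvi: maps the tag codes listed above
# to their descriptions.
POS_TAG_DESCRIPTIONS = {
    "A": "Adjective",
    "C": "Coordinating conjunction",
    "E": "Preposition",
    "I": "Interjection",
    "L": "Determiner",
    "M": "Numeral",
    "N": "Common noun",
    "Nc": "Noun classifier",
    "Ny": "Noun abbreviation",
    "Np": "Proper noun",
    "Nu": "Unit noun",
    "P": "Pronoun",
    "R": "Adverb",
    "S": "Subordinating conjunction",
    "T": "Auxiliary, modal words",
    "V": "Verb",
    "X": "Unknown",
    "F": "Filtered out (punctuation)",
}


def describe_tag(tag):
    """Return the description for a tag code.

    Codes that are not in the table fall back to the 'X' (Unknown) entry.
    """
    return POS_TAG_DESCRIPTIONS.get(tag, POS_TAG_DESCRIPTIONS["X"])


print(describe_tag("Np"))  # prints: Proper noun
```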
Installation
Install from the command line with pip:
$ pip install pyvi
Uninstall
$ pip uninstall pyvi
Usage
from pyvi import ViTokenizer, ViPosTagger
ViTokenizer.tokenize(u"Trường đại học bách khoa hà nội")
ViPosTagger.postagging(ViTokenizer.tokenize(u"Trường đại học Bách Khoa Hà Nội"))
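To work with the tagger's result it helps to pair each word with its tag. The sketch below assumes that `ViTokenizer.tokenize` returns a string in which the syllables of a multi-syllable word are joined by underscores, and that `ViPosTagger.postagging` returns a pair of parallel lists (words, tags); check both assumptions against your installed version. `pair_words_with_tags` is a hypothetical helper, and the sample input is hand-written for illustration, not captured library output.

```python
def pair_words_with_tags(words, tags):
    """Zip parallel word and tag lists into (word, tag) tuples.

    Underscores inside a word are replaced by spaces for display.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    return [(word.replace("_", " "), tag) for word, tag in zip(words, tags)]


# Hand-written sample in the assumed output shape (not real pyvi output).
sample_words = ["Trường", "đại_học", "Bách_Khoa", "Hà_Nội"]
sample_tags = ["N", "N", "Np", "Np"]

for word, tag in pair_words_with_tags(sample_words, sample_tags):
    print(f"{word}\t{tag}")
```

With pyvi installed, and under the assumptions above, the same helper could be called as `pair_words_with_tags(*ViPosTagger.postagging(ViTokenizer.tokenize(text)))`.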
from pyvi import ViUtils
ViUtils.remove_accents(u"Trường đại học bách khoa hà nội")
from pyvi import ViUtils
ViUtils.add_accents(u'truong dai hoc bach khoa ha noi')
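Accent removal can be understood as Unicode decomposition followed by dropping the combining marks. The sketch below shows that idea with the standard library only; it is an illustration, not pyvi's implementation, and `strip_accents` is a hypothetical name.

```python
import unicodedata


def strip_accents(text):
    """Remove Vietnamese diacritics using only the standard library."""
    # đ/Đ have no Unicode decomposition, so map them explicitly.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category Mn) and keep everything else.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_accents("Trường đại học bách khoa hà nội"))
# prints: Truong dai hoc bach khoa ha noi
```

Going the other way is harder: one unaccented syllable can correspond to several accented ones, so restoring accents generally requires a model that looks at context rather than a fixed rule like this one.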