Python Vietnamese Toolkit
Functionality
Tokenize
POS tag
Remove accents
Algorithm: Conditional Random Field
Vietnamese tokenizer f1_score = 0.978637686
Vietnamese pos tagging f1_score = 0.92520656
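The two figures above are F1 scores, the harmonic mean of precision and recall. The source does not say how they were measured. As a minimal sketch, assuming the tokenizer is scored on exact word-boundary matches against a gold segmentation, the computation could look like this. The names `word_spans` and `span_f1` are hypothetical and are not part of pyvi; words are written with `_` joining their syllables.

```python
def word_spans(tokens):
    """Map tokens such as 'đại_học' to a set of (start, end) syllable offsets."""
    spans, pos = set(), 0
    for tok in tokens:
        n_syllables = len(tok.split("_"))
        spans.add((pos, pos + n_syllables))
        pos += n_syllables
    return spans


def span_f1(gold_tokens, pred_tokens):
    """F1 over word spans: a predicted word counts only if both boundaries match."""
    gold, pred = word_spans(gold_tokens), word_spans(pred_tokens)
    true_pos = len(gold & pred)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only, not real pyvi output.
gold = ["Trường", "đại_học", "bách_khoa", "hà_nội"]
pred = ["Trường", "đại_học", "bách", "khoa", "hà_nội"]
score = span_f1(gold, pred)  # 3 of 5 predicted and 3 of 4 gold words match
```

Span-based scoring penalizes a wrongly split word on both precision and recall, which is why it is a common choice for segmentation tasks.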
POS TAGS:
A - Adjective
C - Coordinating conjunction
E - Preposition
I - Interjection
L - Determiner
M - Numeral
N - Common noun
Nc - Noun classifier
Ny - Noun abbreviation
Np - Proper noun
Nu - Unit noun
P - Pronoun
R - Adverb
S - Subordinating conjunction
T - Auxiliary, modal words
V - Verb
X - Unknown
F - Filtered out (punctuation)
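The tag list above can be kept as a lookup table so that tagger output is easier to read. This is a minimal sketch: `POS_TAG_NAMES` and `describe_tags` are hypothetical helpers, not part of pyvi. It assumes the tagger output is available as two parallel lists, one of tokens and one of tags, and the sample data is illustrative only.

```python
# Tag codes and descriptions copied from the list above.
POS_TAG_NAMES = {
    "A": "Adjective",
    "C": "Coordinating conjunction",
    "E": "Preposition",
    "I": "Interjection",
    "L": "Determiner",
    "M": "Numeral",
    "N": "Common noun",
    "Nc": "Noun classifier",
    "Ny": "Noun abbreviation",
    "Np": "Proper noun",
    "Nu": "Unit noun",
    "P": "Pronoun",
    "R": "Adverb",
    "S": "Subordinating conjunction",
    "T": "Auxiliary, modal words",
    "V": "Verb",
    "X": "Unknown",
    "F": "Filtered out (punctuation)",
}


def describe_tags(tokens, tags):
    """Pair each token with its tag code and a readable description."""
    return [
        (token, tag, POS_TAG_NAMES.get(tag, "Unknown"))
        for token, tag in zip(tokens, tags)
    ]


# Illustrative input only, not real pyvi output.
sample_tokens = ["Trường", "đại_học", "Hà_Nội", "."]
sample_tags = ["N", "N", "Np", "F"]
described = describe_tags(sample_tokens, sample_tags)
```

Codes missing from the table fall back to "Unknown", mirroring the `X` tag.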
Installation
At the command line with pip
$ pip install pyvi
Uninstall
$ pip uninstall pyvi
Usage
from pyvi import ViTokenizer, ViPosTagger
ViTokenizer.tokenize(u"Trường đại học bách khoa hà nội")
ViPosTagger.postagging(ViTokenizer.tokenize(u"Trường đại học Bách Khoa Hà Nội"))
from pyvi import ViUtils
ViUtils.remove_accents(u"Trường đại học bách khoa hà nội")
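To illustrate what removing accents means for Vietnamese text, here is a standard-library sketch. It is not pyvi's implementation, and its return type and output may differ from `ViUtils.remove_accents`. The name `strip_accents` is hypothetical. The approach is Unicode decomposition (NFD) followed by dropping combining marks, with `đ`/`Đ` handled separately because they have no decomposition.

```python
import unicodedata


def strip_accents(text):
    """Return text with Vietnamese diacritics removed, using only the stdlib."""
    # 'đ' and 'Đ' are standalone letters with no combining-mark decomposition.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits letters such as 'ờ' into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category 'Mn' (nonspacing mark) covers tone marks, horn, breve and circumflex.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


plain = strip_accents("Trường đại học bách khoa hà nội")
```

Stripping accents is lossy: distinct Vietnamese words can collapse to the same ASCII string, so it is best reserved for search keys or slugs rather than for text that will be tokenized or tagged afterwards.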
Built Distribution
File details
Details for the file pyvi-0.0.9.6-py2.py3-none-any.whl.
File metadata
- Download URL: pyvi-0.0.9.6-py2.py3-none-any.whl
- Upload date:
- Size: 5.3 MB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.20.1 setuptools/38.5.1 requests-toolbelt/0.8.0 tqdm/4.31.1 CPython/3.6.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 64eafc46ecfa6de234d353af44d7e110796672ef6c06d6d17507179da0835c9f
MD5 | 578933633a98d5baa580aef8dfa044cf
BLAKE2b-256 | e47100402ae910e62cb4e199e8a9173f9b5e81bec961150ad3b4d65b3027ad44