Python Vietnamese Toolkit
Functionality
Tokenize
POS tag
Remove accents
Add accents
Algorithm: Conditional Random Field
Vietnamese tokenizer f1_score = 0.985
Vietnamese POS tagging f1_score = 0.925
POS TAGS:
A - Adjective
C - Coordinating conjunction
E - Preposition
I - Interjection
L - Determiner
M - Numeral
N - Common noun
Nc - Noun Classifier
Ny - Noun abbreviation
Np - Proper noun
Nu - Unit noun
P - Pronoun
R - Adverb
S - Subordinating conjunction
T - Auxiliary, modal words
V - Verb
X - Unknown
F - Filtered out (punctuation)
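The tag list above can be kept as a lookup table so that tagger output is easier to read. The sketch below is not part of pyvi: `POS_TAG_NAMES` and `describe_tags` are hypothetical names, and it assumes the tagger result is available as two parallel lists (tokens and tags). The sample tokens and tags are illustrative only, not recorded output.

```python
# Tag descriptions copied from the POS TAGS list above.
POS_TAG_NAMES = {
    "A": "Adjective",
    "C": "Coordinating conjunction",
    "E": "Preposition",
    "I": "Interjection",
    "L": "Determiner",
    "M": "Numeral",
    "N": "Common noun",
    "Nc": "Noun Classifier",
    "Ny": "Noun abbreviation",
    "Np": "Proper noun",
    "Nu": "Unit noun",
    "P": "Pronoun",
    "R": "Adverb",
    "S": "Subordinating conjunction",
    "T": "Auxiliary, modal words",
    "V": "Verb",
    "X": "Unknown",
    "F": "Filtered out (punctuation)",
}


def describe_tags(tokens, tags):
    """Pair each token with its tag and the tag's description.

    Hypothetical helper: assumes `tokens` and `tags` are parallel lists.
    Tags missing from the table are reported as "Unknown".
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [
        (token, tag, POS_TAG_NAMES.get(tag, "Unknown"))
        for token, tag in zip(tokens, tags)
    ]


# Illustrative input only (assumed shape, not actual tagger output).
sample_tokens = ["Trường", "đại_học", "Bách_Khoa", "Hà_Nội"]
sample_tags = ["N", "N", "Np", "Np"]

for token, tag, name in describe_tags(sample_tokens, sample_tags):
    print(f"{token}\t{tag}\t{name}")
```

Keeping the table in one dictionary means an unrecognized tag falls back to "Unknown" instead of raising a `KeyError`.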
Installation
At the command line with pip
$ pip install pyvi
Uninstall
$ pip uninstall pyvi
Usage
from pyvi import ViTokenizer, ViPosTagger
ViTokenizer.tokenize(u"Trường đại học bách khoa hà nội")
ViPosTagger.postagging(ViTokenizer.tokenize(u"Trường đại học Bách Khoa Hà Nội"))
from pyvi import ViUtils
ViUtils.remove_accents(u"Trường đại học bách khoa hà nội")
ViUtils.add_accents(u'truong dai hoc bach khoa ha noi')
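To show what accent removal involves, here is a minimal standard-library sketch. It is not pyvi's implementation, and `strip_accents` is a hypothetical name: it decomposes each character with Unicode NFD and drops the combining marks, handling "đ" separately because that letter has no canonical decomposition.

```python
import unicodedata


def strip_accents(text):
    """Return `text` with Vietnamese diacritics removed.

    Illustrative sketch only, not the pyvi implementation.
    """
    # "đ" and "Đ" do not decompose under NFD, so map them by hand.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers tone marks, horn, circumflex, breve.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped)


print(strip_accents("Trường đại học bách khoa hà nội"))
# prints: Truong dai hoc bach khoa ha noi
```

The reverse direction cannot be done with a character mapping like this, because one unaccented syllable can correspond to several accented ones (for example "ha" may be "hà", "hạ", or "há").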