Python Vietnamese Toolkit
What’s New (0.1)
Retrained the tokenization model on a much larger dataset (F1 score = 0.985)
Added training data and training code
Better integration with spacy.io: redundant spaces between tokens are removed after tokenization, e.g. Việt Nam , 12 / 22 / 2020 => Việt Nam, 12/22/2020
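The space cleanup described in the last item can be illustrated with a small standalone sketch. This is not pyvi's own implementation: the helper name collapse_punctuation_spaces and both regular expressions are assumptions chosen only to reproduce the example above.

```python
import re


def collapse_punctuation_spaces(text):
    """Hypothetical helper: remove spaces a tokenizer left around punctuation."""
    # Drop whitespace that precedes closing punctuation such as , . ; : ! ?
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)
    # Drop whitespace around a slash that sits between two digits (dates).
    text = re.sub(r"(?<=\d)\s*/\s*(?=\d)", "/", text)
    return text


print(collapse_punctuation_spaces("Việt Nam , 12 / 22 / 2020"))
# prints: Việt Nam, 12/22/2020
```

The rules are deliberately narrow (digits around the slash, a fixed punctuation set), so text such as "và / hoặc" is left untouched.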
Functionality
Tokenization
POS tagging
Accents removal
Accents adding
Algorithm: Conditional Random Field
Vietnamese tokenizer F1 score = 0.985
Vietnamese POS tagging F1 score = 0.925
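The description above does not spell out how a Conditional Random Field produces word boundaries. One common framing, assumed here purely for illustration, is sequence labeling: each syllable gets a label saying whether it begins a word ("B") or continues the previous one ("I"), and the labeled syllables are then merged. The label names, the underscore separator, and the helper merge_syllables are all assumptions, and the labels in the example are written by hand rather than predicted by a model.

```python
def merge_syllables(syllables, labels):
    """Merge syllables into words from per-syllable B/I labels (illustrative)."""
    words = []
    for syllable, label in zip(syllables, labels):
        if label == "I" and words:
            # Continuation: attach to the word currently being built.
            words[-1] += "_" + syllable
        else:
            # "B" (or an "I" with nothing before it) starts a new word.
            words.append(syllable)
    return words


# Hand-written labels, standing in for what a trained CRF might predict.
syllables = ["Trường", "đại", "học"]
labels = ["B", "B", "I"]
print(merge_syllables(syllables, labels))
# prints: ['Trường', 'đại_học']
```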
POS TAGS:
A - Adjective
C - Coordinating conjunction
E - Preposition
I - Interjection
L - Determiner
M - Numeral
N - Common noun
Nc - Noun Classifier
Ny - Noun abbreviation
Np - Proper noun
Nu - Unit noun
P - Pronoun
R - Adverb
S - Subordinating conjunction
T - Auxiliary, modal words
V - Verb
X - Unknown
F - Filtered out (punctuation)
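When reading tagger output, it can help to keep the tag list above as a lookup table. The sketch below assumes the tagger's result can be treated as two parallel lists, tokens and tags; the names POS_TAG_NAMES and describe_tags are hypothetical and are not part of pyvi, and the sample tags are hand-written for illustration rather than real model output.

```python
# Tag descriptions copied from the POS TAGS list above.
POS_TAG_NAMES = {
    "A": "Adjective",
    "C": "Coordinating conjunction",
    "E": "Preposition",
    "I": "Interjection",
    "L": "Determiner",
    "M": "Numeral",
    "N": "Common noun",
    "Nc": "Noun Classifier",
    "Ny": "Noun abbreviation",
    "Np": "Proper noun",
    "Nu": "Unit noun",
    "P": "Pronoun",
    "R": "Adverb",
    "S": "Subordinating conjunction",
    "T": "Auxiliary, modal words",
    "V": "Verb",
    "X": "Unknown",
    "F": "Filtered out (punctuation)",
}


def describe_tags(tokens, tags):
    """Pair each token with its tag and a human-readable tag name."""
    return [
        (token, tag, POS_TAG_NAMES.get(tag, "Unrecognized tag"))
        for token, tag in zip(tokens, tags)
    ]


# Hand-written sample input, not actual tagger output.
for row in describe_tags(["Hà_Nội", ","], ["Np", "F"]):
    print(row)
```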
Installation
At the command line with pip
$ pip install pyvi
Uninstall
$ pip uninstall pyvi
Usage
from pyvi import ViTokenizer, ViPosTagger
ViTokenizer.tokenize(u"Trường đại học bách khoa hà nội")
ViPosTagger.postagging(ViTokenizer.tokenize(u"Trường đại học Bách Khoa Hà Nội"))
from pyvi import ViUtils
ViUtils.remove_accents(u"Trường đại học bách khoa hà nội")
from pyvi import ViUtils
ViUtils.add_accents(u'truong dai hoc bach khoa ha noi')
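To make concrete what accent removal means for Vietnamese text, here is a standard-library sketch of the idea. It is not how ViUtils.remove_accents is implemented (that is not documented here); the function name strip_vietnamese_accents is an assumption. It decomposes each character with Unicode NFD, drops the combining marks, and maps đ/Đ by hand because those letters have no canonical decomposition.

```python
import unicodedata


def strip_vietnamese_accents(text):
    """Illustrative accent stripping using only the standard library."""
    # NFD splits letters such as "ộ" into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # "đ" and "Đ" do not decompose, so replace them explicitly.
    return stripped.replace("đ", "d").replace("Đ", "D")


print(strip_vietnamese_accents("Trường đại học bách khoa hà nội"))
# prints: Truong dai hoc bach khoa ha noi
```

The reverse direction has no such rule-based shortcut: "truong" alone does not say which diacritics to restore, which is why adding accents needs a trained model rather than a character table.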