Python Vietnamese Toolkit

Project description

Pyvi performs word tokenization and part-of-speech (POS) tagging for Vietnamese text in Python.

Algorithm: Conditional Random Fields (CRF)
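CRF-based word segmenters commonly treat tokenization as sequence labeling: each syllable receives a label such as B (begins a word) or I (inside a word), and the CRF predicts the label sequence. This page does not document pyvi's label set or features, so the following is only a minimal sketch of that framing. It assumes B/I labels and assumes words are written with their syllables joined by `_`. The function names `to_bi_labels` and `from_bi_labels` are hypothetical and not part of pyvi.

```python
from typing import List, Tuple


def to_bi_labels(segmented: str) -> Tuple[List[str], List[str]]:
    """Turn an underscore-joined segmentation into (syllables, B/I labels).

    Hypothetical helper: "đại_học" -> syllables ["đại", "học"], labels ["B", "I"].
    """
    syllables: List[str] = []
    labels: List[str] = []
    for word in segmented.split():
        for i, syllable in enumerate(word.split("_")):
            syllables.append(syllable)
            labels.append("B" if i == 0 else "I")  # first syllable opens the word
    return syllables, labels


def from_bi_labels(syllables: List[str], labels: List[str]) -> str:
    """Inverse of to_bi_labels: rebuild the underscore-joined string."""
    words: List[List[str]] = []
    for syllable, label in zip(syllables, labels):
        if label == "B" or not words:
            words.append([syllable])  # start a new word (also if text opens with I)
        else:
            words[-1].append(syllable)  # continue the current word
    return " ".join("_".join(w) for w in words)


syllables, labels = to_bi_labels("Trường đại_học bách_khoa hà_nội")
print(list(zip(syllables, labels)))
print(from_bi_labels(syllables, labels))
```

A CRF tagger would be trained to predict the `labels` list from features of `syllables`; decoding then only requires the inverse mapping.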

Vietnamese tokenizer F1 score: 0.978637686

Vietnamese POS tagging F1 score: 0.92520656
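The page does not say how these F1 scores were measured. For word segmentation, one common definition compares the word spans of a predicted segmentation against a gold segmentation, counting a predicted word as correct only when its span matches exactly. The sketch below implements that definition under the assumption that both inputs are underscore-joined strings over the same syllables; it is illustrative and is not pyvi's evaluation script. The names `word_spans` and `segmentation_f1` are hypothetical.

```python
from typing import Set, Tuple


def word_spans(segmented: str) -> Set[Tuple[int, int]]:
    """Return one (start, end) syllable-index span per word."""
    spans: Set[Tuple[int, int]] = set()
    position = 0
    for word in segmented.split():
        length = len(word.split("_"))  # number of syllables in this word
        spans.add((position, position + length))
        position += length
    return spans


def segmentation_f1(gold: str, predicted: str) -> float:
    """Word-level F1: harmonic mean of span precision and span recall."""
    gold_spans = word_spans(gold)
    pred_spans = word_spans(predicted)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


gold = "Trường đại_học bách_khoa hà_nội"
pred = "Trường đại_học bách khoa hà_nội"  # "bách_khoa" wrongly split in two
print(round(segmentation_f1(gold, pred), 4))
```

In this example 3 of 5 predicted words are correct (precision 0.6) and 3 of 4 gold words are found (recall 0.75).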

POS tags:

  • A - Adjective

  • C - Coordinating conjunction

  • E - Preposition

  • I - Interjection

  • L - Determiner

  • M - Numeral

  • N - Common noun

  • Nc - Noun classifier

  • Ny - Noun abbreviation

  • Np - Proper noun

  • Nu - Unit noun

  • P - Pronoun

  • R - Adverb

  • S - Subordinating conjunction

  • T - Auxiliary, modal words

  • V - Verb

  • X - Unknown

  • F - Filtered out (punctuation)
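For readable output, the tag list above can be kept as a lookup table. The dictionary below simply transcribes that list; the helper `describe_tags` is hypothetical and not part of pyvi, and tags outside the list are reported as such rather than guessed.

```python
from typing import Dict, List, Tuple

# Transcription of the pyvi POS tag list above.
POS_TAGS: Dict[str, str] = {
    "A": "Adjective",
    "C": "Coordinating conjunction",
    "E": "Preposition",
    "I": "Interjection",
    "L": "Determiner",
    "M": "Numeral",
    "N": "Common noun",
    "Nc": "Noun classifier",
    "Ny": "Noun abbreviation",
    "Np": "Proper noun",
    "Nu": "Unit noun",
    "P": "Pronoun",
    "R": "Adverb",
    "S": "Subordinating conjunction",
    "T": "Auxiliary, modal words",
    "V": "Verb",
    "X": "Unknown",
    "F": "Filtered out (punctuation)",
}


def describe_tags(tags: List[str]) -> List[Tuple[str, str]]:
    """Pair each tag with its description; unlisted tags are flagged."""
    return [(tag, POS_TAGS.get(tag, "Not in the tag list")) for tag in tags]


print(describe_tags(["N", "Np", "V"]))
```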

Installation

At the command line, with pip:

$ pip install pyvi

Uninstall

$ pip uninstall pyvi
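To check from Python whether pyvi is installed, and which version, the standard library's `importlib.metadata` (Python 3.8+) can read the installed distribution's metadata. This is a generic packaging check rather than a pyvi feature; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


pyvi_version = installed_version("pyvi")
print(pyvi_version if pyvi_version else "pyvi is not installed")
```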

Usage

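from pyvi import ViTokenizer, ViPosTagger

# Word segmentation: returns a string in which the syllables of each
# multi-syllable word are joined with "_"
ViTokenizer.tokenize(u"Trường đại học bách khoa hà nội")

# POS tagging expects already-tokenized text; returns (words, tags)
ViPosTagger.postagging(ViTokenizer.tokenize(u"Trường đại học Bách Khoa Hà Nội"))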
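pyvi's tokenizer output joins the syllables of a multi-syllable word with `_`, and `postagging` returns the words and their tags as two parallel lists. Assuming those formats, the sketch below turns tagger output into (word, tag) pairs with ordinary spaces restored. The helper `to_pairs` is hypothetical, and the sample input is written by hand in the assumed format, not captured from pyvi.

```python
from typing import List, Sequence, Tuple


def to_pairs(words: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Zip parallel word/tag lists, replacing "_" with spaces in each word.

    Caveat: this also rewrites underscores that were in the original text.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    return [(word.replace("_", " "), tag) for word, tag in zip(words, tags)]


# Hand-written sample in the assumed (words, tags) format.
sample_words = ["Trường", "đại_học", "Bách_Khoa", "Hà_Nội"]
sample_tags = ["N", "N", "Np", "Np"]

for word, tag in to_pairs(sample_words, sample_tags):
    print(f"{word}\t{tag}")
```

With pyvi installed, and assuming `postagging` returns a two-element tuple as described, `to_pairs(*ViPosTagger.postagging(ViTokenizer.tokenize(text)))` would apply the helper to real output.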

Download files

Source Distribution

pyvi-0.0.9.2.tar.gz (5.2 MB)

File details

Details for the file pyvi-0.0.9.2.tar.gz.

File metadata

  • Download URL: pyvi-0.0.9.2.tar.gz
  • Size: 5.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.20.1 setuptools/38.5.1 requests-toolbelt/0.8.0 tqdm/4.19.9 CPython/3.6.3

File hashes

Hashes for pyvi-0.0.9.2.tar.gz
  • SHA256: db56ed20f39fdf820bad5a6fa7159620db9eef5d9b1a532de106f744b0d4653e
  • MD5: 4a87cd4b5aad6952651e37e6c3966749
  • BLAKE2b-256: 4991ec00d4034ec22d65dfdfa807918d7ebb9e7e3fe6c26551d1d5df7ac4e410
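A downloaded copy of the source archive can be checked against the SHA256 digest listed above with the standard library's `hashlib`. The sketch assumes the archive sits in the current directory under its published name; `sha256_of` and `verify` are hypothetical helper names. pip can also enforce digests itself through its hash-checking mode in requirements files.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for pyvi-0.0.9.2.tar.gz.
EXPECTED_SHA256 = "db56ed20f39fdf820bad5a6fa7159620db9eef5d9b1a532de106f744b0d4653e"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare case-insensitively, since hexdigest() is lowercase."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("pyvi-0.0.9.2.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive) else "MISMATCH")
    else:
        print(f"{archive} not found")
```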
