A Python Wrapper for VnCoreNLP
Table of contents
py_vncorenlp: A Python Wrapper for VnCoreNLP
Prerequisites
Installation
To install this Python wrapper for VnCoreNLP, run the following command:
$ pip install py_vncorenlp
Example usage
import py_vncorenlp
# Automatically download the VnCoreNLP model from the original repository
py_vncorenlp.download_model(save_dir='./')
# Load the pretrained VnCoreNLP model
model = py_vncorenlp.VnCoreNLP(annotators=["wseg", "pos", "ner", "parse"], save_dir='./')
# Annotate a corpus where each line represents a raw sentence
model.annotate_file(input_file="input.txt", output_file="output.txt")
# Annotate a raw sentence
model.print_out(model.annotate_sentence("Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội."))
By default, each input sentence is output in a 6-column format giving the word index, word form, POS tag, NER label, head index of the current word, and its dependency relation type:
1 Ông Nc O 4 sub
2 Nguyễn_Khắc_Chúc Np B-PER 1 nmod
3 đang R O 4 adv
4 làm_việc V O 0 root
5 tại E O 4 loc
6 Đại_học N B-ORG 5 pob
7 Quốc_gia N I-ORG 6 nmod
8 Hà_Nội Np I-ORG 6 nmod
9 . CH O 4 punct
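The 6-column output above can be loaded back into Python for further processing. The following is a minimal sketch, not part of py_vncorenlp: the `Token` class and the `parse_annotated_lines` and `extract_entities` helpers are hypothetical names introduced here, and the code assumes that columns are separated by whitespace (spaces or tabs) and that NER labels follow the B-/I-/O scheme shown in the sample.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    """One row of the 6-column output (hypothetical helper type)."""
    index: int   # word index within the sentence, starting at 1
    form: str    # word form, with syllables joined by underscores
    pos: str     # POS tag
    ner: str     # NER label, e.g. B-PER, I-ORG, O
    head: int    # index of the head word, 0 for the root
    dep: str     # dependency relation type


def parse_annotated_lines(text: str) -> List[Token]:
    """Parse 6-column lines; blank or malformed lines are skipped."""
    tokens = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue
        idx, form, pos, ner, head, dep = fields
        tokens.append(Token(int(idx), form, pos, ner, int(head), dep))
    return tokens


def extract_entities(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Group consecutive B-/I- labels into (label, text) pairs."""
    entities = []
    label, words = None, []
    for tok in tokens:
        if tok.ner.startswith("B-"):
            if words:
                entities.append((label, " ".join(words)))
            label, words = tok.ner[2:], [tok.form]
        elif tok.ner.startswith("I-") and words and tok.ner[2:] == label:
            words.append(tok.form)
        else:
            if words:
                entities.append((label, " ".join(words)))
            label, words = None, []
    if words:
        entities.append((label, " ".join(words)))
    return entities


sample = """1 Ông Nc O 4 sub
2 Nguyễn_Khắc_Chúc Np B-PER 1 nmod
3 đang R O 4 adv
4 làm_việc V O 0 root
5 tại E O 4 loc
6 Đại_học N B-ORG 5 pob
7 Quốc_gia N I-ORG 6 nmod
8 Hà_Nội Np I-ORG 6 nmod
9 . CH O 4 punct"""

tokens = parse_annotated_lines(sample)
root = next(t for t in tokens if t.head == 0)
entities = extract_entities(tokens)
print(root.form)
print(entities)
```

For the sample sentence this finds `làm_việc` as the root and groups the NER labels into one PER entity and one ORG entity.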
For users who need only VnCoreNLP's word segmentation, we also provide a dedicated function:
model = py_vncorenlp.VnCoreNLP(annotators=["wseg"], save_dir='./')
sentence = "Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội."
output = model.word_segment(sentence)
print(output)
# The result: "Ông Nguyễn_Khắc_Chúc đang làm_việc tại Đại_học Quốc_gia Hà_Nội ."
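The segmented result joins the syllables of each word with underscores and separates words with spaces. As a minimal sketch of post-processing, the hypothetical helper `segmented_to_words` below turns such a string into a list of words with the underscores restored to spaces. It assumes the output is a single string as in the comment above; if the function returns a list of such strings instead, apply the helper to each element. Underscores that were already present in the raw text would also be converted.

```python
from typing import List


def segmented_to_words(segmented: str) -> List[str]:
    """Split a word-segmented sentence into words, restoring inner spaces."""
    return [token.replace("_", " ") for token in segmented.split()]


segmented = "Ông Nguyễn_Khắc_Chúc đang làm_việc tại Đại_học Quốc_gia Hà_Nội ."
words = segmented_to_words(segmented)
print(words)
```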
Source Distribution
py_vncorenlp-0.1.1.tar.gz (3.8 kB)
File metadata
- Download URL: py_vncorenlp-0.1.1.tar.gz
- Upload date:
- Size: 3.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.28.0 setuptools/47.1.1.post20200604 requests-toolbelt/0.9.1 tqdm/4.63.0 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 01b82b14204f1b39df320bdf44287725843bcd7fe45edd5a2df064a1f938e3f7
MD5 | 25c944e161a95e3ab397d3724f1ceb3c
BLAKE2b-256 | 73042032cc4775af0c5827f0356f4383ab4a549924d3102a3fb30881df14a356