A Python Wrapper for VnCoreNLP
Project description
Table of contents
py_vncorenlp: A Python Wrapper for VnCoreNLP
Prerequisites
Installation
To install this Python wrapper for VnCoreNLP, run the following command:
$ pip install py_vncorenlp
Example usage
import py_vncorenlp
# Automatically download the VnCoreNLP model from the original repository
py_vncorenlp.download_model(save_dir='./')
# Load the pretrained VnCoreNLP model
model = py_vncorenlp.VnCoreNLP(annotators=["wseg", "pos", "ner", "parse"], save_dir='./')
# Annotate a corpus where each line represents a raw sentence
model.annotate_file(input_file="input.txt", output_file="output.txt")
# Annotate a raw sentence
model.print_out(model.annotate_sentence("Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội."))
By default, the output for each input sentence is formatted with six columns: word index, word form, POS tag, NER label, head index of the current word, and dependency relation type:
1 Ông Nc O 4 sub
2 Nguyễn_Khắc_Chúc Np B-PER 1 nmod
3 đang R O 4 adv
4 làm_việc V O 0 root
5 tại E O 4 loc
6 Đại_học N B-ORG 5 pob
7 Quốc_gia N I-ORG 6 nmod
8 Hà_Nội Np I-ORG 6 nmod
9 . CH O 4 punct
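The six-column layout above can be read back into Python structures for further processing. The following is a minimal sketch, not part of py_vncorenlp: the helper names `parse_rows` and `extract_entities` and the field names in `COLUMNS` are hypothetical, and it assumes the columns are separated by whitespace (multi-syllable words are joined with underscores, so a word form never contains a space).

```python
from typing import Any, Dict, List, Tuple

# Hypothetical field names for the six columns described above.
COLUMNS = ("index", "form", "pos", "ner", "head", "deprel")


def parse_rows(text: str) -> List[Dict[str, Any]]:
    """Turn the six-column text output into one dict per word."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row: Dict[str, Any] = dict(zip(COLUMNS, parts))
        row["index"] = int(row["index"])
        row["head"] = int(row["head"])
        rows.append(row)
    return rows


def extract_entities(rows: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Group B-/I- labels into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    label = None
    for row in rows:
        tag = row["ner"]
        if tag.startswith("B-"):
            if words:
                entities.append((" ".join(words), label))
            words, label = [row["form"]], tag[2:]
        elif tag.startswith("I-") and words and tag[2:] == label:
            words.append(row["form"])
        else:
            if words:
                entities.append((" ".join(words), label))
            words, label = [], None
    if words:
        entities.append((" ".join(words), label))
    return entities


# The sample output shown above, copied as plain text.
SAMPLE = """\
1 Ông Nc O 4 sub
2 Nguyễn_Khắc_Chúc Np B-PER 1 nmod
3 đang R O 4 adv
4 làm_việc V O 0 root
5 tại E O 4 loc
6 Đại_học N B-ORG 5 pob
7 Quốc_gia N I-ORG 6 nmod
8 Hà_Nội Np I-ORG 6 nmod
9 . CH O 4 punct
"""

rows = parse_rows(SAMPLE)
entities = extract_entities(rows)
print(entities)
```

The same functions could be applied to the contents of an output file written by `annotate_file`, provided that file uses the same column layout.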
For users who need VnCoreNLP only for word segmentation, we also provide a dedicated function:
model = py_vncorenlp.VnCoreNLP(annotators=["wseg"], save_dir='./')
sentence = "Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội."
output = model.word_segment(sentence)
print(output)
# The result: "Ông Nguyễn_Khắc_Chúc đang làm_việc tại Đại_học Quốc_gia Hà_Nội ."
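In the segmented string, spaces separate words and underscores join the syllables of a multi-syllable word. A minimal sketch of turning such a string into a list of words follows; the helper `segmented_to_words` is hypothetical and not part of py_vncorenlp, and it assumes the input is a single segmented string like the result shown above.

```python
from typing import List


def segmented_to_words(segmented: str) -> List[str]:
    """Split on spaces, then restore spaces inside each multi-syllable word."""
    return [token.replace("_", " ") for token in segmented.split()]


segmented = "Ông Nguyễn_Khắc_Chúc đang làm_việc tại Đại_học Quốc_gia Hà_Nội ."
words = segmented_to_words(segmented)
print(words)
```

Replacing underscores this way is only for display or counting. Downstream models that expect VnCoreNLP-style input usually need the underscores kept.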
Download files
Source Distribution
py_vncorenlp-0.0.9.tar.gz (3.7 kB)
File details
Details for the file py_vncorenlp-0.0.9.tar.gz.
File metadata
- Download URL: py_vncorenlp-0.0.9.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.28.0 setuptools/47.1.1.post20200604 requests-toolbelt/0.9.1 tqdm/4.63.0 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | ab2f3f703d91c89819fdd9b7be0dc695d4a11b339db1bc9925389bad2181eb97
MD5 | f4f64fdcce23c8ba7ce0cfe37a7d0dca
BLAKE2b-256 | bb96d938858712d04df1f15d279ca3618010ab51d60a89684bdada7e1f2eb145