A Python Wrapper for VnCoreNLP
py_vncorenlp: A Python Wrapper for VnCoreNLP
Prerequisites
Installation
To install this Python wrapper for VnCoreNLP, run the following command:
$ pip install py_vncorenlp
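To confirm that the package is importable in the current environment, a quick check such as the one below can help. This is only a minimal sketch using the standard library; the `is_installed` helper is a hypothetical name and not part of py_vncorenlp.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    # find_spec returns None when the top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


if is_installed("py_vncorenlp"):
    print("py_vncorenlp is available")
else:
    print("py_vncorenlp not found; run: pip install py_vncorenlp")
```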
Example usage
import py_vncorenlp
# Automatically download the VnCoreNLP model from the original repository
py_vncorenlp.download_model()
# Load the pretrained VnCoreNLP model
model = py_vncorenlp.VnCoreNLP(annotators=["wseg", "pos", "ner", "parse"])
# Annotate a corpus where each line represents a raw sentence
model.annotate_file(input_file="input.txt", output_file="output.txt")
# Annotate a raw sentence
model.print_out(model.annotate_sentence("Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội."))
By default, the output for each input sentence is formatted with 6 columns representing word index, word form, POS tag, NER label, head index of the current word and its dependency relation type:
1 Ông Nc O 4 sub
2 Nguyễn_Khắc_Chúc Np B-PER 1 nmod
3 đang R O 4 adv
4 làm_việc V O 0 root
5 tại E O 4 loc
6 Đại_học N B-ORG 5 pob
7 Quốc_gia N I-ORG 6 nmod
8 Hà_Nội Np I-ORG 6 nmod
9 . CH O 4 punct
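For downstream processing, the six-column output shown above can be read back into structured records. The following is a minimal sketch, assuming the columns are whitespace-separated as displayed; the helper functions and field names are hypothetical and not part of py_vncorenlp.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[int, str]]


def parse_annotated_line(line: str) -> Record:
    """Parse one 6-column output line into a dictionary."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 columns, got {len(parts)}: {line!r}")
    index, word_form, pos_tag, ner_label, head, dep_label = parts
    # Hypothetical field names mirroring the six columns described above.
    return {
        "index": int(index),
        "word_form": word_form,
        "pos_tag": pos_tag,
        "ner_label": ner_label,
        "head": int(head),
        "dep_label": dep_label,
    }


def parse_annotated_sentence(text: str) -> List[Record]:
    """Parse all non-empty lines of one annotated sentence."""
    return [parse_annotated_line(line) for line in text.splitlines() if line.strip()]


sample = """1 Ông Nc O 4 sub
2 Nguyễn_Khắc_Chúc Np B-PER 1 nmod
3 đang R O 4 adv
4 làm_việc V O 0 root
5 tại E O 4 loc
6 Đại_học N B-ORG 5 pob
7 Quốc_gia N I-ORG 6 nmod
8 Hà_Nội Np I-ORG 6 nmod
9 . CH O 4 punct"""

records = parse_annotated_sentence(sample)
# The root of the dependency tree is the word whose head index is 0.
root = next(r for r in records if r["head"] == 0)
print(root["word_form"])  # prints: làm_việc
```

Keeping the indices as integers makes it straightforward to follow head links when walking the dependency tree.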
For users who need VnCoreNLP only for word segmentation, we also provide a dedicated function:
sentence = "Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội."
output = model.tokenize(sentence)
print(output)
# The result: "Ông Nguyễn_Khắc_Chúc đang làm_việc tại Đại_học Quốc_gia Hà_Nội ."
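In the segmented result, the syllables of a multi-syllable word are joined with underscores and words are separated by spaces. A minimal sketch of turning such a string into a list of words follows; it assumes the output is a single string in the format shown above, and `segmented_to_words` is a hypothetical helper, not part of py_vncorenlp.

```python
from typing import List


def segmented_to_words(segmented: str) -> List[str]:
    """Split a word-segmented sentence into words, restoring spaces inside each word."""
    # Words are space-separated; underscores join the syllables of one word.
    return [token.replace("_", " ") for token in segmented.split()]


segmented = "Ông Nguyễn_Khắc_Chúc đang làm_việc tại Đại_học Quốc_gia Hà_Nội ."
words = segmented_to_words(segmented)
print(words)
```

Note that an underscore already present in the raw input would be indistinguishable from a syllable joiner under this approach.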
Download files
Source Distribution
py_vncorenlp-0.0.7.tar.gz (3.6 kB)
File details
Details for the file py_vncorenlp-0.0.7.tar.gz.
File metadata
- Download URL: py_vncorenlp-0.0.7.tar.gz
- Upload date:
- Size: 3.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.23.0 setuptools/47.1.1.post20200604 requests-toolbelt/0.9.1 tqdm/4.63.0 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 343ebca9013d04fa875abd653cf2785d831deaed1ccd1be2c81a73e41b7f4f66
MD5 | 14da36e915030cbcd36c2f3cf092edbc
BLAKE2b-256 | ce5d3c2aa17e74aa694973fcc06022313c9ef898ca31972f5bc01a35ac4bbf89