Named Entity Recognition Toolkit
A toolkit for rapidly extracting useful entities from text, built on top of existing Python NLP packages such as Stanza.
Features
We aim to bring the complexity of existing NLP toolkits down to earth by keeping the APIs as simple as possible while following best practices.
Installation
```bash
pip install ner-kit
```
Examples
Example 1: Word segmentation
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang="en")
    text = 'This is a test sentence for stanza. This is another sentence.'
    result1 = sw.tokenize(text)
    sw.print_result(result1)
```
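Under the hood, the wrapper drives a Stanza pipeline. For comparison, here is a minimal sketch of the same tokenization using Stanza directly (our assumption about the equivalent raw calls, not nerkit's internals):

```python
import stanza

# Build an English pipeline with only the tokenizer (downloads the model on first run)
stanza.download("en")
nlp = stanza.Pipeline(lang="en", processors="tokenize")
doc = nlp("This is a test sentence for stanza. This is another sentence.")
for i, sentence in enumerate(doc.sentences):
    print(f"Sentence {i + 1}:", [token.text for token in sentence.tokens])
```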
Example 2: Chinese word segmentation
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang="zh")
    text = '我在北京吃苹果!'  # "I eat apples in Beijing!"
    result1 = sw.tokenize(text, lang='zh')
    sw.print_result(result1)
```
Example 3: Multi-Word Token (MWT) Expansion
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang="fr")
    text = 'Nous avons atteint la fin du sentier.'  # "We have reached the end of the trail."
    result1 = sw.mwt_expand(text, lang='fr')
    sw.print_result(result1)
```
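To see what MWT expansion yields, here is a sketch of the equivalent pipeline in plain Stanza (an illustration of the underlying API, not nerkit's code); French contractions such as "du" expand into multiple syntactic words:

```python
import stanza

# French pipeline with tokenization plus multi-word token expansion
stanza.download("fr")
nlp = stanza.Pipeline(lang="fr", processors="tokenize,mwt")
doc = nlp("Nous avons atteint la fin du sentier.")
for sentence in doc.sentences:
    for token in sentence.tokens:
        # An expanded token maps to more than one word, e.g. "du" -> "de" + "le"
        print(token.text, "->", [word.text for word in token.words])
```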
Example 4: POS tagging
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang='en')
    text = 'I like apple'
    result1 = sw.tag(text)
    sw.print_result(result1)

    sw.download_chinese_model()
    text = '我喜欢苹果'  # "I like apples"
    result2 = sw.tag_chinese(text, lang='zh')
    sw.print_result(result2)
```
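For reference, the equivalent POS tagging done with Stanza directly might look like the sketch below (again an assumption about the underlying pipeline, not nerkit's implementation):

```python
import stanza

# English pipeline up to POS tagging; the pos processor requires tokenize
stanza.download("en")
nlp = stanza.Pipeline(lang="en", processors="tokenize,pos")
doc = nlp("I like apple")
for sentence in doc.sentences:
    for word in sentence.words:
        print(word.text, word.upos, word.xpos)  # universal and treebank-specific tags
```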
Example 5: Named Entity Recognition
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang='en')
    sw.download_chinese_model()

    text_en = 'I like Beijing!'
    result1 = sw.ner(text_en)
    sw.print_result(result1)

    text = '我喜欢北京!'  # "I like Beijing!"
    result2 = sw.ner_chinese(text)
    sw.print_result(result2)
```
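If you prefer to work with entity objects rather than printed results, a plain-Stanza sketch of the same English NER step (our assumption about the equivalent raw pipeline) is:

```python
import stanza

# English pipeline with NER; entities are exposed on the document as spans
stanza.download("en")
nlp = stanza.Pipeline(lang="en", processors="tokenize,ner")
doc = nlp("I like Beijing!")
for ent in doc.ents:
    print(ent.text, ent.type)  # e.g. a location-style label for "Beijing"
```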
Example 6: Sentiment Analysis
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()

    text_en = 'I like Beijing!'
    result1 = sw.sentiment(text_en)
    sw.print_result(result1)

    text_zh = '我讨厌苹果!'  # "I hate apples!"
    result2 = sw.sentiment_chinese(text_zh)
    sw.print_result(result2)
```
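Stanza reports sentiment per sentence as an integer class. A minimal sketch of the equivalent direct call (our illustration, not nerkit's implementation):

```python
import stanza

stanza.download("en")
nlp = stanza.Pipeline(lang="en", processors="tokenize,sentiment")
doc = nlp("I like Beijing!")
for sentence in doc.sentences:
    # Stanza encodes sentiment per sentence: 0 = negative, 1 = neutral, 2 = positive
    print(sentence.text, sentence.sentiment)
```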
Example 7: Language detection from text
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    list_text = ['I like Beijing!', '我喜欢北京!', "Bonjour le monde!"]  # English, Chinese, French
    result1 = sw.lang(list_text)
    sw.print_result(result1)
```
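Stanza's language identification runs as a `langid` processor in a multilingual pipeline. A sketch of the equivalent direct usage, assuming the `stanza` package is installed:

```python
import stanza
from stanza.models.common.doc import Document

# The multilingual pipeline's langid processor tags each document with a language code
stanza.download(lang="multilingual")
nlp = stanza.Pipeline(lang="multilingual", processors="langid")
docs = [Document([], text=t) for t in ['I like Beijing!', '我喜欢北京!', "Bonjour le monde!"]]
nlp(docs)
for doc in docs:
    print(doc.text, "->", doc.lang)
```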
Example 8: Language detection from text with a user-defined processing function
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    list_text = ['I like Beijing!', '我喜欢北京!', "Bonjour le monde!"]  # English, Chinese, French

    def process(model):  # plug in your own handling of each detected document
        doc = model["doc"]
        print(f"{doc.sentences[0].dependencies_string()}")

    result1 = sw.lang_multi(list_text, func_process=process, download_lang='en,zh,fr')
    print(result1)
    sw.print_result(result1)
```
Example 9: NER with the Java-based Stanford CoreNLP (legacy usage)
```python
from nerkit.StanzaApi import StanfordCoreNLPClient

corenlp_root_path = "stanfordcorenlp/stanford-corenlp-latest/stanford-corenlp-4.3.2"
corenlp = StanfordCoreNLPClient(corenlp_root_path=corenlp_root_path, language='zh')
text = "我喜欢游览广东孙中山故居景点!"  # "I enjoy visiting the Sun Yat-sen Former Residence in Guangdong!"
list_token = corenlp.get_entity_list(text)
for token in list_token:
    print(f"{token['value']}\t{token['pos']}\t{token['ner']}")
```
Example 10: Stanford CoreNLP (unofficial wrapper)
```python
import os
from nerkit.StanfordCoreNLP import get_entity_list

text = "我喜欢游览广东孙中山故居景点!"  # same sentence as in Example 9
current_path = os.path.dirname(os.path.realpath(__file__))
res = get_entity_list(text, resource_path=f"{current_path}/stanfordcorenlp/stanford-corenlp-latest/stanford-corenlp-4.3.2")
print(res)
for w, tag in res:
    if tag in ['PERSON', 'ORGANIZATION', 'LOCATION']:
        print(w, tag)
```
Example 11: Open IE
```python
from nerkit.StanzaApi import StanfordCoreNLPClient

corenlp_root_path = "stanfordcorenlp/stanford-corenlp-latest/stanford-corenlp-4.3.2"
text = "Barack Obama was born in Hawaii. Richard Manning wrote this sentence."
corenlp = StanfordCoreNLPClient(corenlp_root_path=corenlp_root_path)
list_result = corenlp.open_ie(text)
for model in list_result:
    print(model["subject"], '--', model['relation'], '-->', model["object"])
out = corenlp.force_close_server()  # supports force-closing the server port on Windows
print(out)
```
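The same extraction can also be done with Stanza's official `CoreNLPClient` and the `openie` annotator; the `with` block shuts the server down cleanly, so no forced close is needed. A sketch under the same placeholder-path assumption as Example 9:

```python
import os
from stanza.server import CoreNLPClient

os.environ["CORENLP_HOME"] = "stanfordcorenlp/stanford-corenlp-latest/stanford-corenlp-4.3.2"

text = "Barack Obama was born in Hawaii. Richard Manning wrote this sentence."
with CoreNLPClient(annotators=["openie"], timeout=30000, memory="4G") as client:
    ann = client.annotate(text)
    for sentence in ann.sentence:
        # Each sentence carries the Open IE triples extracted from it
        for triple in sentence.openieTriple:
            print(triple.subject, "--", triple.relation, "-->", triple.object)
```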
Example 12: Generate triples from files
```python
from nerkit.triples.text import generate_triples_from_files

input_folder = 'data'
output_folder = 'output'
list_result_all = generate_triples_from_files(input_folder=input_folder,
                                              output_folder=output_folder,
                                              return_all_results=True,
                                              ltp_data_folder='../ltp_data')
print(list_result_all)
```
Example 13: Generate a list of triples
```python
from nerkit.triples.ltp import *

text = open("data/test.txt", 'r', encoding='utf-8').read()
extractor = get_ltp_triple_instance(ltp_data_folder='D:/UIBEResearch/ltp_data')
list_event = get_ltp_triple_list(extractor=extractor, text=text)
for event in list_event:
    print(event)
```
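The `ltp_data_folder` argument suggests pyltp-style model files (`cws.model`, `pos.model`, `parser.model`). As a rough sketch of the pipeline that triple extraction typically builds on (assuming the classic pyltp 0.2.x API; the model folder and sentence are placeholders, not nerkit's code):

```python
import os
from pyltp import Segmentor, Postagger, Parser

ltp_data = '../ltp_data'  # placeholder folder containing cws.model, pos.model, parser.model

# Word segmentation -> POS tagging -> dependency parsing; triples are then read
# off the dependency tree (e.g. subject-verb-object arcs)
segmentor = Segmentor()
segmentor.load(os.path.join(ltp_data, "cws.model"))
postagger = Postagger()
postagger.load(os.path.join(ltp_data, "pos.model"))
parser = Parser()
parser.load(os.path.join(ltp_data, "parser.model"))

words = list(segmentor.segment("我喜欢北京!"))  # "I like Beijing!"
postags = list(postagger.postag(words))
arcs = parser.parse(words, postags)
for word, arc in zip(words, arcs):
    # arc.head is 1-based; 0 means the word attaches to the virtual root
    head = words[arc.head - 1] if arc.head > 0 else "ROOT"
    print(word, "<-", head, arc.relation)

segmentor.release()
postagger.release()
parser.release()
```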
Credits & References
License
The ner-kit project is provided by Donghua Chen.