Named Entity Recognition Toolkit

ner-kit is a toolkit for rapidly extracting useful entities from text, built on several Python NLP packages, including Stanza.

Features

The toolkit aims to make existing NLP libraries easier to use by keeping the APIs as simple as possible while following best practices.

Installation

pip install ner-kit

Examples

Example 1: Word segmentation

from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang="en")  # fetch the English model
    text = 'This is a test sentence for stanza. This is another sentence.'
    result1 = sw.tokenize(text)
    sw.print_result(result1)

Example 2: Chinese word segmentation

from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang="zh")  # fetch the Chinese model
    text = '我在北京吃苹果!'  # "I eat apples in Beijing!"
    result1 = sw.tokenize(text, lang='zh')
    sw.print_result(result1)

Example 3: Multi-Word Token (MWT) Expansion

from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang="fr")  # fetch the French model
    text = 'Nous avons atteint la fin du sentier.'  # "We have reached the end of the trail."
    result1 = sw.mwt_expand(text, lang='fr')
    sw.print_result(result1)

Example 4: POS tagging

from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang='en')
    text = 'I like apple'
    result1 = sw.tag(text)
    sw.print_result(result1)

    sw.download_chinese_model()
    text = '我喜欢苹果'  # "I like apples"
    result2 = sw.tag_chinese(text, lang='zh')
    sw.print_result(result2)
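The exact structure of a tagging result is not shown above; assuming it can be reduced to `(word, tag)` pairs, a small helper like the following (a sketch, not part of ner-kit) can summarize how often each tag occurs:

```python
from collections import Counter

def count_tags(tagged_tokens):
    """Count how often each part-of-speech tag occurs.

    `tagged_tokens` is assumed to be an iterable of (word, tag) pairs;
    the actual structure returned by StanzaWrapper.tag() may differ.
    """
    return Counter(tag for _word, tag in tagged_tokens)

summary = count_tags([("I", "PRON"), ("like", "VERB"), ("apple", "NOUN")])
print(summary["VERB"])  # 1
```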

Example 5: Named Entity Recognition

from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()

    sw.download(lang='en')
    sw.download_chinese_model()

    text_en = 'I like Beijing!'
    result1 = sw.ner(text_en)
    sw.print_result(result1)

    text = '我喜欢北京!'  # "I like Beijing!"
    result2 = sw.ner_chinese(text)
    sw.print_result(result2)
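Downstream code usually keeps only a few entity types. Assuming each extracted entity can be represented as a dict with `text` and `type` keys (the keys in ner-kit's actual result may differ), a filter could be sketched as:

```python
def filter_entities(entities, keep=("PERSON", "GPE", "ORG")):
    """Keep only entities whose type is in `keep`.

    Each entity is assumed to be a dict like {"text": "Beijing", "type": "GPE"};
    adjust the keys to match the structure ner-kit actually returns.
    """
    return [e for e in entities if e["type"] in keep]

ents = [{"text": "Beijing", "type": "GPE"},
        {"text": "apple", "type": "PRODUCT"}]
print(filter_entities(ents))  # [{'text': 'Beijing', 'type': 'GPE'}]
```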

Example 6: Sentiment Analysis

from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    text_en = 'I like Beijing!'
    result1 = sw.sentiment(text_en)
    sw.print_result(result1)

    text_zh = '我讨厌苹果!'  # "I hate apples!"
    result2 = sw.sentiment_chinese(text_zh)
    sw.print_result(result2)

Example 7: Language detection from text

from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    list_text = ['I like Beijing!', '我喜欢北京!', "Bonjour le monde!"]
    result1 = sw.lang(list_text)
    sw.print_result(result1)
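A common next step after detection is routing each text to a language-specific pipeline. Assuming the detection result can be reduced to one ISO code per input text (e.g. 'en'/'zh'/'fr'), the grouping could be sketched as:

```python
from collections import defaultdict

def group_by_language(texts, codes):
    """Group input texts by their detected language code.

    `codes` is assumed to align one-to-one with `texts`, holding ISO codes
    extracted from the detection result; the actual result shape may differ.
    """
    groups = defaultdict(list)
    for text, code in zip(texts, codes):
        groups[code].append(text)
    return dict(groups)

print(group_by_language(["I like Beijing!", "Bonjour le monde!"], ["en", "fr"]))
# {'en': ['I like Beijing!'], 'fr': ['Bonjour le monde!']}
```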

Example 8: Language detection from text with a user-defined processing function

from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    list_text = ['I like Beijing!', '我喜欢北京!', "Bonjour le monde!"]

    def process(model):  # user-defined callback applied to each processed document
        doc = model["doc"]
        print(f"{doc.sentences[0].dependencies_string()}")

    result1 = sw.lang_multi(list_text, func_process=process, download_lang='en,zh,fr')
    print(result1)
    sw.print_result(result1)

Example 9: NER with the Java-based Stanford CoreNLP (legacy)

from nerkit.StanzaApi import StanfordCoreNLPClient

corenlp_root_path = "stanfordcorenlp/stanford-corenlp-latest/stanford-corenlp-4.3.2"
corenlp = StanfordCoreNLPClient(corenlp_root_path=corenlp_root_path, language='zh')
text = "我喜欢游览广东孙中山故居景点!"  # "I like visiting the Sun Yat-sen Residence sites in Guangdong!"
list_token = corenlp.get_entity_list(text)
for token in list_token:
    print(f"{token['value']}\t{token['pos']}\t{token['ner']}")

Example 10: Stanford CoreNLP (unofficial wrapper)

import os

from nerkit.StanfordCoreNLP import get_entity_list

text = "我喜欢游览广东孙中山故居景点!"  # "I like visiting the Sun Yat-sen Residence sites in Guangdong!"
current_path = os.path.dirname(os.path.realpath(__file__))
res = get_entity_list(
    text,
    resource_path=f"{current_path}/stanfordcorenlp/stanford-corenlp-latest/stanford-corenlp-4.3.2")
print(res)
for w, tag in res:
    if tag in ['PERSON', 'ORGANIZATION', 'LOCATION']:
        print(w, tag)

Example 11: Open IE

from nerkit.StanzaApi import StanfordCoreNLPClient

corenlp_root_path = "stanfordcorenlp/stanford-corenlp-latest/stanford-corenlp-4.3.2"
text = "Barack Obama was born in Hawaii. Richard Manning wrote this sentence."
corenlp = StanfordCoreNLPClient(corenlp_root_path=corenlp_root_path)
list_result = corenlp.open_ie(text)
for model in list_result:
    print(model["subject"], '--', model['relation'], '-->', model["object"])
out = corenlp.force_close_server()  # force-closes the server port (supported on Windows)
print(out)
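The loop above prints each extraction; to pass the results on to other code, the same dicts can be flattened into plain tuples. A minimal sketch, relying only on the 'subject'/'relation'/'object' keys used above:

```python
def to_triples(list_result):
    """Flatten Open IE result dicts into (subject, relation, object) tuples,
    using the same 'subject'/'relation'/'object' keys as in the loop above."""
    return [(m["subject"], m["relation"], m["object"]) for m in list_result]

sample = [{"subject": "Barack Obama", "relation": "was born in", "object": "Hawaii"}]
print(to_triples(sample))  # [('Barack Obama', 'was born in', 'Hawaii')]
```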

Example 12: Generate triples from files

from nerkit.triples.text import generate_triples_from_files

input_folder = 'data'
output_folder = 'output'
list_result_all = generate_triples_from_files(input_folder=input_folder,
                                              output_folder=output_folder,
                                              return_all_results=True,
                                              ltp_data_folder='../ltp_data')
print(list_result_all)
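Generated triples often need to be exported for inspection or spreadsheet use. Assuming each triple can be represented as a (subject, relation, object) tuple (the actual return structure of generate_triples_from_files may differ), a CSV export could be sketched as:

```python
import csv
import io

def triples_to_csv(triples):
    """Serialize (subject, relation, object) triples as CSV text.

    The triple shape is an assumption; adapt it to whatever structure
    generate_triples_from_files actually returns.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["subject", "relation", "object"])
    writer.writerows(triples)
    return buf.getvalue()

print(triples_to_csv([("Obama", "born_in", "Hawaii")]))
```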

Example 13: Generate a list of triples

from nerkit.triples.ltp import *

with open("data/test.txt", 'r', encoding='utf-8') as f:
    text = f.read()
extractor = get_ltp_triple_instance(ltp_data_folder='D:/UIBEResearch/ltp_data')
list_event = get_ltp_triple_list(extractor=extractor, text=text)
for event in list_event:
    print(event)


License

The ner-kit project is provided by Donghua Chen.
