CKIP Transformers
This open-source library implements CKIP Chinese NLP tools using transformers models.
(WS) Word Segmentation
(POS) Part-of-Speech Tagging
(NER) Named Entity Recognition
Git
PyPI
Documentation
Related Demos / Packages
CkipTagger: An alternative Chinese NLP library using BiLSTM.
CKIP CoreNLP Toolkit: A Chinese NLP library with more NLP tasks and utilities.
Contributors
Wei-Yun Ma at CKIP (Maintainer)
Installation
pip install -U ckip-transformers
Requirements:
Usage
See https://ckip-transformers.readthedocs.io/en/latest/_api/ckip_transformers.html for API details.
The complete script for this example is available at https://github.com/ckiplab/ckip-transformers/blob/master/example/example.py.
1. Import module
from ckip_transformers.nlp import CkipWordSegmenter, CkipPosTagger, CkipNerChunker
2. Load models
# Initialize drivers
ws_driver = CkipWordSegmenter()
pos_driver = CkipPosTagger()
ner_driver = CkipNerChunker()
3. Run pipeline
The input for word segmentation and named-entity recognition must be a list of sentences.
The input for part-of-speech tagging must be a list of lists of words (the output of word segmentation).
# Input text
text = [
    '傅達仁今將執行安樂死,卻突然爆出自己20年前遭緯來體育台封殺,他不懂自己哪裡得罪到電視台。',
    '美國參議院針對今天總統布什所提名的勞工部長趙小蘭展開認可聽證會,預料她將會很順利通過參議院支持,成為該國有史以來第一位的華裔女性內閣成員。',
]
# Run pipeline
ws = ws_driver(text)
pos = pos_driver(ws)
ner = ner_driver(text)
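The input shapes required in this step can be checked before calling the drivers. The following is a minimal sketch; the helper names `is_sentence_list` and `is_segmented_list` are hypothetical and not part of ckip-transformers, and the segmented sample is written by hand for illustration, not produced by the model.

```python
def is_sentence_list(value) -> bool:
    """Return True if value is a list of strings (the shape WS and NER expect)."""
    return isinstance(value, list) and all(isinstance(s, str) for s in value)


def is_segmented_list(value) -> bool:
    """Return True if value is a list of lists of strings (the shape POS expects)."""
    return isinstance(value, list) and all(
        isinstance(words, list) and all(isinstance(w, str) for w in words)
        for words in value
    )


# One raw sentence, and a hand-written segmentation of it (illustrative only).
sentences = ['今天天氣很好。']
segmented = [['今天', '天氣', '很', '好', '。']]

print(is_sentence_list(sentences))   # shape accepted by ws_driver / ner_driver
print(is_segmented_list(segmented))  # shape accepted by pos_driver
```

A check like this mainly guards against passing a single string instead of a list, or passing raw sentences to the part-of-speech tagger.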
4. Show results
# Pack word segmentation and part-of-speech results
def pack_ws_pos_sentence(sentence_ws, sentence_pos):
    # Each word must have exactly one part-of-speech tag
    assert len(sentence_ws) == len(sentence_pos)
    res = []
    for word_ws, word_pos in zip(sentence_ws, sentence_pos):
        res.append(f'{word_ws}({word_pos})')
    # Join the word(tag) pairs with an ideographic (full-width) space
    return '\u3000'.join(res)
# Show results
for sentence, sentence_ws, sentence_pos, sentence_ner in zip(text, ws, pos, ner):
    print(sentence)
    print(pack_ws_pos_sentence(sentence_ws, sentence_pos))
    for entity in sentence_ner:
        print(entity)
    print()
傅達仁今將執行安樂死,卻突然爆出自己20年前遭緯來體育台封殺,他不懂自己哪裡得罪到電視台。
傅達仁(Nb) 今(Nd) 將(D) 執行(VC) 安樂死(Na) ,(COMMACATEGORY) 卻(D) 突然(D) 爆出(VJ) 自己(Nh) 20(Neu) 年(Nf) 前(Ng) 遭(P) 緯來(Nb) 體育台(Na) 封殺(VC) ,(COMMACATEGORY) 他(Nh) 不(D) 懂(VK) 自己(Nh) 哪裡(Ncd) 得罪到(VC) 電視台(Nc) 。(PERIODCATEGORY)
NerToken(word='傅達仁', ner='PERSON', idx=(0, 3))
NerToken(word='今', ner='DATE', idx=(3, 4))
NerToken(word='20年', ner='DATE', idx=(18, 21))
NerToken(word='緯來體育台', ner='ORG', idx=(23, 28))
美國參議院針對今天總統布什所提名的勞工部長趙小蘭展開認可聽證會,預料她將會很順利通過參議院支持,成為該國有史以來第一位的華裔女性內閣成員。
美國(Nc) 參議院(Nc) 針對(P) 今天(Nd) 總統(Na) 布什(Nb) 所(D) 提名(VC) 的(DE) 勞工部長(Na) 趙小蘭(Nb) 展開(VC) 認可(VC) 聽證會(Na) ,(COMMACATEGORY) 預料(VE) 她(Nh) 將(D) 會(D) 很(Dfa) 順利(VH) 通過(VC) 參議院(Nc) 支持(VC) ,(COMMACATEGORY) 成為(VG) 該(Nes) 國(Nc) 有史以來(D) 第一(Neu) 位(Nf) 的(DE) 華裔(Na) 女性(Na) 內閣(Na) 成員(Na) 。(PERIODCATEGORY)
NerToken(word='美國參議院', ner='ORG', idx=(0, 5))
NerToken(word='今天', ner='LOC', idx=(7, 9))
NerToken(word='布什', ner='PERSON', idx=(11, 13))
NerToken(word='勞工部長', ner='ORG', idx=(17, 21))
NerToken(word='趙小蘭', ner='PERSON', idx=(21, 24))
NerToken(word='認可聽證會', ner='EVENT', idx=(26, 31))
NerToken(word='參議院', ner='ORG', idx=(42, 45))
NerToken(word='第一', ner='ORDINAL', idx=(56, 58))
NerToken(word='華裔', ner='NORP', idx=(60, 62))
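Judging from the printed results, each `NerToken` carries `word`, `ner`, and `idx` fields, and `idx` appears to be a half-open character offset range into the input sentence. The sketch below checks that reading against the first sentence and groups the entities by type. The `NerToken` class here is a local stand-in that mirrors the printed fields so the snippet runs without the library; it is an assumption, not the library's own definition.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class NerToken(NamedTuple):
    # Local stand-in mirroring the fields shown in the printed output above.
    word: str
    ner: str
    idx: Tuple[int, int]


sentence = '傅達仁今將執行安樂死,卻突然爆出自己20年前遭緯來體育台封殺,他不懂自己哪裡得罪到電視台。'

# Entities copied from the printed results for the first sentence.
entities = [
    NerToken(word='傅達仁', ner='PERSON', idx=(0, 3)),
    NerToken(word='今', ner='DATE', idx=(3, 4)),
    NerToken(word='20年', ner='DATE', idx=(18, 21)),
    NerToken(word='緯來體育台', ner='ORG', idx=(23, 28)),
]

# Slicing the sentence with idx should reproduce each entity's word.
offsets_match = all(sentence[start:end] == word for word, _, (start, end) in entities)

# Group entity words by their type label.
by_type = defaultdict(list)
for entity in entities:
    by_type[entity.ner].append(entity.word)
by_type = dict(by_type)

print(offsets_match)
print(by_type)
```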
Pretrained Models
The pretrained models can also be used directly with the HuggingFace transformers library: https://huggingface.co/ckiplab/.
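As a rough sketch of direct use, a model can be loaded through the standard `from_pretrained` calls of the transformers library. The model identifier `ckiplab/albert-tiny-chinese` and the `bert-base-chinese` tokenizer are assumptions not listed in this text; check the hub page above for the models actually published and the tokenizer each one expects. The helper name `load_ckip_model` is hypothetical.

```python
def load_ckip_model(model_name: str = 'ckiplab/albert-tiny-chinese',
                    tokenizer_name: str = 'bert-base-chinese'):
    """Load a tokenizer and model from the HuggingFace hub (names are assumptions)."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModel, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained(tokenizer_name)
    model = AutoModel.from_pretrained(model_name)
    return tokenizer, model
```

Calling `load_ckip_model()` downloads the weights on first use, so it needs network access and an installed transformers package with a PyTorch backend.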
Pretrained Language Models
NLP Task Models
License
Copyright (c) 2020 CKIP Lab under the GPL-3.0 License.