CKIP Transformers

This open-source library implements CKIP Chinese NLP tools using transformers models.

  • (WS) Word Segmentation

  • (POS) Part-of-Speech Tagging

  • (NER) Named Entity Recognition

Git

https://github.com/emfomy/ckip-transformers

PyPI

https://pypi.org/project/ckip-transformers

Documentation

https://ckip-transformers.readthedocs.io/

Related Demos / Packages

Contributors

Installation

pip install -U ckip-transformers

Requirements:

Usage

See https://ckip-transformers.readthedocs.io/en/latest/_api/ckip_transformers.html for API details.

The complete script for this example is available at https://github.com/ckiplab/ckip-transformers/blob/master/example/example.py.

1. Import module

from ckip_transformers.nlp import CkipWordSegmenter, CkipPosTagger, CkipNerChunker

2. Load models

# Initialize drivers
ws_driver  = CkipWordSegmenter()
pos_driver = CkipPosTagger()
ner_driver = CkipNerChunker()

3. Run pipeline

  • The input for word segmentation and named-entity recognition must be a list of sentences.

  • The input for part-of-speech tagging must be a list of lists of words (the output of word segmentation).
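The two input shapes described above can be checked with plain Python before calling the drivers. This is only a sketch: the helper names `is_sentence_list` and `is_word_lists` are hypothetical and are not part of ckip-transformers, and the sample words are copied from the example output in this README.

```python
from typing import Any


def is_sentence_list(obj: Any) -> bool:
    """Return True if obj is a list of strings (the shape described for WS and NER)."""
    return isinstance(obj, list) and all(isinstance(s, str) for s in obj)


def is_word_lists(obj: Any) -> bool:
    """Return True if obj is a list of lists of strings (the shape described for POS)."""
    return isinstance(obj, list) and all(is_sentence_list(words) for words in obj)


# One string per sentence: suitable for word segmentation and NER.
sentences = ['傅達仁今將執行安樂死', '他不懂自己哪裡得罪到電視台。']

# One list of words per sentence: suitable for part-of-speech tagging.
segmented = [
    ['傅達仁', '今', '將', '執行', '安樂死'],
    ['他', '不', '懂', '自己', '哪裡', '得罪到', '電視台', '。'],
]

print(is_sentence_list(sentences), is_word_lists(segmented))
```

Passing raw sentences where word lists are expected (or the reverse) fails these checks, which is the mistake the two bullets above warn against.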

# Input text
text = [
   '傅達仁今將執行安樂死,卻突然爆出自己20年前遭緯來體育台封殺,他不懂自己哪裡得罪到電視台。',
   '美國參議院針對今天總統布什所提名的勞工部長趙小蘭展開認可聽證會,預料她將會很順利通過參議院支持,成為該國有史以來第一位的華裔女性內閣成員。',
]

# Run pipeline
ws  = ws_driver(text)
pos = pos_driver(ws)
ner = ner_driver(text)

4. Show results

# Pack word segmentation and part-of-speech results into one display string
def pack_ws_pos_sentence(sentence_ws, sentence_pos):
   # Each word must have exactly one tag
   assert len(sentence_ws) == len(sentence_pos)
   res = []
   for word_ws, word_pos in zip(sentence_ws, sentence_pos):
      res.append(f'{word_ws}({word_pos})')
   # Join with an ideographic (full-width) space
   return '\u3000'.join(res)

# Show results
for sentence, sentence_ws, sentence_pos, sentence_ner in zip(text, ws, pos, ner):
   print(sentence)
   print(pack_ws_pos_sentence(sentence_ws, sentence_pos))
   for entity in sentence_ner:
      print(entity)
   print()
傅達仁今將執行安樂死,卻突然爆出自己20年前遭緯來體育台封殺,他不懂自己哪裡得罪到電視台。
傅達仁(Nb) 今(Nd) 將(D) 執行(VC) 安樂死(Na) ,(COMMACATEGORY) 卻(D) 突然(D) 爆出(VJ) 自己(Nh) 20(Neu) 年(Nf) 前(Ng) 遭(P) 緯來(Nb) 體育台(Na) 封殺(VC) ,(COMMACATEGORY) 他(Nh) 不(D) 懂(VK) 自己(Nh) 哪裡(Ncd) 得罪到(VC) 電視台(Nc) 。(PERIODCATEGORY)
NerToken(word='傅達仁', ner='PERSON', idx=(0, 3))
NerToken(word='今', ner='DATE', idx=(3, 4))
NerToken(word='20年', ner='DATE', idx=(18, 21))
NerToken(word='緯來體育台', ner='ORG', idx=(23, 28))

美國參議院針對今天總統布什所提名的勞工部長趙小蘭展開認可聽證會,預料她將會很順利通過參議院支持,成為該國有史以來第一位的華裔女性內閣成員。
美國(Nc) 參議院(Nc) 針對(P) 今天(Nd) 總統(Na) 布什(Nb) 所(D) 提名(VC) 的(DE) 勞工部長(Na) 趙小蘭(Nb) 展開(VC) 認可(VC) 聽證會(Na) ,(COMMACATEGORY) 預料(VE) 她(Nh) 將(D) 會(D) 很(Dfa) 順利(VH) 通過(VC) 參議院(Nc) 支持(VC) ,(COMMACATEGORY) 成為(VG) 該(Nes) 國(Nc) 有史以來(D) 第一(Neu) 位(Nf) 的(DE) 華裔(Na) 女性(Na) 內閣(Na) 成員(Na) 。(PERIODCATEGORY)
NerToken(word='美國參議院', ner='ORG', idx=(0, 5))
NerToken(word='今天', ner='LOC', idx=(7, 9))
NerToken(word='布什', ner='PERSON', idx=(11, 13))
NerToken(word='勞工部長', ner='ORG', idx=(17, 21))
NerToken(word='趙小蘭', ner='PERSON', idx=(21, 24))
NerToken(word='認可聽證會', ner='EVENT', idx=(26, 31))
NerToken(word='參議院', ner='ORG', idx=(42, 45))
NerToken(word='第一', ner='ORDINAL', idx=(56, 58))
NerToken(word='華裔', ner='NORP', idx=(60, 62))

Pretrained Models

The pretrained models can also be used directly with the HuggingFace transformers library: https://huggingface.co/ckiplab/.
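A minimal sketch of loading a model this way, assuming the `transformers` package is installed. The model ID `ckiplab/bert-base-chinese-ws` and the tokenizer ID `bert-base-chinese` are assumptions that this page does not state; check https://huggingface.co/ckiplab/ for the models actually published. The helper `load_ws_model` is hypothetical.

```python
# Assumed identifiers; verify them on the HuggingFace Hub before use.
WS_MODEL_ID = 'ckiplab/bert-base-chinese-ws'
TOKENIZER_ID = 'bert-base-chinese'


def load_ws_model(model_id: str = WS_MODEL_ID, tokenizer_id: str = TOKENIZER_ID):
    """Return (tokenizer, model); downloads the weights on first use."""
    # Imported here so that defining the helper does not require transformers.
    from transformers import AutoModelForTokenClassification, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained(tokenizer_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)
    return tokenizer, model


# Usage (requires network access on the first call):
# tokenizer, model = load_ws_model()
```

Loading the model this way yields raw token-classification outputs; turning them into segmented words is what the drivers in the Usage section handle.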

Pretrained Language Models

NLP Task Models

License

GPL-3.0

Copyright (c) 2020 CKIP Lab under the GPL-3.0 License.
