Neural implementation of CKIP WS, POS, NER tools

License
- Free for non-commercial use
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

CkipNeuTools

This open-source library implements neural CKIP-style Chinese NLP tools.

(WS) word segmentation
(POS) part-of-speech tagging
(NER) named entity recognition

Related demo sites

Features

Improves performance by +1.4%/+4.0%/+2.2% for WS/POS/NER over the classic CKIP tools on ASBC 4.0 / OntoNotes 5.0
Does not automatically delete, change, or add characters
Supports indefinitely long sentences
Supports user-defined recommended-word and must-word lists

Installation

tl;dr.

pip install ckipneutools[tf,gdown]

ckipneutools is a Python library hosted on PyPI. Requirements:

python>=3.6
tensorflow / tensorflow-gpu (one of them)
gdown (optional, for downloading model files from Google Drive)

(Minimum installation) If you have already set up tensorflow and would like to download the model files yourself:

pip install ckipneutools

(Complete installation) If you have just set up a clean virtual environment and want everything, including GPU support:

pip install ckipneutools[tfgpu,gdown]
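
To decide between the minimum and complete installation, it can help to see which of the requirements listed above are already present. The following is a minimal sketch using only the Python standard library; `check_requirements` is a hypothetical helper, not part of ckipneutools. It assumes that both tensorflow and tensorflow-gpu expose the importable module name `tensorflow`.

```python
import importlib.util
import sys


def check_requirements(modules=("tensorflow", "gdown")):
    """Report whether the interpreter version and the given modules are available."""
    report = {"python>=3.6": sys.version_info >= (3, 6)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_requirements().items():
        print(f"{requirement}: {'found' if ok else 'missing'}")
```

If tensorflow is reported as missing, one of the installations with extras is likely the easier route; if only gdown is missing, the model files can still be downloaded by hand.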

Usage

See the complete demo script: demo.py
Or the web demo

1. Download model files

The model files are available on several mirror sites.

You can download the files and extract them to the desired path with one of the included API functions.

# Downloads to ./data.zip (2GB) and extracts to ./data/
# ckipneutools.data_utils.downlaod_data_iis("./") # iis-ckip
ckipneutools.data_utils.downlaod_data_gdrive("./") # gdrive-ckip

./data/model_ner/pos_list.txt -> POS tag list, see Technical Report no. 93-05
./data/model_ner/label_list.txt -> Entity type list, see OntoNotes Release 5.0 p.21,22
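
Since the archive is about 2GB, it may be worth skipping the download when the files are already in place. The sketch below is an illustration, not library code: `ensure_model_files` is a hypothetical wrapper, and it assumes that the download function extracts to a `data` directory under the given path, as the comment in the snippet above describes.

```python
import os


def ensure_model_files(root, download):
    """Call download(root) only when <root>/data does not exist yet.

    `download` is any callable taking the target path, for example one of
    the download functions shown above.
    """
    data_dir = os.path.join(root, "data")
    if not os.path.isdir(data_dir):
        download(root)
    return data_dir


# Usage (assumes ckipneutools is installed and imported):
# ensure_model_files("./", ckipneutools.data_utils.downlaod_data_gdrive)
```

This only checks that the directory exists; it does not verify that an earlier download finished completely.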

2. Load model

ws = ckipneutools.WS("./data")
pos = ckipneutools.POS("./data")
ner = ckipneutools.NER("./data")

3. (Optional) Create dictionary

You can supply words that WS should give special consideration, along with their relative weights.

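word_to_weight = {
    "土地公": 1,
    "土地婆": 1,
    "公有": 2,
    "": 1,  # empty word: absent from the printed dictionary
    "來亂的": "啦",  # non-numeric weight: absent from the printed dictionary
    "緯來體育台": 1,
}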
dictionary = ckipneutools.construct_dictionary(word_to_weight)
print(dictionary)

[(2, {'公有': 2.0}), (3, {'土地公': 1.0, '土地婆': 1.0}), (5, {'緯來體育台': 1.0})]
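
The printed dictionary groups the valid words by length, converts the weights to floats, and leaves out the empty word and the entry whose weight is not a number. The following pure-Python sketch reproduces that structure for illustration only. `group_by_length` is a hypothetical name, and the filtering rules are inferred from the output above, not taken from the source of `construct_dictionary`.

```python
from collections import defaultdict


def group_by_length(word_to_weight):
    """Group words by character length, keeping only usable entries."""
    groups = defaultdict(dict)
    for word, weight in word_to_weight.items():
        # Skip entries that are not non-empty strings.
        if not isinstance(word, str) or not word:
            continue
        # Skip weights that cannot be read as a number.
        try:
            weight = float(weight)
        except (TypeError, ValueError):
            continue
        groups[len(word)][word] = weight
    # Lengths are unique keys, so sorting never has to compare the dicts.
    return sorted(groups.items())


example = {
    "土地公": 1,
    "土地婆": 1,
    "公有": 2,
    "": 1,
    "來亂的": "啦",
    "緯來體育台": 1,
}
print(group_by_length(example))
```

For this input the sketch prints the same list as shown above.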

4. Run the WS-POS-NER pipeline

sentence_list = [
    "傅達仁今將執行安樂死，卻突然爆出自己20年前遭緯來體育台封殺，他不懂自己哪裡得罪到電視台。",
    "美國參議院針對今天總統布什所提名的勞工部長趙小蘭展開認可聽證會，預料她將會很順利通過參議院支持，成為該國有史以來第一位的華裔女性內閣成員。",
    "",
    "土地公有政策?？還是土地婆有政策。.",
    "… 你確定嗎… 不要再騙了……",
    "最多容納59,000個人,或5.9萬人,再多就不行了.這是環評的結論.",
    "科長說:1,坪數對人數為1:3。2,可以再增加。",
]

word_sentence_list = ws(
    sentence_list,
    # sentence_segmentation=True, # To consider delimiters
    # segment_delimiter_set = {",", "。", ":", "?", "!", ";"}, # This is the default set of delimiters
    # recommend_dictionary = dictionary1, # words in this dictionary are encouraged
    # coerce_dictionary = dictionary2, # words in this dictionary are forced
)

pos_sentence_list = pos(word_sentence_list)

entity_sentence_list = ner(word_sentence_list, pos_sentence_list)
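
Because WS does not delete, change, or add characters, the words of a sentence should concatenate back to the original text, which makes it possible to recover character offsets for each word. The sketch below is an illustration under that assumption; `word_spans` is a hypothetical helper and not part of the library.

```python
def word_spans(sentence, words):
    """Return (start, end, word) for each word, with end-exclusive character offsets."""
    if "".join(words) != sentence:
        raise ValueError("words do not concatenate back to the sentence")
    spans = []
    start = 0
    for word in words:
        end = start + len(word)
        spans.append((start, end, word))
        start = end
    return spans


# Words copied from the sample output for the fourth sentence.
sentence = "土地公有政策?？還是土地婆有政策。."
words = ["土地公", "有", "政策", "?", "？", "還是", "土地", "婆", "有", "政策", "。", "."]
for span in word_spans(sentence, words):
    print(span)
```

Offsets computed this way use the same convention as the entity tuples in the sample output, where the first entity of this sentence is reported as (0, 3).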

5. (Optional) Release memory

del ws
del pos
del ner

6. Show Results

def print_word_pos_sentence(word_sentence, pos_sentence):
    assert len(word_sentence) == len(pos_sentence)
    for word, pos in zip(word_sentence, pos_sentence):
        print(f"{word}({pos})", end="\u3000")
    print()
    return

for i, sentence in enumerate(sentence_list):
    print()
    print(f"'{sentence}'")
    print_word_pos_sentence(word_sentence_list[i], pos_sentence_list[i])
    for entity in sorted(entity_sentence_list[i]):
        print(entity)


'傅達仁今將執行安樂死，卻突然爆出自己20年前遭緯來體育台封殺，他不懂自己哪裡得罪到電視台。'
傅達仁(Nb)　今(Nd)　將(D)　執行(VC)　安樂死(Na)　，(COMMACATEGORY)　卻(D)　突然(D)　爆出(VJ)　自己(Nh)　20(Neu)　年(Nf)　前(Ng)　遭(P)　緯來(Nb)　體育台(Na)　封殺(VC)　，(COMMACATEGORY)　他(Nh)　不(D)　懂(VK)　自己(Nh)　哪裡(Ncd)　得罪到(VJ)　電視台(Nc)　。(PERIODCATEGORY)　
(0, 3, 'PERSON', '傅達仁')
(18, 22, 'DATE', '20年前')
(23, 28, 'ORG', '緯來體育台')

'美國參議院針對今天總統布什所提名的勞工部長趙小蘭展開認可聽證會，預料她將會很順利通過參議院支持，成為該國有史以來第一位的華裔女性內閣成員。'
美國(Nc)　參議院(Nc)　針對(P)　今天(Nd)　總統(Na)　布什(Nb)　所(D)　提名(VC)　的(DE)　勞工部長(Na)　趙小蘭(Nb)　展開(VC)　認可(VC)　聽證會(Na)　，(COMMACATEGORY)　預料(VE)　她(Nh)　將(D)　會(D)　很(Dfa)　順利(VH)　通過(VC)　參議院(Nc)　支持(VC)　，(COMMACATEGORY)　成為(VG)　該(Nes)　國(Nc)　有史以來(D)　第一(Neu)　位(Nf)　的(DE)　華裔(Na)　女性(Na)　內閣(Na)　成員(Na)　。(PERIODCATEGORY)　
(0, 2, 'GPE', '美國')
(2, 5, 'ORG', '參議院')
(7, 9, 'DATE', '今天')
(11, 13, 'PERSON', '布什')
(17, 21, 'ORG', '勞工部長')
(21, 24, 'PERSON', '趙小蘭')
(42, 45, 'ORG', '參議院')
(56, 58, 'ORDINAL', '第一')
(60, 62, 'NORP', '華裔')

''


'土地公有政策?？還是土地婆有政策。.'
土地公(Nb)　有(V_2)　政策(Na)　?(QUESTIONCATEGORY)　？(QUESTIONCATEGORY)　還是(Caa)　土地(Na)　婆(Na)　有(V_2)　政策(Na)　。(PERIODCATEGORY)　.(PERIODCATEGORY)　
(0, 3, 'PERSON', '土地公')

'… 你確定嗎… 不要再騙了……'
…(ETCCATEGORY)　 (WHITESPACE)　你(Nh)　確定(VK)　嗎(T)　…(ETCCATEGORY)　 (WHITESPACE)　不要(D)　再(D)　騙(VC)　了(Di)　…(ETCCATEGORY)　…(ETCCATEGORY)　

'最多容納59,000個人,或5.9萬人,再多就不行了.這是環評的結論.'
最多(VH)　容納(VJ)　59,000(Neu)　個(Nf)　人(Na)　,(COMMACATEGORY)　或(Caa)　5.9萬(Neu)　人(Na)　,(COMMACATEGORY)　再(D)　多(D)　就(D)　不行(VH)　了(T)　.(PERIODCATEGORY)　這(Nep)　是(SHI)　環評(Na)　的(DE)　結論(Na)　.(PERIODCATEGORY)　
(4, 10, 'CARDINAL', '59,000')
(14, 18, 'CARDINAL', '5.9萬')

'科長說:1,坪數對人數為1:3。2,可以再增加。'
科長(Na)　說(VE)　:1,(Neu)　坪數(Na)　對(P)　人數(Na)　為(VG)　1:3(Neu)　。(PERIODCATEGORY)　2(Neu)　,(COMMACATEGORY)　可以(D)　再(D)　增加(VHC)　。(PERIODCATEGORY)　
(4, 6, 'CARDINAL', '1,')
(12, 13, 'CARDINAL', '1')
(14, 15, 'CARDINAL', '3')
(16, 17, 'CARDINAL', '2')
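
In the output above, each entity is a tuple of start offset, end offset, entity type, and text, where the offsets are end-exclusive character positions in the original sentence. The sketch below checks that reading against the first sample sentence and groups the entities by type. `entities_by_type` is a hypothetical helper written for illustration, and the tuple layout is inferred from the printed results.

```python
def entities_by_type(sentence, entities):
    """Group entity texts by type after checking each span against the sentence."""
    grouped = {}
    for start, end, label, text in sorted(entities):
        # The slice must reproduce the reported entity text.
        if sentence[start:end] != text:
            raise ValueError(f"span ({start}, {end}) does not match {text!r}")
        grouped.setdefault(label, []).append(text)
    return grouped


sentence = "傅達仁今將執行安樂死，卻突然爆出自己20年前遭緯來體育台封殺，他不懂自己哪裡得罪到電視台。"
entities = {
    (0, 3, "PERSON", "傅達仁"),
    (18, 22, "DATE", "20年前"),
    (23, 28, "ORG", "緯來體育台"),
}
print(entities_by_type(sentence, entities))
```

Grouping by type is only one way to consume the results; the span check is the part that relies on the offset convention.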

LICENSE

Release history

0.0.11

Aug 26, 2019

0.0.10

Aug 26, 2019

This version

0.0.9

Aug 26, 2019

0.0.8

Aug 23, 2019

0.0.7

Aug 23, 2019

0.0.5

Aug 23, 2019

0.0.4

Aug 23, 2019

0.0.3

Aug 23, 2019

0.0.2

Aug 23, 2019

0.0.1

Aug 23, 2019

Download files

Source Distribution

ckipneutools-0.0.9.tar.gz (17.3 kB)

Uploaded Aug 26, 2019 Source

Built Distribution

ckipneutools-0.0.9-py3-none-any.whl (22.0 kB)

Uploaded Aug 26, 2019 Python 3

File details

Details for the file ckipneutools-0.0.9.tar.gz.

File metadata

Download URL: ckipneutools-0.0.9.tar.gz
Upload date: Aug 26, 2019
Size: 17.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.34.0 CPython/3.6.8

File hashes

Hashes for ckipneutools-0.0.9.tar.gz
Algorithm	Hash digest
SHA256	`cf52a2070e9c5fe063d137d62b78921ebec5e48e9470e83dbfed4b1b0808dd48`
MD5	`7ff39952d39a469159f0fde4a0870eb9`
BLAKE2b-256	`6a02a401979de171b9ccd921e21c076512b07f0f9146e01af971f05d9522ce26`

File details

Details for the file ckipneutools-0.0.9-py3-none-any.whl.

File metadata

Download URL: ckipneutools-0.0.9-py3-none-any.whl
Upload date: Aug 26, 2019
Size: 22.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.34.0 CPython/3.6.8

File hashes

Hashes for ckipneutools-0.0.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`79beeac80eb3971b1f920dfaada0caf0f7da2f59a752e26324f00190505c04f8`
MD5	`b6e3d51fe01871ab1736f6240f9f1039`
BLAKE2b-256	`365da944a94ac351cf450a3320d64afc58070809a06d2028eb20031c5b104814`
