NCHU-nlptoolkit

nlplab dictionary, stopwords module

Project description

Training data, dictionaries, and stopwords that we collected ourselves, bundled into a package so that nobody has to reinvent the wheel.

Usage

Install: pip install NCHU_nlptoolkit
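
To confirm the installation succeeded, one option is to check whether the module can be found without importing it. This is a minimal sketch using only the standard library. The module name NCHU_nlptoolkit comes from the import statements in this README, and the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once `pip install NCHU_nlptoolkit` has completed in this environment.
print(is_installed("NCHU_nlptoolkit"))
```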

Remove stopwords and segment text into words. Note: the lab dictionary is loaded automatically when stopwords are removed.

from NCHU_nlptoolkit.cut import *

# minword is the minimum word length in characters (the fewest characters a segmented word may have)

# default
cut_sentence(input_string, flag=False, minword=1)

# flag=True: return the segmentation with part-of-speech tags
cut_sentence(input_string, flag=True, minword=1)
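
The minword parameter is described as the minimum number of characters a segmented word may have. The sketch below only illustrates that description on a plain list of words. It is not the package's implementation, and the helper name filter_min_length and the sample words are hypothetical.

```python
from typing import List


def filter_min_length(words: List[str], minword: int = 1) -> List[str]:
    """Keep only the words that have at least `minword` characters."""
    return [word for word in words if len(word) >= minword]


# Hypothetical segmentation result, used only to show the effect of the threshold.
sample = ["區塊鏈", "的", "資料"]
print(filter_min_length(sample, minword=1))  # every word is kept
print(filter_min_length(sample, minword=2))  # single-character words are dropped
```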

Load the law dictionary

from NCHU_nlptoolkit.cut import *

load_law_dict()

demo:

zh:

>>> doc = '首先，對區塊鏈需要的第一個理解是，它是一種「將資料寫錄的技術」。'
>>> cut_sentence(doc, flag=True)
[('區塊鏈', 'n'), ('需要', 'n'), ('第一個', 'm'), ('理解', 'n'), ('一種', 'm'), ('資料', 'n'), ('寫錄', 'v'), ('技術', 'n')]
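
With flag=True the result is a list of (word, tag) pairs, so a common next step is to keep only the words with a given part of speech. The sketch below works on the demo output copied from above, so it runs without the package installed. The helper name words_with_tag is hypothetical, and matching on a tag prefix is an assumption about how the tags are grouped.

```python
from typing import List, Tuple

# Copied from the zh demo output; in practice this would be the return value
# of cut_sentence(doc, flag=True).
tagged: List[Tuple[str, str]] = [
    ("區塊鏈", "n"), ("需要", "n"), ("第一個", "m"), ("理解", "n"),
    ("一種", "m"), ("資料", "n"), ("寫錄", "v"), ("技術", "n"),
]


def words_with_tag(pairs: List[Tuple[str, str]], prefix: str) -> List[str]:
    """Return the words whose part-of-speech tag starts with `prefix`."""
    return [word for word, tag in pairs if tag.startswith(prefix)]


nouns = words_with_tag(tagged, "n")
print(nouns)  # ['區塊鏈', '需要', '理解', '資料', '技術']
```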

en:

>>> doc = 'The City of New York, often called New York City (NYC) or simply New York, is the most populous city in the United States.'
>>> list(cut_sentence_en(doc))
['City', 'New York', 'called', 'New York City', 'NYC', 'simply', 'New York', 'populous', 'city', 'United States']

>>> list(cut_sentence_en(doc, flag=True))
[('City', 'NNP'), ('New York', 'NNP/NNP'), ('called', 'VBN'), ('New York City', 'NNP/NNP/NNP'), ('NYC', 'NN'), ('simply', 'RB'), ('New York', 'NNP/NNP'), ('populous', 'JJ'), ('city', 'NN'), ('United States', 'NNP/NNS')]
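
Because multi-word names such as 'New York' come back as single tokens, the output can be fed straight into a frequency count. The sketch below uses collections.Counter on the token list copied from the en demo, so it runs without the package installed. The counting is case-sensitive, which is a choice made here for illustration.

```python
from collections import Counter

# Copied from the en demo output; in practice this would be
# list(cut_sentence_en(doc)).
tokens = [
    "City", "New York", "called", "New York City", "NYC",
    "simply", "New York", "populous", "city", "United States",
]

# Count how often each token occurs (case-sensitive, so "City" != "city").
frequencies = Counter(tokens)
print(frequencies.most_common(1))  # [('New York', 2)]
```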

Project details

Release history

Version	Release date	Notes
2.0.5	Sep 19, 2023	This version
2.0.4	Sep 19, 2023
2.0.3	Sep 19, 2023
2.0.2	Mar 24, 2023
2.0.1	Mar 24, 2023
2.0.0	Mar 24, 2023
1.0.5	Mar 23, 2023	Yanked. Reason: improve loading efficiency

Download files

Source Distribution

NCHU_nlptoolkit-2.0.5.tar.gz

File metadata

Upload date: Sep 19, 2023
Size: 12.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.7

File hashes

Hashes for NCHU_nlptoolkit-2.0.5.tar.gz
Algorithm	Hash digest
SHA256	`86afedaacca1d798fc30a8aea34ee2994f9f61aeb007abe32142f000915e43bc`
MD5	`e728ec2652dd9c9707675b54ef74f373`
BLAKE2b-256	`ef516465d9bfe7dd1ec49b44e709209135832485acedbe2a58e54b33099ad679`
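
The published digests can be used to check a downloaded archive. The sketch below computes a file's SHA256 with the standard hashlib module and compares it with the value listed above. The helper names sha256_of_file and matches_published_hash are hypothetical, and the local file path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest listed for NCHU_nlptoolkit-2.0.5.tar.gz.
EXPECTED_SHA256 = "86afedaacca1d798fc30a8aea34ee2994f9f61aeb007abe32142f000915e43bc"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's SHA256 digest with the expected hex string."""
    return sha256_of_file(path) == expected.lower()


# Usage, assuming the archive was saved in the current directory:
# matches_published_hash("NCHU_nlptoolkit-2.0.5.tar.gz")
```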
