nlplab dictionary, stopwords module

Project description

Training data, dictionaries, and stopwords that we collected ourselves, bundled into a package so that nobody has to reinvent the wheel.

Usage

Install: pip install NCHU_nlptoolkit

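  1. Remove stopwords and segment the text. Note: the lab dictionary is loaded automatically whenever stopwords are removed.

    from NCHU_nlptoolkit.cut import *

    # input_string is a placeholder for the text to segment.
    # minword is the minimum number of characters a segmented word must have.

    # default (flag=False): segmentation without part-of-speech tags
    cut_sentence(input_string, flag=False, minword=1)

    # flag=True: return the segmentation with part-of-speech tags
    cut_sentence(input_string, flag=True, minword=1)
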
  2. Load the legal dictionary

    from NCHU_nlptoolkit.cut import *
    
    load_law_dict()
    
  3. Demo:
  • zh:

    >>> doc = '首先,對區塊鏈需要的第一個理解是,它是一種「將資料寫錄的技術」。'
    >>> cut_sentence(doc, flag=True)
    [('區塊鏈', 'n'), ('需要', 'n'), ('第一個', 'm'), ('理解', 'n'), ('一種', 'm'), ('資料', 'n'), ('寫錄', 'v'), ('技術', 'n')]
    
  • en:

    >>> doc = 'The City of New York, often called New York City (NYC) or simply New York, is the most populous city in the United States.'
    >>> list(cut_sentence_en(doc))
    ['City', 'New York', 'called', 'New York City', 'NYC', 'simply', 'New York', 'populous', 'city', 'United States']
    
    >>> list(cut_sentence_en(doc, flag=True))
    [('City', 'NNP'), ('New York', 'NNP/NNP'), ('called', 'VBN'), ('New York City', 'NNP/NNP/NNP'), ('NYC', 'NN'), ('simply', 'RB'), ('New York', 'NNP/NNP'), ('populous', 'JJ'), ('city', 'NN'), ('United States', 'NNP/NNS')]
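
The tagged output shown in the demos is a plain list of (word, tag) tuples, so it can be post-processed with ordinary Python. The sketch below works on the demo outputs copied verbatim from above and does not import NCHU_nlptoolkit. The helpers `words_with_pos` and `filter_minword` are hypothetical names introduced only for illustration; in particular, `filter_minword` only mimics the described effect of `minword` (a minimum character count per word), and how the library applies it internally is an assumption.

```python
from collections import Counter

# Tagged output copied from the zh demo: cut_sentence(doc, flag=True).
zh_tagged = [('區塊鏈', 'n'), ('需要', 'n'), ('第一個', 'm'), ('理解', 'n'),
             ('一種', 'm'), ('資料', 'n'), ('寫錄', 'v'), ('技術', 'n')]

# Tagged output copied from the en demo: list(cut_sentence_en(doc, flag=True)).
en_tagged = [('City', 'NNP'), ('New York', 'NNP/NNP'), ('called', 'VBN'),
             ('New York City', 'NNP/NNP/NNP'), ('NYC', 'NN'), ('simply', 'RB'),
             ('New York', 'NNP/NNP'), ('populous', 'JJ'), ('city', 'NN'),
             ('United States', 'NNP/NNS')]


def words_with_pos(tagged, prefix):
    """Return words whose tag parts (split on '/') all start with `prefix`.

    Hypothetical helper, not part of NCHU_nlptoolkit.
    """
    return [word for word, tag in tagged
            if all(part.startswith(prefix) for part in tag.split('/'))]


def filter_minword(words, minword=1):
    """Keep words with at least `minword` characters.

    Hypothetical helper that imitates the documented meaning of `minword`.
    """
    return [word for word in words if len(word) >= minword]


# Chinese nouns (tags starting with 'n').
print(words_with_pos(zh_tagged, 'n'))
# → ['區塊鏈', '需要', '理解', '資料', '技術']

# Most frequent English proper-noun phrase (every tag part starts with 'NNP').
print(Counter(words_with_pos(en_tagged, 'NNP')).most_common(1))
# → [('New York', 2)]

# Words of three or more characters from the zh demo.
print(filter_minword([word for word, _ in zh_tagged], minword=3))
# → ['區塊鏈', '第一個']
```

Splitting the tag on '/' handles the multi-word English phrases, whose tags are joined per word (for example 'NNP/NNP' for 'New York').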
    

Download files

Source Distribution

NCHU_nlptoolkit-2.0.5.tar.gz (12.9 MB)

Uploaded Source

File details

Details for the file NCHU_nlptoolkit-2.0.5.tar.gz.

File metadata

  • Download URL: NCHU_nlptoolkit-2.0.5.tar.gz
  • Upload date:
  • Size: 12.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.7

File hashes

Hashes for NCHU_nlptoolkit-2.0.5.tar.gz

  • SHA256: 86afedaacca1d798fc30a8aea34ee2994f9f61aeb007abe32142f000915e43bc
  • MD5: e728ec2652dd9c9707675b54ef74f373
  • BLAKE2b-256: ef516465d9bfe7dd1ec49b44e709209135832485acedbe2a58e54b33099ad679
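
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; the local file path and the helper names `sha256_of` and `matches_published_hash` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for NCHU_nlptoolkit-2.0.5.tar.gz.
EXPECTED_SHA256 = "86afedaacca1d798fc30a8aea34ee2994f9f61aeb007abe32142f000915e43bc"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()


# Assumed local path; adjust to wherever the archive was downloaded.
archive = Path("NCHU_nlptoolkit-2.0.5.tar.gz")
if archive.exists():
    print("hash OK" if matches_published_hash(archive) else "hash MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use low, which matters little for a 12.9 MB file but costs nothing.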
