Set of whole-word (independent) stop words in Korean.
ko-ww-stopwords
This is a set of whole-word (independent) stop words in Korean. Whole-word stop words are easy to identify because each one is a complete word on its own. Dependent stop words, by contrast, attach to other words and are difficult to identify without a part-of-speech tagger.
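To show why whole-word stop words are the easy case, here is a minimal sketch that splits text on whitespace and drops any token that exactly matches an entry in a stop-word set. The set `EXAMPLE_STOP_WORDS` and the function `remove_whole_word_stop_words` are hypothetical and only for illustration; they are not the list or API shipped by ko-ww-stopwords.

```python
from typing import AbstractSet, List

# Hypothetical, illustrative stop-word set; NOT the list shipped by ko-ww-stopwords.
EXAMPLE_STOP_WORDS = frozenset({"우선", "그리고", "또한"})


def remove_whole_word_stop_words(text: str, stop_words: AbstractSet[str]) -> List[str]:
    """Split on whitespace and keep only tokens that are not exact stop-word matches."""
    return [token for token in text.split() if token not in stop_words]


sentence = "우선 서울에서 회의를 하고 그리고 점심을 먹었다"
print(remove_whole_word_stop_words(sentence, EXAMPLE_STOP_WORDS))
# ['서울에서', '회의를', '하고', '점심을', '먹었다']
```

Note that a token such as `서울에서` survives untouched: exact whole-token matching cannot see a dependent element attached inside a word, which is the case that calls for a part-of-speech tagger.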
Code Sample
```python
from ko_ww_stopwords.stop_words import ko_ww_stop_words
from ko_ww_stopwords.tools import is_stop_word, strip_outer_punct

print(ko_ww_stop_words)

# is_stop_word(word): returns True if word is a whole-word stop word.
print("우선 is_stop_word -> {}".format(is_stop_word("우선")))
print("서울 is_stop_word -> {}".format(is_stop_word("서울")))

# strip_outer_punct(word): strips leading and trailing punctuation marks from word.
raw_str = "(우선)"
print("raw_str is_stop_word -> {}".format(is_stop_word(raw_str)))
normalized_str = strip_outer_punct(raw_str)
print("normalized_str is_stop_word -> {}".format(is_stop_word(normalized_str)))
```
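The sample above strips outer punctuation before the lookup. The following is a rough sketch of how that normalize-then-check step could work, assuming "punctuation" means any character whose Unicode general category starts with `P`. The helpers `strip_outer_punct_sketch` and `is_stop_word_sketch` and the set `EXAMPLE_STOP_WORDS` are hypothetical; this is not the package's actual implementation.

```python
import unicodedata

# Hypothetical stop-word set for illustration only.
EXAMPLE_STOP_WORDS = frozenset({"우선"})


def strip_outer_punct_sketch(word: str) -> str:
    """Drop leading/trailing characters in Unicode punctuation categories (Pc, Pd, Ps, Pe, Pi, Pf, Po)."""
    start, end = 0, len(word)
    while start < end and unicodedata.category(word[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(word[end - 1]).startswith("P"):
        end -= 1
    return word[start:end]  # inner punctuation is left untouched


def is_stop_word_sketch(word: str, stop_words=EXAMPLE_STOP_WORDS) -> bool:
    """Normalize the outer punctuation first, then do an exact set lookup."""
    return strip_outer_punct_sketch(word) in stop_words


print(strip_outer_punct_sketch("(우선)"))  # 우선
print(is_stop_word_sketch("(우선)"))       # True
print(is_stop_word_sketch("서울"))         # False
```

Using Unicode categories rather than a hand-written character list also covers CJK brackets such as `「」`. Symbols like `$` or `~` are not in a `P` category, so this sketch leaves them in place.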
Other Packages
If you need a Korean sentence tokenizer, please see https://github.com/Rairye/kr-sentence
Hashes for ko_ww_stopwords-0.0.1-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 88883ee851b6c3be7468241b0fcf0bc0fe327d003b73a809710be04f1193627e
MD5 | fe339f0b34368636393e0c98b92650f1
BLAKE2b-256 | f3d7f966ec69731af9c16c5becde0b41ea2d064aa35f084910552f1a0f73ec6d
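The digests above let you check a downloaded wheel before installing it. A minimal sketch using the standard-library `hashlib` module follows; the helper names `sha256_of_file` and `verify_wheel` are hypothetical, and the expected value is the SHA256 digest listed above.

```python
import hashlib

# SHA256 digest listed above for ko_ww_stopwords-0.0.1-py3-none-any.whl.
EXPECTED_SHA256 = "88883ee851b6c3be7468241b0fcf0bc0fe327d003b73a809710be04f1193627e"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash the file in fixed-size chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 hex digest matches the expected value."""
    return sha256_of_file(path) == expected.lower()
```

Alternatively, pip can enforce the digest itself via a requirements-file entry such as `ko-ww-stopwords==0.0.1 --hash=sha256:<digest>` together with `--require-hashes`.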