Lightweight sentence tokenizer for Korean.

Korean text generally uses half-width punctuation, but this tokenizer also supports full-width punctuation. For details about full-width punctuation in Korean, see https://www.w3.org/TR/klreq/.
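Full-width marks such as ．？！ (U+FF0E, U+FF1F, U+FF01) are Unicode compatibility variants of the half-width . ? !, so Python's standard `unicodedata.normalize("NFKC", ...)` maps them back to their half-width forms. The sketch below only illustrates that relationship; `to_half_width` is a hypothetical helper, and this is not necessarily how kr_sentence handles full-width input.

```python
import unicodedata


def to_half_width(text: str) -> str:
    """Hypothetical helper: replace compatibility characters, including
    full-width punctuation such as U+FF0E, U+FF1F and U+FF01, with their
    half-width equivalents via NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


# Full-width full stop after the first sentence, full-width question mark after the second.
sample = "만나서 반갑습니다\uFF0E 잘 지내세요\uFF1F"
print(to_half_width(sample))
```

Note that NFKC also rewrites other compatibility characters (full-width Latin letters and digits, for example), so it is a broader transformation than punctuation-only normalization.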

Sample code:

```python
from kr_sentence.tokenizer import tokenize

# "I am American. Nice to meet you."
paragraph_str = "저는 미국인이에요. 만나서 반갑습니다."

sentence_list = tokenize(paragraph_str)

# Print each sentence on its own line.
for sentence in sentence_list:
    print(sentence)
```
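To illustrate how a punctuation-based sentence tokenizer of this kind might work, here is a minimal regex sketch with the same call shape as `tokenize` (a paragraph string in, a list of sentences out). `split_sentences` and the set of sentence-final marks are assumptions for illustration only; this is not kr_sentence's implementation.

```python
import re

# Assumption: sentence-final marks are half-width . ? ! plus the full-width
# forms ．？！ (U+FF0E, U+FF1F, U+FF01) and the ideographic full stop 。
# (U+3002). This list is illustrative, not taken from kr_sentence.
TERMINALS = ".?!\uFF0E\uFF1F\uFF01\u3002"
_T = re.escape(TERMINALS)

# A sentence is a run of non-terminal characters followed by any run of
# terminal marks, so "?!" stays attached to the same sentence.
_SENTENCE_RE = re.compile(f"[^{_T}]+[{_T}]*")


def split_sentences(paragraph: str) -> list[str]:
    """Hypothetical stand-in for tokenize(): split a paragraph into sentences,
    trimming surrounding whitespace and dropping empty pieces."""
    pieces = (m.group().strip() for m in _SENTENCE_RE.finditer(paragraph))
    return [p for p in pieces if p]


if __name__ == "__main__":
    # Mixed half-width and full-width punctuation; full-width marks are
    # often not followed by a space, so the split does not require one.
    for sentence in split_sentences("저는 미국인이에요. 만나서 반갑습니다．잘 지내세요？"):
        print(sentence)
```

Because this sketch splits on every terminal mark, it would also break decimals such as `3.5` and abbreviations; a real tokenizer may handle those cases differently.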

Download files

Source distribution: kr_sentence-0.0.3.tar.gz (2.7 kB)

Built distribution (Python 3): kr_sentence-0.0.3-py3-none-any.whl (3.5 kB)