

Korean Sentence Splitter

Split Korean text into sentences using a heuristic algorithm. This algorithm was greatly inspired by EungGyun Kim <jason.eg@kakaocorp.com>, Kakao's NLP leader and one of the most brilliant NLP engineers in Korea.

I started this project after being inspired by this article, and it achieved the best result on the test set. It is also robust to both spoken and written expressions.

Installation

The package is listed in the Python Package Index (PyPI), so you can install it with pip:

$ pip install kss

Usage

import kss

s = "회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요 다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다 강남역 맛집 토끼정의 외부 모습."
for sent in kss.split_sentences(s):
    print(sent)

The result is shown below:

회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요
다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다
강남역 맛집 토끼정의 외부 모습.
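When several texts need to be split, the same call can be applied in a loop. The sketch below is only an illustration and not part of kss: `split_documents` is a hypothetical helper that accepts any splitter callable, so that `kss.split_sentences` can be passed in without the helper depending on the package.

```python
from typing import Callable, Iterable, List


def split_documents(
    docs: Iterable[str],
    splitter: Callable[[str], Iterable[str]],
) -> List[List[str]]:
    """Split each document into sentences with the given splitter.

    Hypothetical helper, not part of kss. Whitespace around each
    sentence is stripped and empty pieces are dropped.
    """
    results = []
    for doc in docs:
        # Run the splitter on one document and tidy up each piece.
        sentences = [sent.strip() for sent in splitter(doc)]
        results.append([sent for sent in sentences if sent])
    return results
```

With kss installed, `split_documents(texts, kss.split_sentences)` should return one list of sentences per input text.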

Requirements

  • C++11
    • GCC or Clang with C++11 support.
  • Python 3

The provided Google Test binary was built on macOS.

Build from scratch

C++

$ mkdir bld
$ cd bld
$ cmake ..
$ make
$ ./sentsplit

NOTICE: The provided Google Test binary was built on macOS only, so you cannot build the test binary on Linux.

#include <iostream>
#include <string>
#include "sentence_splitter.h"

int main() {
    std::string s = "회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요 다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다 강남역 맛집 토끼정의 외부 모습.";
    // Iterate by const reference to avoid copying each sentence.
    for (const auto& sent : splitSentences(s)) {
        std::cout << sent << std::endl;
    }

    return 0;
}

The result is shown below:

회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요
다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다
강남역 맛집 토끼정의 외부 모습.

Python

The Python wrapper is implemented using Cython. You can build and install it with either of the commands below.

$ python setup.py install --record files.txt
or
$ pip install .

Uninstall

$ xargs rm -rf < files.txt
or
$ pip uninstall kss
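Because `xargs rm -rf < files.txt` deletes every path in the record, it can help to inspect the list first. The following is a minimal sketch, assuming `files.txt` holds one installed path per line as written by `--record`; the names `recorded_paths` and `preview_uninstall` are hypothetical and not part of kss.

```python
from pathlib import Path
from typing import List


def recorded_paths(record_file: str) -> List[Path]:
    """Return the paths listed in an install record, one per non-empty line."""
    lines = Path(record_file).read_text(encoding="utf-8").splitlines()
    return [Path(line.strip()) for line in lines if line.strip()]


def preview_uninstall(record_file: str) -> List[Path]:
    """Return only the recorded paths that still exist on disk.

    Nothing is deleted here; this only reports what a removal would touch.
    """
    return [path for path in recorded_paths(record_file) if path.exists()]
```

Calling `preview_uninstall("files.txt")` and printing the result shows what the `xargs` command would remove.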

PyPI

$ python setup.py sdist
$ twine upload --repository-url https://test.pypi.org/legacy/ dist/*


