Korean Sentence Splitter

Split Korean text into sentences using a heuristic algorithm. The algorithm was greatly inspired by EungGyun Kim <jason.eg@kakaocorp.com>, the NLP lead at Kakao and one of the most brilliant NLP engineers in Korea.

I started this project after reading this article, and it achieved the best result on the test set. It is also robust to both spoken and written expressions.

Installation

The package is listed in the Python Package Index (PyPI), so you can install it with pip:

$ pip install kss

Usage

import kss

s = "회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요 다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다 강남역 맛집 토끼정의 외부 모습."
for sent in kss.split_sentences(s):
    print(sent)

The result is shown below:

회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요
다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다
강남역 맛집 토끼정의 외부 모습.
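
When several documents need splitting, a small wrapper can keep each document's sentences grouped together. The sketch below is only illustrative: `split_documents` is a hypothetical helper, not part of kss. It takes the splitter as a parameter, so the only kss call it assumes is `kss.split_sentences` as used above.

```python
from typing import Callable, Iterable, List


def split_documents(
    docs: Iterable[str],
    splitter: Callable[[str], Iterable[str]],
) -> List[List[str]]:
    """Split each document into sentences, keeping one list per document.

    `splitter` is any callable that maps a string to an iterable of
    sentences, e.g. `kss.split_sentences`. This helper is hypothetical
    and not part of kss itself.
    """
    results = []
    for doc in docs:
        # Strip surrounding whitespace from every sentence.
        sentences = [s.strip() for s in splitter(doc)]
        # Drop sentences that are empty after stripping.
        results.append([s for s in sentences if s])
    return results
```

With kss installed, this would be called as `split_documents(texts, kss.split_sentences)`, where `texts` is a list of strings.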

Requirements

  • C++11
    • GCC or Clang with C++11 support.
  • Python 3

The bundled Google Test binary was built on macOS.

Build from scratch

C++

$ mkdir bld
$ cd bld
$ cmake ..
$ make
$ ./sentsplit

NOTICE: The bundled Google Test binary was built on macOS only, so you cannot build the test binary on Linux.

#include <iostream>
#include <string>
#include "sentence_splitter.h"

int main() {
    std::string s = "회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요 다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다 강남역 맛집 토끼정의 외부 모습.";
    // Iterate by const reference to avoid copying each sentence.
    for (const auto& sent : splitSentences(s)) {
        std::cout << sent << std::endl;
    }

    return 0;
}

The result is shown below:

회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요
다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다
강남역 맛집 토끼정의 외부 모습.

Python

The Python wrapper is implemented with Cython. You can build and install it with either of the commands below.

$ python setup.py install --record files.txt
or
$ pip install .
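
After either command finishes, one quick way to confirm that the module is visible to the current environment is to look it up on the import path. This sketch uses only the standard library; `is_installed` is a hypothetical helper name, not something kss provides.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be found on the import path."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("kss importable:", is_installed("kss"))
```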

Uninstall

$ xargs rm -rf < files.txt
or
$ pip uninstall kss

PyPI

$ python setup.py sdist
$ twine upload --repository-url https://test.pypi.org/legacy/ dist/*
