Skip to main content
Join the official 2019 Python Developers SurveyStart the survey!

Split Korean text into sentences using heuristic algorithm.

Project description

Korean Sentence Splitter

Split Korean text into sentences using heuristic algorithm. This algorithm was greatly inspired by EungGyun Kim <jason.eg@kakaocorp.com> who is Kakao NLP Leader and one of the most brilliant NLP Engineers in Korea.

I've started this project inspired by this article and we've achieved best result on the test set. And of course, It's very robust to both Spoken and Written expressions.

Installation

The package is listed in the Python Package Index (PyPI), so you can install it with pip:

$ pip install kss

Usage

import kss

s = "회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요 다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다 강남역 맛집 토끼정의 외부 모습."
for sent in kss.split_sentences(s):
    print(sent)

The result is shown below:

회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요
다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다
강남역 맛집 토끼정의 외부 모습.

Demo

Requirements

  • C++11
    • GCC or Clang with C++11 build supported.
  • Python 3

Google Test binary provided was built on macOS.

Build from scratch

C++

$ mkdir bld
$ cd bld
$ cmake ..
$ make
$ ./sentsplit

NOTICE: Google Test binary provided was built on macOS only. So, You cannot build test binary on linux.

#include <iostream>
#include "sentence_splitter.h"

int main() {
    std::string s = "회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요 다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다 강남역 맛집 토끼정의 외부 모습.";
    for (auto sent : splitSentences(s)) {
        std::cout << sent << std::endl;
    }

    return 0;
}

The result is shown below:

회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요
다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다
강남역 맛집 토끼정의 외부 모습.

Python

Python wrapper has implemented using Cython. You can execute build tasks by the command below.

$ python setup.py install --record files.txt
or
$ pip install .

Uninstall

$ xargs rm -rf < files.txt
or
$ pip uninstall kss

PyPI

$ python setup.py sdist
$ twine upload --repository-url https://test.pypi.org/legacy/ dist/*

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for kss, version 1.2.5
Filename, size File type Python version Upload date Hashes
Filename, size kss-1.2.5.tar.gz (5.9 kB) File type Source Python version None Upload date Hashes View hashes

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page