Split Korean text into sentences using a heuristic algorithm.
Korean Sentence Splitter
Split Korean text into sentences using a heuristic algorithm. The algorithm was greatly inspired by EungGyun Kim <jason.eg@kakaocorp.com>, who is an NLP leader at Kakao and one of the most brilliant NLP engineers in Korea.
I started this project after being inspired by this article, and it achieved the best result on the test set. It is also robust to both spoken and written expressions.
Installation
The package is listed in the Python Package Index (PyPI), so you can install it with pip:
$ pip install kss
Usage
import kss
s = "회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요 다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다 강남역 맛집 토끼정의 외부 모습."
for sent in kss.split_sentences(s):
    print(sent)
The result is shown below:
회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요
다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다
강남역 맛집 토끼정의 외부 모습.
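When you have several texts rather than one string, it can help to keep track of which text each sentence came from. The sketch below is one way to do that. The helper name `split_documents` is hypothetical and not part of kss; it only assumes that the splitter you pass in (for example `kss.split_sentences`) takes a string and returns an iterable of sentence strings, as in the usage above.

```python
from typing import Callable, Iterable, List, Tuple


def split_documents(
    docs: Iterable[str],
    splitter: Callable[[str], Iterable[str]],
) -> List[Tuple[int, str]]:
    """Return (document index, sentence) pairs for every sentence in docs.

    `splitter` is any callable that maps one text to its sentences,
    e.g. kss.split_sentences. This helper is an illustration only and
    is not provided by the kss package.
    """
    pairs: List[Tuple[int, str]] = []
    for doc_id, doc in enumerate(docs):
        for sent in splitter(doc):
            sent = sent.strip()
            if sent:  # skip empty strings a splitter might emit
                pairs.append((doc_id, sent))
    return pairs
```

With kss installed, calling `split_documents(texts, kss.split_sentences)` on a list of strings `texts` should give one pair per sentence. Passing the splitter as an argument also makes it easy to swap in another splitter when comparing results.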
Requirements
- C++11
- GCC or Clang with C++11 support
- Python 3
The provided Google Test binary was built on macOS.
Build from scratch
C++
$ mkdir bld
$ cd bld
$ cmake ..
$ make
$ ./sentsplit
NOTICE: The provided Google Test binary was built on macOS only, so you cannot build the test binary on Linux.
#include <iostream>
#include <string>
#include "sentence_splitter.h"

int main() {
    std::string s = "회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요 다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다 강남역 맛집 토끼정의 외부 모습.";
    // Print each sentence on its own line.
    for (const auto& sent : splitSentences(s)) {
        std::cout << sent << std::endl;
    }
    return 0;
}
The result is shown below:
회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요
다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다
강남역 맛집 토끼정의 외부 모습.
Python
The Python wrapper is implemented using Cython. You can build and install it with either of the commands below.
$ python setup.py install --record files.txt
or
$ pip install .
Uninstall
$ xargs rm -rf < files.txt
or
$ pip uninstall kss
PyPI
$ python setup.py sdist
$ twine upload --repository-url https://test.pypi.org/legacy/ dist/*