kuzukiri
A simple text segmenter
What's this?
This is a Python library for segmenting Japanese text into sentences.
Features
- Text segmentation by simple rules:
- rule-based, no machine learning,
- so the results are predictable.
- Comparatively fast; it is written in Rust.
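To illustrate what rule-based segmentation means in practice, here is a minimal pure-Python sketch. It is not kuzukiri's implementation (which is written in Rust and whose exact rules are not described here); the function name `split_sentences` and the terminator set are assumptions chosen for illustration.

```python
import re
from typing import List

# Hypothetical terminator set for illustration only;
# kuzukiri's real rules may differ.
_TERMINATORS = "。！？!?"

# A sentence is either a run of non-terminators followed by one or more
# terminators, or a trailing run of text with no terminator at all.
_SENTENCE_RE = re.compile(
    rf"[^{_TERMINATORS}]*[{_TERMINATORS}]+|[^{_TERMINATORS}]+"
)


def split_sentences(text: str) -> List[str]:
    """Split text after sentence-ending punctuation, dropping blank pieces."""
    return [m.group(0) for m in _SENTENCE_RE.finditer(text) if m.group(0).strip()]


print(split_sentences("これはテストです。文分割します。"))
# ['これはテストです。', '文分割します。']
```

Because the output depends only on a fixed pattern, the same input always yields the same split, which is the predictability a rule-based approach offers.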
Install
from source code
pip install setuptools-rust
python setup.py install
Usage
import kuzukiri
segmenter = kuzukiri.Segmenter()
text = "これはテストです。文分割します。"
sentences = segmenter.split(text)
print(sentences) # => ['これはテストです。', '文分割します。']
For details, see the examples and tests directories.
License
MIT
Dependencies
- PyO3: to build the Rust code as a Python extension module.
- unicode_normalization crate: for NFKC normalization.
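For readers unfamiliar with NFKC, the sketch below shows the same Unicode normalization form using Python's standard `unicodedata` module. How and where kuzukiri applies NFKC internally is not stated here, so this only demonstrates what the normalization itself does; the helper name `nfkc` is an assumption.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Return text in Unicode Normalization Form KC (compatibility composition)."""
    return unicodedata.normalize("NFKC", text)


# Half-width katakana become full-width katakana.
print(nfkc("ﾃｽﾄ"))     # テスト
# Full-width digits become ASCII digits.
print(nfkc("１２３"))  # 123
# Full-width punctuation becomes ASCII punctuation.
print(nfkc("！"))      # !
```

Note that NFKC folds full-width punctuation such as "！" into its ASCII form, so any rule that matches sentence terminators after normalization needs to account for both variants.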