kuzukiri
A simple text segmenter
What's this?
This is a Python library for segmenting Japanese text into sentences.
Features
- Text segmentation by simple rules.
- Rule-based, with no machine learning,
- so the results are predictable.
- Comparatively fast: it is written in Rust.
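To illustrate what "rule-based" means here, the following is a minimal sketch of a punctuation-driven splitter in pure Python. The rule set (split after `。`, `！`, `？`) and the names `SENTENCE` and `split_sentences` are assumptions made for illustration only; they are not kuzukiri's actual rules or API.

```python
import re
from typing import List

# Hypothetical rule: a sentence is a run of non-terminator characters
# followed by any number of sentence-ending marks.
# This is an illustrative rule, not the rule set kuzukiri ships with.
SENTENCE = re.compile(r"[^。！？]+[。！？]*")


def split_sentences(text: str) -> List[str]:
    """Return the sentences found in text, each keeping its ending marks."""
    return SENTENCE.findall(text)


print(split_sentences("これはテストです。文分割します。"))
```

Because the output depends only on the fixed pattern, the same input always gives the same split, which is the predictability a rule-based approach offers.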
Install
From PyPI
pip install kuzukiri
From source code
pip install setuptools-rust
python setup.py install
Usage
import kuzukiri
segmenter = kuzukiri.Segmenter()
text = "これはテストです。文分割します。"
sentences = segmenter.split(text)
print(sentences) # => ['これはテストです。', '文分割します。']
For details, see the examples and tests directories.
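When the input has several lines, one possible pattern is to run the splitter line by line and flatten the results. The sketch below assumes nothing about kuzukiri beyond the `split` call shown in the usage example; `split_lines` and `naive_split` are hypothetical helpers, and `naive_split` is only a stand-in so the code runs without the library installed.

```python
from typing import Callable, Iterable, List


def split_lines(lines: Iterable[str], split: Callable[[str], List[str]]) -> List[str]:
    """Apply a sentence splitter to each non-blank line and flatten the results."""
    sentences: List[str] = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            sentences.extend(split(line))
    return sentences


def naive_split(text: str) -> List[str]:
    """Stand-in splitter (hypothetical): cuts on the full stop and re-attaches it."""
    return [part + "。" for part in text.split("。") if part]


lines = ["これはテストです。文分割します。", "", "以上です。"]
print(split_lines(lines, naive_split))
```

With kuzukiri installed, passing `kuzukiri.Segmenter().split` in place of `naive_split` should work the same way, since it takes a string and returns a list of sentences.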
License
MIT
Dependencies
- PyO3: Rust bindings for Python, used to build the Rust code as a Python extension module.
- unicode_normalization crate: for NFKC normalization.
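For readers unfamiliar with NFKC, the sketch below shows its effect using Python's standard `unicodedata` module. It only illustrates the normalization form itself; where and how kuzukiri applies it internally is not described here, and the helper name `nfkc` is an assumption for this example.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Normalize text to Unicode Normalization Form KC."""
    return unicodedata.normalize("NFKC", text)


# Full-width alphanumerics fold to their ASCII counterparts.
print(nfkc("ＡＢＣ１２３"))  # ABC123

# Half-width katakana plus a voicing mark compose into one full-width character.
print(nfkc("ｶﾞｷﾞ"))  # ガギ
```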
Download files
Source Distribution
kuzukiri-0.1.2.tar.gz (5.0 kB)
Built Distributions
Hashes for kuzukiri-0.1.2-cp39-cp39-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 66969e9ac1c4a9dd0724d9b7b843bfb7b48118ef06bebf7ab9d6212f0c35a9e0
MD5 | f750b0079f4d3b2b02fffaa1f3ca14f3
BLAKE2b-256 | 3d007f0a920d3a70643fd7b45976fd8eec4c4f31e77136ad7a59a467d2974f3a
Hashes for kuzukiri-0.1.2-cp39-cp39-macosx_11_0_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 0529cc7ec980cdfc087eb2068342e76614ab1477ea5f0374628124abfabe330e
MD5 | 153670add394257df2fa4a810c1b1c95
BLAKE2b-256 | 89237c4b3d59e775c38096f4e507ed86900b132b700f1e25ae50f670b4a7d6a2
Hashes for kuzukiri-0.1.2-cp38-cp38-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | f51a8d26316c81c3149281f7a21ca126fd93e60b9d634265b21d4209871c5d9b
MD5 | 00990a938a27373a8e7e73b3ea1bf27e
BLAKE2b-256 | 368e4459fc7afd5084777f07798095c34795adba389a1da3aaac83514367cb8b
Hashes for kuzukiri-0.1.2-cp38-cp38-macosx_11_0_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 4cfc97c7b7056de6b6a1dcf773d2f3b544766cab3a49eeb3d0be78886b535cbc
MD5 | 37945e2b32b4c7c4c28c7a03ef0062e5
BLAKE2b-256 | 327212e60c28d29de4d7420b1926d20ec9496a99e9a2ce4f2ee8f1b24d77dd0b
Hashes for kuzukiri-0.1.2-cp37-cp37m-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 5d02a53e5254542bf62e9982b7bc681543fefdc956751d2622455c8ee2d87482
MD5 | 5983e1fe286daac6b38fc3eef27952dc
BLAKE2b-256 | 2423829abd017dcefeb68cb022574a41c54f7c810394d60ef3f9c4a38cf58002
Hashes for kuzukiri-0.1.2-cp37-cp37m-macosx_11_0_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 01a1fd2e49f96a5b8c0d83999564d83f10e00d1ad3173693bf0a95382ce8cc94
MD5 | 098b5154a1fe04c4112b7b23eb2c8668
BLAKE2b-256 | 6d61aaa59f199c6c68a0743a7b69bc1b480828b6eb9ff16e8bd63836cdf4e46a
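The digests listed above can be used to check a downloaded file. Below is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `matches` are hypothetical, and the path you pass in is whatever local path you saved the file to.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches(local_path, expected)` with `expected` copied from the SHA256 row for the matching wheel should return `True` for an intact download.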