kuzukiri

A simple text segmenter

What's this?

This is a Python library for segmenting Japanese text into sentences.

Features

  • Text segmentation by simple rules:
    • rule-based, with no machine learning,
    • so the results are predictable.
  • Comparatively fast: it is written in Rust.
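
To illustrate what "rule-based" means here, the following is a minimal pure-Python sketch of a sentence splitter. It is not kuzukiri's actual rule set: the terminator and bracket characters, and the name `split_sentences`, are assumptions chosen for illustration only.

```python
# Illustrative only: a toy rule-based splitter, NOT kuzukiri's real rules.
TERMINATORS = set("。！？")  # assumed sentence-ending characters
OPENERS = set("「『（")      # assumed opening brackets
CLOSERS = set("」』）")      # assumed closing brackets


def split_sentences(text):
    """Split text after a terminator, but never inside brackets."""
    sentences = []
    start = 0
    depth = 0  # current bracket nesting depth
    for i, ch in enumerate(text):
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth = max(0, depth - 1)
        elif ch in TERMINATORS and depth == 0:
            # Keep runs such as "！？" together: split only after the last one.
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt in TERMINATORS:
                continue
            sentences.append(text[start:i + 1])
            start = i + 1
    # Keep any trailing text that has no terminator.
    if start < len(text):
        sentences.append(text[start:])
    return sentences
```

Because every decision comes from a fixed rule, the same input always yields the same split, which is the predictability the list above refers to.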

Install

From PyPI

pip install kuzukiri

From source code

pip install setuptools-rust
python setup.py install

Usage

import kuzukiri

segmenter = kuzukiri.Segmenter()
text = "これはテストです。文分割します。"
sentences = segmenter.split(text)
print(sentences)  # => ['これはテストです。', '文分割します。']

For details, see the examples and tests directories.
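
When the input has several lines, the same `split` call can be applied line by line. The helper below is a hypothetical sketch, not part of kuzukiri: `iter_sentences` is an invented name, and it assumes only that the segmenter object has a `split(text)` method returning an iterable of strings, as in the usage above.

```python
def iter_sentences(segmenter, lines):
    """Yield (line_number, sentence) pairs for each non-empty line.

    `segmenter` is any object with a split(text) method, for example
    kuzukiri.Segmenter() as shown in the usage above.
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        for sentence in segmenter.split(line):
            yield line_no, sentence
```

With kuzukiri installed, passing `kuzukiri.Segmenter()` as `segmenter` and a list of strings (or an open text file) as `lines` should give one pair per sentence.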

License

MIT

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kuzukiri-0.1.3.tar.gz (5.0 kB), source

Built Distributions

kuzukiri-0.1.3-cp310-cp310-manylinux1_x86_64.whl (1.1 MB), CPython 3.10

kuzukiri-0.1.3-cp310-cp310-macosx_12_0_x86_64.whl (290.8 kB), CPython 3.10, macOS 12.0+ x86-64

kuzukiri-0.1.3-cp39-cp39-manylinux1_x86_64.whl (1.1 MB), CPython 3.9

kuzukiri-0.1.3-cp39-cp39-macosx_12_0_x86_64.whl (290.9 kB), CPython 3.9, macOS 12.0+ x86-64

kuzukiri-0.1.3-cp38-cp38-manylinux1_x86_64.whl (1.1 MB), CPython 3.8

kuzukiri-0.1.3-cp38-cp38-macosx_12_0_x86_64.whl (291.0 kB), CPython 3.8, macOS 12.0+ x86-64

kuzukiri-0.1.3-cp37-cp37m-manylinux1_x86_64.whl (1.1 MB), CPython 3.7m

kuzukiri-0.1.3-cp37-cp37m-macosx_12_0_x86_64.whl (291.2 kB), CPython 3.7m, macOS 12.0+ x86-64
