
Sentence segmentation for Japanese text


Hasami

Hasami is a tool for sentence segmentation of Japanese text.

  • Besides splitting on sentence-ending markers such as !?。, it treats a run of consecutive sentence-ending characters as a single sentence ending.
  • It does not split inside enclosed text, i.e. sentences within quotes or parentheses.
  • It can be configured with custom sentence-ending markers and enclosures in case the defaults don't cover your needs.
  • You can define exceptions for when not to split sentences.
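
To make the first three behaviours concrete, here is a minimal, self-contained sketch of the idea. It is not Hasami's implementation or API: the function name `split_sentences`, its parameters, and the default marker and enclosure sets are assumptions chosen for illustration, and it does not cover exceptions.

```python
from typing import List, Mapping

# Assumed defaults for illustration only; Hasami's real defaults may differ.
SENTENCE_ENDERS = "!?。！？"
ENCLOSURES: Mapping[str, str] = {"「": "」", "『": "』", "（": "）", "(": ")"}


def split_sentences(
    text: str,
    enders: str = SENTENCE_ENDERS,
    enclosures: Mapping[str, str] = ENCLOSURES,
) -> List[str]:
    """Split text after runs of sentence enders, skipping enclosed text."""
    sentences: List[str] = []
    closers: List[str] = []  # stack of closing characters still expected
    start = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if closers and ch == closers[-1]:
            closers.pop()  # leaving the innermost enclosure
        elif ch in enclosures:
            closers.append(enclosures[ch])  # entering an enclosure
        elif ch in enders and not closers:
            # Consume the whole run of enders so "!?" ends a single sentence.
            while i + 1 < len(text) and text[i + 1] in enders:
                i += 1
            sentences.append(text[start:i + 1])
            start = i + 1
        i += 1
    if start < len(text):
        sentences.append(text[start:])  # trailing text without an ender
    return sentences


print(split_sentences("本当ですか!?はい。"))
print(split_sentences("彼は「行こう。急げ！」と言った。次の文。"))
print(split_sentences("一つ目．二つ目．", enders="．"))
```

The stack of expected closing characters is what keeps nested enclosures from being split. Passing a different `enders` string or `enclosures` mapping mimics the kind of customisation described above.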

Installation

Warning: a PyPI name conflict is currently being sorted out, so the package name might change in the future.

pip install py-hasami

Usage

A simple command-line interface lets you use the functionality without writing your own script. Input is read from stdin or from a file.

$ echo "これが最初の文。これは二番目の文。これが最後の文。" | tee input.txt | hasami
これが最初の文。
これは二番目の文。
これが最後の文。

$ hasami input.txt
これが最初の文。
これは二番目の文。
これが最後の文。

To use in your code:

import hasami

hasami.segment_sentences('これが最初の文。これは二番目の文。これが最後の文。')
# => ['これが最初の文。', 'これは二番目の文。', 'これが最後の文。']

More complex examples will follow soon; in the meantime, please refer to the test cases.

License

Licensed under the BSD-3-Clause License

