
Sentence segmentation for Japanese text


Hasami

Hasami is a tool for sentence segmentation of Japanese text.

  • Beyond simply splitting on sentence-ending markers such as !?。, it treats a run of consecutive sentence-ending characters as a single sentence ending.
  • It does not split inside enclosed text, i.e. sentences within quotes or parentheses.
  • It can be configured with custom sentence-ending markers and enclosures in case the defaults don't cover your needs.
  • You can define exceptions for when not to split sentences.
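
The behaviours listed above can be illustrated with a small standalone sketch. This is not Hasami's implementation or API: the function name `split_sentences`, its parameters, the default marker and enclosure sets, and the "literal string" notion of an exception are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, List

# Assumed defaults for this sketch only; Hasami's real defaults may differ.
SENTENCE_ENDERS = "。！？!?"
ENCLOSURES: Dict[str, str] = {"「": "」", "『": "』", "（": "）", "(": ")"}


def split_sentences(
    text: str,
    enders: str = SENTENCE_ENDERS,
    enclosures: Dict[str, str] = ENCLOSURES,
    exceptions: Iterable[str] = (),
) -> List[str]:
    """Split after runs of sentence enders that occur outside enclosures."""
    exceptions = tuple(exceptions)
    sentences: List[str] = []
    closers: List[str] = []  # stack of closing marks still expected
    start = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if closers and ch == closers[-1]:
            closers.pop()  # leave the innermost enclosure
        elif ch in enclosures:
            closers.append(enclosures[ch])  # enter an enclosure
        elif ch in enders and not closers:
            # Consume the whole run of enders so "!?" ends one sentence.
            end = i + 1
            while end < len(text) and text[end] in enders:
                end += 1
            candidate = text[start:end]
            # Hypothetical exception rule: do not split if the candidate
            # sentence ends with one of the given literal strings.
            if not any(candidate.endswith(exc) for exc in exceptions):
                sentences.append(candidate)
                start = end
            i = end
            continue
        i += 1
    if start < len(text):
        sentences.append(text[start:])  # trailing text without an ender
    return sentences


print(split_sentences("本当ですか!?はい。"))
print(split_sentences("彼は「行こう。早く!」と言った。"))
```

A stack of expected closing marks keeps nested enclosures balanced, and checking exceptions only at candidate split points keeps the scan to a single pass over the text.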

Installation

pip install hasami

Usage

A simple command line interface is provided to use the functionality without having to write your own script. Input is read from stdin or from a file.

$ echo "これが最初の文。これは二番目の文。これが最後の文。" | tee input.txt | hasami
これが最初の文。
これは二番目の文。
これが最後の文。

$ hasami input.txt
これが最初の文。
これは二番目の文。
これが最後の文。

To use in your code:

import hasami

hasami.segment_sentences('これが最初の文。これは二番目の文。これが最後の文。')
# => ['これが最初の文。', 'これは二番目の文。', 'これが最後の文。']
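
For input that spans several lines, one option is a thin wrapper that feeds each line to a segmenter. The helper below is a sketch only: `segment_lines` is a hypothetical name that is not part of Hasami, and the segmenter is passed in as a plain callable so that the helper does not assume anything about the library beyond the `segment_sentences` call shown above.

```python
from typing import Callable, Iterable, Iterator, List


def segment_lines(
    lines: Iterable[str],
    segment: Callable[[str], List[str]],
) -> Iterator[str]:
    """Yield the sentences of each non-blank line, one at a time."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of passing them on
        yield from segment(line)
```

With the package installed, `segment` would presumably be `hasami.segment_sentences`, and `lines` could be an open text file such as the `input.txt` from the command-line example.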

More complex examples will follow soon; in the meantime, please refer to the test cases.

License

Licensed under the BSD-3-Clause License.
