Segment text with Unicode TR29-compliant segmenters.

Project description

Unicode Segment

Usage

from unicode_segment import SentenceSegmenter

text = """This, that, the other thing, etc. Another sentence... A, b, c, \
etc., and more. D, e, f, etc. and more. One, i. e. two. Three, i. e., four. \
Five, i.e. six. You have 4.2 messages. Property access: `a.b.c`."""

segmenter = SentenceSegmenter()
segments = segmenter.segment(text)

assert list(segments) == [
    (0, "This, that, the other thing, etc. "),
    (34, "Another sentence... "),
    (54, "A, b, c, etc., and more. "),
    (79, "D, e, f, etc. and more. "),
    (103, "One, i. e. two. "),
    (119, "Three, i. e., four. "),
    (139, "Five, i.e. six. "),
    (155, "You have 4.2 messages. "),
    (178, "Property access: `a.b.c`."),
]
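The expected output above suggests that each result is an `(offset, segment)` pair, where the offset is a character index into the input string and the segments keep their trailing whitespace. Under that assumption, the segments should tile the input exactly. The sketch below checks this using only the data from the example; it does not call the library, and `check_offsets` is a hypothetical helper, not part of `unicode_segment`.

```python
# Input string and (offset, segment) pairs copied from the usage example above.
text = (
    "This, that, the other thing, etc. Another sentence... A, b, c, "
    "etc., and more. D, e, f, etc. and more. One, i. e. two. Three, i. e., four. "
    "Five, i.e. six. You have 4.2 messages. Property access: `a.b.c`."
)

segments = [
    (0, "This, that, the other thing, etc. "),
    (34, "Another sentence... "),
    (54, "A, b, c, etc., and more. "),
    (79, "D, e, f, etc. and more. "),
    (103, "One, i. e. two. "),
    (119, "Three, i. e., four. "),
    (139, "Five, i.e. six. "),
    (155, "You have 4.2 messages. "),
    (178, "Property access: `a.b.c`."),
]


def check_offsets(text, segments):
    """Hypothetical helper: True if the segments tile `text` with no gaps or overlaps."""
    position = 0
    for offset, segment in segments:
        # Each segment must start where the previous one ended
        # and match the slice of the input at that offset.
        if offset != position or text[offset:offset + len(segment)] != segment:
            return False
        position = offset + len(segment)
    return position == len(text)


offsets_ok = check_offsets(text, segments)

# Assumption: callers who want bare sentences can strip the trailing whitespace.
sentences = [segment.strip() for _, segment in segments]
```

Because the segments cover the input without gaps, joining them reproduces the original text, and the offsets can be used to map a sentence back to its position in the source.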

Download files

Source Distribution

unicode_segment-0.4.0.tar.gz (17.2 kB)

Uploaded Source

Built Distribution

unicode_segment-0.4.0-py3-none-any.whl (13.9 kB)

Uploaded Python 3

File details

Details for the file unicode_segment-0.4.0.tar.gz.

File metadata

  • Download URL: unicode_segment-0.4.0.tar.gz
  • Upload date:
  • Size: 17.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for unicode_segment-0.4.0.tar.gz
Algorithm Hash digest
SHA256 e74d2d79b3fa6ad717837a3bebc9435c53dfa61859832864775311ca3bdd0e42
MD5 0aefabed805a094d981a53f6d468bfff
BLAKE2b-256 0f1d4e68269b6f87ade669f3c71025accdbb8ddb1b50ab3586dff8ceb58ff33e

File details

Details for the file unicode_segment-0.4.0-py3-none-any.whl.

File hashes

Hashes for unicode_segment-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1d9eb76e7b9dc2c0fe7435396dd4f46d0322ad5aa10bf1a0008e97ef853623b6
MD5 02194a28daef20928828d0eafffa18da
BLAKE2b-256 3bfa3593edd4f5e2f4fee79ccc3a11eb48930a0cbfb901bc6f6ed9e215368f01
