Unicode Segment

Segment text with segmenters that follow Unicode TR29 (Unicode Standard Annex #29, "Unicode Text Segmentation").

Usage

```python
from unicode_segment import SentenceSegmenter

# The trailing backslashes continue the string literal, so `text` is one line.
text = """This, that, the other thing, etc. Another sentence... A, b, c, \
etc., and more. D, e, f, etc. and more. One, i. e. two. Three, i. e., four. \
Five, i.e. six. You have 4.2 messages. Property access: `a.b.c`."""

segmenter = SentenceSegmenter()
# segment() produces (start offset, sentence) pairs; trailing spaces stay with
# the sentence they follow, so the sentences concatenate back to `text`.
segments = segmenter.segment(text)

assert list(segments) == [
    (0, "This, that, the other thing, etc. "),
    (34, "Another sentence... "),
    (54, "A, b, c, etc., and more. "),
    (79, "D, e, f, etc. and more. "),
    (103, "One, i. e. two. "),
    (119, "Three, i. e., four. "),
    (139, "Five, i.e. six. "),
    (155, "You have 4.2 messages. "),
    (178, "Property access: `a.b.c`."),
]
```
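The expected output above reflects a handful of TR29 sentence-boundary rules: no break inside a number such as `4.2` (SB6), no break when the next letter after a period is lowercase, as in `etc. and` or `i. e. two` (SB8), no break before a comma or another terminator, as in `etc.,` and `...` (SB8a), and otherwise a break after the terminator and any following spaces (SB11). The sketch below is a deliberately simplified, self-contained approximation of just those rules for illustration. It is not how unicode_segment is implemented: `split_sentences` and the character sets are hypothetical, `str.isalpha`/`str.islower`/`str.isdigit` stand in for the real Sentence_Break property classes, and SB7, closing punctuation, Format/Extend characters and paragraph separators are not handled. On the sample text above it should return the same pairs as the assertion.

```python
# Simplified approximation of a few TR29 sentence-break rules (SB6, SB8, SB8a,
# SB11). Hypothetical helper for illustration; not unicode_segment's code.

TERMINATORS = ".!?"  # stand-ins for ATerm (".") and STerm ("!", "?")
SPACES = " \t"       # stand-in for Sp
CONTINUERS = ",:"    # small subset of SContinue


def split_sentences(text: str) -> list[tuple[int, str]]:
    """Return (offset, sentence) pairs; trailing spaces stay with the sentence."""
    segments: list[tuple[int, str]] = []
    start = i = 0
    n = len(text)
    while i < n:
        if text[i] not in TERMINATORS:
            i += 1
            continue
        # SB8a (partly): a run such as "..." or "?!" never breaks internally.
        j = i
        while j < n and text[j] in TERMINATORS:
            j += 1
        ends_with_period = text[j - 1] == "."
        # SB6: a period directly followed by a digit ("4.2") is not a boundary.
        if ends_with_period and j < n and text[j].isdigit():
            i = j
            continue
        # Absorb spaces after the terminators; they belong to this sentence.
        k = j
        while k < n and text[k] in SPACES:
            k += 1
        # SB8a: no break before a continuing mark, as in "etc., and".
        if k < n and text[k] in CONTINUERS:
            i = k
            continue
        # SB8: after a period, skip non-letters; a lowercase letter means no break.
        if ends_with_period:
            m = k
            while (m < n and not text[m].isalpha()
                   and text[m] not in TERMINATORS and text[m] != "\n"):
                m += 1
            if m < n and text[m].islower():
                i = m
                continue
        # SB11: otherwise break after the terminators and trailing spaces.
        segments.append((start, text[start:k]))
        start = i = k
    if start < n:
        segments.append((start, text[start:]))
    return segments
```

For example, `split_sentences("Hi! Bye?")` gives `[(0, "Hi! "), (4, "Bye?")]`. Because trailing spaces are kept, `text[offset:offset + len(sentence)] == sentence` holds for every pair, which is the same offset convention the usage example relies on.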

Download files

Source Distribution

unicode_segment-0.3.0.tar.gz (8.7 kB)

Built Distribution

unicode_segment-0.3.0-py3-none-any.whl (9.0 kB, Python 3)

File details

Details for the file unicode_segment-0.3.0.tar.gz.

File metadata

  • Download URL: unicode_segment-0.3.0.tar.gz
  • Size: 8.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for unicode_segment-0.3.0.tar.gz
SHA256: fde0af25579e086a1d73709f88e0a9b4428f47e05d841ad682cbe20ed5972781
MD5: 2cd55f17940c7dd48dd1c4da36ee6121
BLAKE2b-256: 074c92d39706794ebbef143e69ae2fb7e50f42c94d0eb1a54751b17cbda093ed
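The SHA256 digest listed for the source distribution can be used to check a downloaded copy before installing it. A minimal sketch using only the standard library's `hashlib`, assuming the archive has already been saved to a local path; `sha256_of` and `verify` are hypothetical helper names, not part of unicode_segment.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for unicode_segment-0.3.0.tar.gz.
EXPECTED_SHA256 = "fde0af25579e086a1d73709f88e0a9b4428f47e05d841ad682cbe20ed5972781"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's SHA256 hex digest with the expected one (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

Calling `verify(Path("unicode_segment-0.3.0.tar.gz"))` on the downloaded archive should return `True` only if the bytes match the published digest. pip can perform a similar check during installation in its hash-checking mode, using `--hash=sha256:...` entries in a requirements file.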

File details

Details for the file unicode_segment-0.3.0-py3-none-any.whl.

File hashes

Hashes for unicode_segment-0.3.0-py3-none-any.whl
SHA256: 83eb6172d0f3a7090521ba77437cb9abcb6fc809d4fcadfd0610b14922770fb9
MD5: 5e74f608b8526ddcc4d327586e5d2201
BLAKE2b-256: 26fc627479a8f11fc388d1f91052d334b4e9e0812ad38d12288a67b8767b41c6