
Fast Rust/PyO3 semantic text segmentation


CharStreamer Python

charstreamer provides Python access to the Rust CharStreamer segmentation engine through a PyO3 extension module.

This first public wheel focuses on fast semantic text segmentation:

  • paragraphs
  • sentences
  • metadata-like lines
  • headings/sections
  • list items
  • dialogue spans
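The sketch below makes some of the categories above concrete: paragraphs, sentences, and a rough per-line label for headings, list items and dialogue. It is a deliberately naive, pure-Python illustration of the task only. It is not CharStreamer's algorithm or API, and the function names are hypothetical. Its regex rules will mis-split abbreviations such as "Inc." and miss many edge cases.

```python
import re


def naive_paragraphs(text: str) -> list:
    """Split on one or more blank lines (lines containing only whitespace)."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [b.strip() for b in blocks if b.strip()]


def naive_sentences(paragraph: str) -> list:
    """Split after '.', '!' or '?' followed by whitespace; ignores abbreviations."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def classify_line(line: str) -> str:
    """Give a single line a rough structural label (hypothetical label names)."""
    stripped = line.strip()
    if stripped.startswith("#"):
        return "heading"      # Markdown-style heading such as "# Background"
    if re.match(r"^([-*•]|\d+[.)])\s+", stripped):
        return "list_item"    # bullets like "- x" or numbered items like "2) x"
    if stripped.startswith(('"', "“")):
        return "dialogue"     # line opens with a quotation mark
    return "text"
```

Applied to the example text in this README, naive_paragraphs returns two blocks (the heading with its sentences, then the two list items).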

Install

pip install charstreamer

Example

import charstreamer

text = """# Background
The court reviewed the invoice. The shipment was late.

- Notice was timely.
- Damages were limited.
"""

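# Build a segmenter with its default configuration.
segmenter = charstreamer.Segmenter.default()
# Segment the text; the result is accessed like a dictionary.
annotation = segmenter.annotate(text)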

print(annotation["spans"])
print(annotation["tagged"])

This is an early development release, and the API may change before a stable 1.0 release.
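Because the API may change before 1.0, a project that depends on charstreamer may want to pin it (for example charstreamer==0.1.0 in its requirements) and fail fast when a different 0.x series is installed. Here is a minimal sketch using only the standard library's importlib.metadata. Treating the major.minor pair as the compatibility boundary is an assumption, not a policy the project states, and TESTED_VERSION and the helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional

# Assumption: while pre-1.0, treat major.minor as the compatibility boundary.
TESTED_VERSION = "0.1.0"  # hypothetical: the version your code was tested with


def installed_version(dist: str = "charstreamer") -> Optional[str]:
    """Return the installed distribution's version, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def same_minor_series(installed: str, expected: str = TESTED_VERSION) -> bool:
    """True when both versions share major.minor, e.g. 0.1.0 and 0.1.3."""
    return installed.split(".")[:2] == expected.split(".")[:2]


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("charstreamer is not installed")
    elif not same_minor_series(found):
        raise RuntimeError(f"tested against {TESTED_VERSION}, found {found}")
    else:
        print(f"charstreamer {found} matches the tested series")
```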

Full documentation and Rust source are available at:

https://github.com/mjbommar/charstreamer



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


charstreamer-0.1.0-cp39-abi3-manylinux_2_34_x86_64.whl (330.8 kB)

Uploaded for CPython 3.9+, manylinux (glibc 2.34+), x86-64
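Since this release ships only this one wheel and no source distribution, pip can install it only where the wheel's tags match: CPython 3.9 or newer (abi3), Linux with glibc 2.34 or newer (manylinux_2_34), on x86-64. The sketch below checks an interpreter against those three tags so you can see why an install might fail with "no matching distribution". It is a simplified, hypothetical helper derived from the file name. pip's own tag matching is more thorough.

```python
from __future__ import annotations

import platform
import sys

MIN_PYTHON = (3, 9)   # from the cp39-abi3 tags
MIN_GLIBC = (2, 34)   # from the manylinux_2_34 tag


def _parse_version(text: str) -> tuple:
    """Turn '2.35' into (2, 35); return () if the string is not numeric."""
    try:
        return tuple(int(part) for part in text.split(".")[:2])
    except ValueError:
        return ()


def wheel_problems(impl: str, py_version: tuple, system: str,
                   machine: str, libc: str, libc_version: str) -> list:
    """Return reasons the 0.1.0 wheel's tags would not match; empty means OK."""
    problems = []
    if impl != "CPython":
        problems.append("needs CPython (abi3 stable ABI)")
    if tuple(py_version) < MIN_PYTHON:
        problems.append("needs Python 3.9 or newer")
    if system != "Linux":
        problems.append("wheel targets Linux (manylinux)")
    if machine != "x86_64":
        problems.append("wheel targets x86-64")
    if system == "Linux":
        # musl-based systems report an empty libc name, so they fail here too.
        glibc = _parse_version(libc_version) if libc == "glibc" else ()
        if not glibc or glibc < MIN_GLIBC:
            problems.append("needs glibc 2.34 or newer")
    return problems


def current_problems() -> list:
    """Run the check against the interpreter executing this code."""
    libc, libc_version = platform.libc_ver()
    return wheel_problems(
        platform.python_implementation(),
        sys.version_info[:2],
        platform.system(),
        platform.machine(),
        libc,
        libc_version,
    )
```

An empty list from current_problems() means the running interpreter matches the wheel's tags.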

File details

Details for the file charstreamer-0.1.0-cp39-abi3-manylinux_2_34_x86_64.whl.


File hashes

Hashes for charstreamer-0.1.0-cp39-abi3-manylinux_2_34_x86_64.whl
Algorithm     Hash digest
SHA256        50417ef1c6a6cc7593bca44ac735fe2548f3dc7519bb43fa4c572cbf86a682af
MD5           8bc2b9b6ef71f69079bd7ec705163f00
BLAKE2b-256   d0f4686f0250b4242cbb5d8b8ce34c90c23781fdb9923873d72449809f282114
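The published SHA256 digest can be used to confirm that a downloaded copy of the wheel is byte-for-byte the uploaded file. Here is a minimal sketch with the standard library's hashlib, reading the file in chunks so memory use stays flat. The helper names are hypothetical. The expected value is the SHA256 digest listed for this wheel.

```python
import hashlib
from pathlib import Path

# SHA256 published for charstreamer-0.1.0-cp39-abi3-manylinux_2_34_x86_64.whl
EXPECTED_SHA256 = "50417ef1c6a6cc7593bca44ac735fe2548f3dc7519bb43fa4c572cbf86a682af"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

pip can enforce the same check at install time. A requirements line such as charstreamer==0.1.0 --hash=sha256:50417ef1c6a6cc7593bca44ac735fe2548f3dc7519bb43fa4c572cbf86a682af puts pip into hash-checking mode, and pip then refuses any file whose digest differs.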

