Fast Rust/PyO3 semantic text segmentation

Project description

CharStreamer Python

charstreamer provides Python access to the Rust CharStreamer segmentation engine through a PyO3 extension module.

This package exposes the Rust model artifact loader and model-backed segmentation runtime. If no supported model is available, annotation fails instead of synthesizing semantic labels from hard-coded rules.

The vendored 0.1.4 bundle emits model-backed sentence, paragraph, metadata, section, and list_item spans. The dialogue label remains reserved until a balanced dialogue training set is available.

Install

pip install charstreamer

Example

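import charstreamer

text = """# Background
The court reviewed the invoice. The shipment was late. Notice was timely."""

# Load the default model bundle and report which runtime backs it.
segmenter = charstreamer.Segmenter.default()
print(segmenter.model_info().runtime)

# Run model-backed segmentation over the text.
annotation = segmenter.annotate(text)

print(annotation.spans)   # the spans found in the text
print(annotation.tagged)  # the tagged rendering of the input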

The public Python wrapper returns typed immutable dataclasses: ModelInfo, Annotation, Span, and BenchmarkResult. For JSON output or legacy integrations, call .to_dict() or use methods such as segmenter.annotate_dict(text).
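
To illustrate the pattern described above, the sketch below defines a stand-in frozen dataclass and serializes it to JSON. ExampleSpan and its fields (label, start, end) are hypothetical and are not charstreamer's actual Span definition; only the .to_dict() method name comes from the text above.

```python
import json
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class ExampleSpan:
    """Hypothetical stand-in; not charstreamer's real Span class."""

    label: str
    start: int
    end: int

    def to_dict(self) -> dict:
        # Convert the immutable dataclass into a plain dict for JSON output.
        return asdict(self)


span = ExampleSpan(label="sentence", start=0, end=31)
payload = json.dumps(span.to_dict(), sort_keys=True)

try:
    span.start = 5  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    mutated = False
else:
    mutated = True
```

Frozen dataclasses raise on attribute assignment, so results can be shared between callers without defensive copies, and the dict form is there when plain JSON-compatible data is needed.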

Performance

On the current Linux x86_64 release-wheel benchmark, the combined sentence+semantic segmenter runs at roughly 34-35 MiB/s end-to-end on a long UTF-8 document. This includes model inference, span decoding, and tagged rendering for the default Burn model bundle.

Measure local throughput with:

# Reuses segmenter and text from the example above.
result = segmenter.benchmark(text, iterations=10)
print(result.mib_per_second)
print(result.chars_per_second)
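
For reference, the following is a minimal sketch of how figures like these can be derived from wall-clock timing. It assumes MiB/s means UTF-8 bytes divided by 2^20 per second and that chars/s counts Python characters; how charstreamer computes its own numbers is not stated here. The helper names are hypothetical, and str.split stands in for a real segmentation call.

```python
import time

MIB = 1024 * 1024  # one mebibyte in bytes


def throughput(text, seconds, iterations):
    """Return (MiB/s, chars/s) for `iterations` passes over `text`."""
    total_bytes = len(text.encode("utf-8")) * iterations
    total_chars = len(text) * iterations
    return total_bytes / MIB / seconds, total_chars / seconds


def time_callable(func, text, iterations=10):
    """Time `func(text)` over several iterations with a monotonic clock."""
    start = time.perf_counter()
    for _ in range(iterations):
        func(text)
    elapsed = time.perf_counter() - start
    return throughput(text, elapsed, iterations)


# str.split stands in for a real segmentation call in this sketch.
mib_per_second, chars_per_second = time_callable(str.split, "a b c " * 10_000)
```

Measuring bytes and characters separately matters for non-ASCII input, where one character can occupy several UTF-8 bytes and the two rates diverge.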

If a default model is vendored into the wheel, Segmenter.default() loads it from package data. If not, it checks the local cache first and then, unless CHARSTREAMER_AUTO_DOWNLOAD=0 is set, falls back to the GitHub release model URL. To assert model availability during startup:

charstreamer.model_info(allow_download=False, require_model=True)
segmenter = charstreamer.Segmenter.default(require_model=True)
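
The lookup order described above can be sketched as a small decision function. This is an illustrative model of the documented behaviour, not charstreamer's implementation: resolve_model_source and its return values are hypothetical, and only the CHARSTREAMER_AUTO_DOWNLOAD variable name comes from the text.

```python
import os


def resolve_model_source(vendored, cached, env=None):
    """Hypothetical helper mirroring the lookup order described above."""
    env = os.environ if env is None else env
    if vendored:
        return "package-data"  # model shipped inside the wheel
    if cached:
        return "local-cache"  # model already present in the local cache
    if env.get("CHARSTREAMER_AUTO_DOWNLOAD") == "0":
        return None  # download disabled, so no model is available
    return "github-release"  # fall back to the release download


# With nothing vendored or cached and downloads disabled, no source remains.
source = resolve_model_source(False, False, {"CHARSTREAMER_AUTO_DOWNLOAD": "0"})
```

In that last case, a startup check with require_model=True would be expected to fail rather than proceed without a model.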

Model-backed release wheels must include charstreamer/models/default/manifest.json plus the referenced Burn payload.
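
Since wheels are ordinary zip archives, a release check for the manifest can be sketched with the standard library alone. The function name wheel_has_manifest is hypothetical. The sketch does not verify the Burn payload, because the manifest schema is not described here.

```python
import zipfile

MANIFEST_PATH = "charstreamer/models/default/manifest.json"


def wheel_has_manifest(wheel_path):
    """Return True if the wheel archive contains the default model manifest."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return MANIFEST_PATH in wheel.namelist()
```

A check like this could run in CI against each built wheel before upload.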

The vendored 0.1.4 bundle combines a sentence-boundary model with a semantic structure model. It is an early model-backed release, not a final semantic span/IOB model, and quality should be evaluated against task-specific data before production use.

The project is an early development release. APIs may change before a stable 1.0 release.

Full documentation and Rust source are available at:

https://github.com/mjbommar/charstreamer

Download files

Source Distribution

  • charstreamer-0.1.7.tar.gz (1.3 MB)

Built Distributions

  • charstreamer-0.1.7-cp39-abi3-win_arm64.whl (2.6 MB): CPython 3.9+, Windows ARM64
  • charstreamer-0.1.7-cp39-abi3-win_amd64.whl (3.0 MB): CPython 3.9+, Windows x86-64
  • charstreamer-0.1.7-cp39-abi3-manylinux_2_38_x86_64.whl (15.3 MB): CPython 3.9+, manylinux (glibc 2.38+), x86-64
  • charstreamer-0.1.7-cp39-abi3-manylinux_2_38_aarch64.whl (9.7 MB): CPython 3.9+, manylinux (glibc 2.38+), ARM64
  • charstreamer-0.1.7-cp39-abi3-macosx_15_0_arm64.whl (12.2 MB): CPython 3.9+, macOS 15.0+, ARM64
