Fast Rust/PyO3 semantic text segmentation

CharStreamer Python

charstreamer provides Python access to the Rust CharStreamer segmentation engine through a PyO3 extension module.

This package exposes the Rust model artifact loader and model-backed segmentation runtime. If no supported model is available, annotation fails instead of synthesizing semantic labels from hard-coded rules.

The vendored 0.1.4 bundle emits model-backed sentence, paragraph, metadata, section, and list_item spans. The dialogue label remains reserved until a balanced dialogue training set is available.

Install

pip install charstreamer

Example

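import charstreamer

text = """# Background
The court reviewed the invoice. The shipment was late. Notice was timely."""

# Load the default model bundle and report which runtime backs it.
segmenter = charstreamer.Segmenter.default()
print(segmenter.model_info().runtime)

# Run model-backed segmentation over the text.
annotation = segmenter.annotate(text)

print(annotation.spans)   # decoded spans
print(annotation.tagged)  # tagged rendering of the text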

The public Python wrapper returns typed immutable dataclasses: ModelInfo, Annotation, Span, and BenchmarkResult. For JSON output or legacy integrations, call .to_dict() or use methods such as segmenter.annotate_dict(text).
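To illustrate what "typed immutable dataclasses" with a .to_dict() method look like in practice, here is a minimal sketch built only on the standard library. The class names SketchSpan and SketchAnnotation and their fields (label, start, end, text, spans) are assumptions made for illustration; they are not the actual charstreamer definitions, whose fields may differ.

```python
import json
from dataclasses import FrozenInstanceError, asdict, dataclass
from typing import Tuple


# Hypothetical stand-ins: these field names are assumptions, not the
# real charstreamer Span/Annotation layout.
@dataclass(frozen=True)
class SketchSpan:
    label: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


@dataclass(frozen=True)
class SketchAnnotation:
    text: str
    spans: Tuple[SketchSpan, ...]

    def to_dict(self) -> dict:
        # asdict() recurses into nested dataclasses and tuples.
        return asdict(self)


text = "Notice was timely. The shipment was late."
annotation = SketchAnnotation(
    text=text,
    spans=(SketchSpan("sentence", 0, 18), SketchSpan("sentence", 19, 41)),
)

# Offsets slice straight back into the original text.
pieces = [text[s.start:s.end] for s in annotation.spans]

# Frozen dataclasses reject attribute assignment.
try:
    annotation.spans[0].label = "paragraph"
    mutated = True
except FrozenInstanceError:
    mutated = False

# The plain-dict form is what a JSON consumer would receive.
payload = json.dumps(annotation.to_dict())
```

Freezing the dataclasses keeps results safe to share between callers, while the dict form covers integrations that expect plain JSON-compatible data.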

Performance

On the current Linux x86_64 release-wheel benchmark, the combined sentence+semantic segmenter runs at roughly 34-35 MiB/s end-to-end on a long UTF-8 document. This includes model inference, span decoding, and tagged rendering for the default Burn model bundle.

Measure local throughput with:

result = segmenter.benchmark(text, iterations=10)
print(result.mib_per_second)
print(result.chars_per_second)
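For readers who want to cross-check the reported figures, the sketch below shows one way to derive MiB/s and characters/s from wall-clock timing. It is an independent illustration, not the implementation behind segmenter.benchmark: the helper name measure_throughput is hypothetical, and counting MiB over the UTF-8 encoded byte length is an assumption.

```python
import time

MIB = 1024 * 1024


def measure_throughput(func, text: str, iterations: int = 10) -> dict:
    """Time func(text) over several iterations and report throughput."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    n_bytes = len(text.encode("utf-8"))  # assumption: MiB counts UTF-8 bytes
    n_chars = len(text)
    start = time.perf_counter()
    for _ in range(iterations):
        func(text)
    # Guard against a zero reading from a coarse clock.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return {
        "mib_per_second": n_bytes * iterations / elapsed / MIB,
        "chars_per_second": n_chars * iterations / elapsed,
    }


# str.split stands in for a real segmenter call such as segmenter.annotate.
sample = "The shipment was late. " * 1000
stats = measure_throughput(str.split, sample, iterations=5)
```

For pure ASCII input the two rates differ by exactly a factor of 1024 * 1024; for text with multi-byte characters the byte-based rate is proportionally higher.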

If a default model is vendored into the wheel, Segmenter.default() loads it from package data. Otherwise, it checks the local cache and then falls back to the GitHub release model URL, unless CHARSTREAMER_AUTO_DOWNLOAD=0 is set. To assert model availability during startup:

charstreamer.model_info(allow_download=False, require_model=True)
segmenter = charstreamer.Segmenter.default(require_model=True)
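The lookup order described above can be sketched as a small decision function. This is a hedged illustration of the documented order only: resolve_model_source is a hypothetical helper, the directory arguments are placeholders, and using manifest.json as the marker of a usable bundle in the cache is an assumption. It performs no download.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_model_source(
    vendored_dir: Path,
    cache_dir: Path,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return which source would be used: vendored, cache, download, or None."""
    env = os.environ if env is None else env
    if (vendored_dir / "manifest.json").is_file():
        return "vendored"
    if (cache_dir / "manifest.json").is_file():
        return "cache"
    # Downloading is the last resort and can be switched off.
    if env.get("CHARSTREAMER_AUTO_DOWNLOAD") != "0":
        return "download"
    return None


with tempfile.TemporaryDirectory() as tmp:
    vendored = Path(tmp) / "vendored"
    cache = Path(tmp) / "cache"
    cache.mkdir()

    # Nothing vendored or cached yet.
    would_download = resolve_model_source(vendored, cache, env={})
    no_model = resolve_model_source(
        vendored, cache, env={"CHARSTREAMER_AUTO_DOWNLOAD": "0"}
    )

    # A cached bundle wins over downloading.
    (cache / "manifest.json").write_text("{}")
    from_cache = resolve_model_source(vendored, cache, env={})
```

A None result corresponds to the case where require_model=True would be expected to fail at startup rather than later during annotation.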

Model-backed release wheels must include charstreamer/models/default/manifest.json plus the referenced Burn payload.
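A release check for this requirement could look like the sketch below, which inspects a wheel (a zip archive) for the manifest path named above. The helper wheel_has_manifest is hypothetical. Because the manifest schema is not documented here, the sketch does not try to verify the referenced Burn payload.

```python
import io
import zipfile

MANIFEST_PATH = "charstreamer/models/default/manifest.json"


def wheel_has_manifest(wheel) -> bool:
    """Return True if the wheel archive contains the default model manifest.

    `wheel` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel) as archive:
        return MANIFEST_PATH in archive.namelist()


def build_fake_wheel(with_model: bool) -> io.BytesIO:
    """Build an in-memory archive that mimics a wheel layout for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("charstreamer/__init__.py", "")
        if with_model:
            archive.writestr(MANIFEST_PATH, "{}")
    return buffer


model_backed = wheel_has_manifest(build_fake_wheel(with_model=True))
model_free = wheel_has_manifest(build_fake_wheel(with_model=False))
```

Against a real build, the same function can be called with the path to the .whl file in the dist directory.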

The vendored 0.1.4 bundle combines a sentence-boundary model with a semantic structure model. It is an early model-backed release, not a final semantic span/IOB model, and quality should be evaluated against task-specific data before production use.

The project is an early development release. APIs may change before a stable 1.0 release.

Full documentation and Rust source are available at:

https://github.com/mjbommar/charstreamer

Download files

Source Distribution

  • charstreamer-0.1.5.tar.gz (1.3 MB): Source

Built Distributions

  • charstreamer-0.1.5-cp39-abi3-win_arm64.whl (2.5 MB): CPython 3.9+, Windows ARM64
  • charstreamer-0.1.5-cp39-abi3-win_amd64.whl (3.0 MB): CPython 3.9+, Windows x86-64
  • charstreamer-0.1.5-cp39-abi3-manylinux_2_38_x86_64.whl (15.3 MB): CPython 3.9+, manylinux (glibc 2.38+), x86-64
  • charstreamer-0.1.5-cp39-abi3-manylinux_2_38_aarch64.whl (9.7 MB): CPython 3.9+, manylinux (glibc 2.38+), ARM64
  • charstreamer-0.1.5-cp39-abi3-macosx_15_0_arm64.whl (12.2 MB): CPython 3.9+, macOS 15.0+, ARM64
