Fast Rust/PyO3 semantic text segmentation

CharStreamer Python

charstreamer provides Python access to the Rust CharStreamer segmentation engine through a PyO3 extension module.

This package exposes the Rust model artifact loader and model-backed segmentation runtime. If no supported model is available, annotation fails instead of synthesizing semantic labels from hard-coded rules.

The vendored 0.1.4 bundle emits model-backed sentence, paragraph, metadata, section, and list_item spans. The dialogue label remains reserved until a balanced dialogue training set is available.

Install

pip install charstreamer

Example

import charstreamer

text = """# Background
The court reviewed the invoice. The shipment was late. Notice was timely."""

segmenter = charstreamer.Segmenter.default()
print(segmenter.model_info().runtime)
annotation = segmenter.annotate(text)

print(annotation.spans)
print(annotation.tagged)

The public Python wrapper returns typed immutable dataclasses: ModelInfo, Annotation, Span, and BenchmarkResult. For JSON output or legacy integrations, call .to_dict() or use methods such as segmenter.annotate_dict(text).
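To make the "typed immutable dataclass" contract concrete, here is a minimal stand-in built only from the standard library. The class name SpanSketch and its fields (label, start, end) are assumptions for illustration; the actual Span fields are not listed here and may differ.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class SpanSketch:
    """Hypothetical stand-in for charstreamer's Span; field names are assumed."""

    label: str
    start: int
    end: int

    def to_dict(self) -> dict:
        # Convert to a plain dict that json.dumps can serialize directly.
        return asdict(self)


span = SpanSketch(label="sentence", start=0, end=31)
span_dict = span.to_dict()

try:
    span.start = 5  # frozen dataclasses reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because instances cannot be modified after creation, code that needs an altered copy would build a new object (for example with dataclasses.replace) rather than editing one in place.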

Performance

On the current Linux x86_64 release-wheel benchmark, the combined sentence+semantic segmenter runs at roughly 34-35 MiB/s end-to-end on a long UTF-8 document. This includes model inference, span decoding, and tagged rendering for the default Burn model bundle.

Measure local throughput with:

result = segmenter.benchmark(text, iterations=10)
print(result.mib_per_second)
print(result.chars_per_second)
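For comparison with other tools, the two reported units can be reproduced with a generic timing helper. This is only a sketch: measure_throughput is a hypothetical name, and it assumes MiB/s is computed over UTF-8 encoded bytes and that all iterations are timed together, which may not match how segmenter.benchmark() measures internally.

```python
import time


def measure_throughput(fn, text: str, iterations: int = 10) -> dict:
    """Time fn(text) repeatedly and report throughput (generic sketch)."""
    n_bytes = len(text.encode("utf-8"))  # assumed basis for MiB/s
    n_chars = len(text)                  # Unicode code points
    start = time.perf_counter()
    for _ in range(iterations):
        fn(text)
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return {
        "mib_per_second": n_bytes * iterations / elapsed / (1024 * 1024),
        "chars_per_second": n_chars * iterations / elapsed,
    }


# str.split stands in for a real segmenter call such as segmenter.annotate.
sample = "The shipment was late. Notice was timely. " * 1000
stats = measure_throughput(str.split, sample, iterations=5)
```

Passing segmenter.annotate as fn would give an end-to-end figure that includes Python call overhead.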

If a default model is vendored into the wheel, Segmenter.default() loads it from package data. Otherwise, it checks the local cache and then the GitHub release model URL, unless CHARSTREAMER_AUTO_DOWNLOAD=0 is set. To assert model availability during startup:

charstreamer.model_info(allow_download=False, require_model=True)
segmenter = charstreamer.Segmenter.default(require_model=True)
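The lookup order described above (vendored bundle, then local cache, then download unless disabled) can be sketched as a small decision function. Everything except the CHARSTREAMER_AUTO_DOWNLOAD variable is hypothetical here: resolve_model_source, its directory arguments, the placeholder URL, and the assumption that the opt-out only affects the download step are illustrative, not the package's actual internals.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_model_source(
    vendored_dir: Path,
    cache_dir: Path,
    release_url: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return where a model would come from, or None if none is available."""
    # 1. A bundle vendored into the installed package wins.
    if (vendored_dir / "manifest.json").is_file():
        return f"vendored:{vendored_dir}"
    # 2. Otherwise use a previously cached copy.
    if (cache_dir / "manifest.json").is_file():
        return f"cache:{cache_dir}"
    # 3. Download only if the user has not opted out.
    if env.get("CHARSTREAMER_AUTO_DOWNLOAD") != "0":
        return f"download:{release_url}"
    return None


url = "https://example.invalid/model"  # placeholder, not the real release URL
with tempfile.TemporaryDirectory() as tmp:
    vendored, cache = Path(tmp) / "vendored", Path(tmp) / "cache"
    cache.mkdir()
    (cache / "manifest.json").write_text("{}")
    no_download = {"CHARSTREAMER_AUTO_DOWNLOAD": "0"}
    from_cache = resolve_model_source(vendored, cache, url, env=no_download)
    (cache / "manifest.json").unlink()
    offline = resolve_model_source(vendored, cache, url, env=no_download)
    online = resolve_model_source(vendored, cache, url, env={})
```

A None result corresponds to the case where require_model=True would be expected to raise rather than continue without a model.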

Model-backed release wheels must include charstreamer/models/default/manifest.json plus the referenced Burn payload.
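Since a wheel is a zip archive, the presence of the manifest can be checked before publishing. The sketch below is a hypothetical helper, not part of charstreamer. Because the manifest schema is not described here, it only confirms that manifest.json exists and lists the other files under the model directory; it does not verify that the payload named inside the manifest is present.

```python
import io
import zipfile

MANIFEST = "charstreamer/models/default/manifest.json"
MODEL_DIR = "charstreamer/models/default/"


def check_wheel_bundle(wheel) -> dict:
    """Inspect a wheel (path or file object) for a default model bundle."""
    with zipfile.ZipFile(wheel) as zf:
        names = zf.namelist()
    # Everything under the model directory except the manifest itself.
    other_files = [
        n for n in names
        if n.startswith(MODEL_DIR) and n != MANIFEST and not n.endswith("/")
    ]
    return {"has_manifest": MANIFEST in names, "other_model_files": other_files}


# Demo against an in-memory archive; "model.bin" is a made-up payload name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(MANIFEST, "{}")
    zf.writestr(MODEL_DIR + "model.bin", b"\x00")
buf.seek(0)
report = check_wheel_bundle(buf)
```

In a release script, the same function could be pointed at each file in dist/ and the build failed when has_manifest is False or other_model_files is empty.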

The vendored 0.1.4 bundle combines a sentence-boundary model with a semantic structure model. It is an early model-backed release, not a final semantic span/IOB model, and quality should be evaluated against task-specific data before production use.

The project is an early development release. APIs may change before a stable 1.0 release.

Full documentation and Rust source are available at:

https://github.com/mjbommar/charstreamer

Download files

Source Distribution

  • charstreamer-0.1.6.tar.gz (1.3 MB)

Built Distributions

  • charstreamer-0.1.6-cp39-abi3-win_arm64.whl (2.5 MB): CPython 3.9+, Windows ARM64
  • charstreamer-0.1.6-cp39-abi3-win_amd64.whl (3.0 MB): CPython 3.9+, Windows x86-64
  • charstreamer-0.1.6-cp39-abi3-manylinux_2_38_x86_64.whl (15.3 MB): CPython 3.9+, manylinux (glibc 2.38+), x86-64
  • charstreamer-0.1.6-cp39-abi3-manylinux_2_38_aarch64.whl (9.7 MB): CPython 3.9+, manylinux (glibc 2.38+), ARM64
  • charstreamer-0.1.6-cp39-abi3-macosx_15_0_arm64.whl (12.2 MB): CPython 3.9+, macOS 15.0+, ARM64
