Suffix smoothing classifier: three research-backed methods, conformal prediction, streaming.

suffix-smoother

A lightweight, production-ready sequence classifier using recursive suffix smoothing.

Zero neural networks. Zero model files. Zero corpus downloads. Handles any unseen input via progressive backoff.
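To make the idea of progressive backoff concrete, here is a minimal sketch of recursive Witten-Bell interpolation over suffixes. It is an illustration only, not the package's implementation: the class `TinySuffixModel` and its `probs` method are hypothetical names, and the real library may store counts and combine orders differently.

```python
from collections import defaultdict


class TinySuffixModel:
    """Illustrative Witten-Bell interpolation over suffixes (hypothetical, not the package's code)."""

    def __init__(self, max_suffix_length=3, n_classes=2):
        self.max_len = max_suffix_length
        self.n_classes = n_classes
        # counts[suffix][label] = number of training sequences ending in `suffix` with `label`
        self.counts = defaultdict(lambda: defaultdict(int))

    def train_one(self, seq, label):
        # Update every suffix, from the empty context up to max_len tokens.
        for k in range(min(len(seq), self.max_len) + 1):
            suffix = tuple(seq[len(seq) - k:])
            self.counts[suffix][label] += 1

    def probs(self, seq):
        # Base case: uniform distribution, so unseen input still gets an answer.
        p = [1.0 / self.n_classes] * self.n_classes
        # Interpolate from the shortest suffix to the longest one.
        for k in range(min(len(seq), self.max_len) + 1):
            suffix = tuple(seq[len(seq) - k:])
            c = self.counts.get(suffix)
            if not c:
                break  # an unseen suffix implies all longer suffixes are unseen too
            total = sum(c.values())
            distinct = len(c)  # number of distinct labels seen after this suffix
            lam = total / (total + distinct)  # Witten-Bell interpolation weight
            p = [lam * c.get(y, 0) / total + (1 - lam) * p[y]
                 for y in range(self.n_classes)]
        return p


model = TinySuffixModel(max_suffix_length=3, n_classes=2)
model.train_one((101, 102, 103), 0)
model.train_one((404, 404, 500), 1)
seen = model.probs((101, 102, 103))   # dominated by label 0
unseen = model.probs((9, 9, 9))       # falls back to the global label distribution
```

Because each observation only increments counts, the same `train_one` step serves both batch and streaming updates in this sketch.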


What's New in v0.2.0

  • 3 Research-Backed Smoothing Methods: Jelinek-Mercer, Witten-Bell, and Kneser-Ney.
  • Conformal Prediction: calibrate() + predict_set() return prediction sets with a marginal coverage guarantee, assuming calibration and test data are exchangeable.
  • Streaming Training: train_one() for real-time adaptation without full retraining.
  • Optimized Core: Vectorized NumPy-based inference (6,000+ queries/sec).
  • Improved Calibration: Low ECE (Expected Calibration Error) on sparse datasets.
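For readers unfamiliar with ECE, the sketch below shows one common way to compute it: bin predictions by confidence, then average the gap between accuracy and mean confidence, weighted by bin size. The function name `expected_calibration_error` and the 10-bin default are assumptions for illustration; the source does not say how the package measures ECE.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE; a common definition, not necessarily the package's."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to a bin; right edges are inclusive so 1.0 lands in the last bin.
    bin_ids = np.clip(np.digitize(confidences, edges[1:-1], right=True), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight the gap by the fraction of samples in the bin
    return float(ece)


# Two predictions at 0.9 confidence (both right), two at 0.6 (one right, one wrong)
ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 1, 1, 0])
```

A lower value means the reported confidence tracks the observed accuracy more closely.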

Install

pip install suffix-smoother

Quick Start

from suffix_smoother import SuffixSmoother, SuffixConfig

# Configure with Witten-Bell (default) for robust calibration
config = SuffixConfig(max_suffix_length=5, n_classes=2, smoothing_method="witten-bell")
smoother = SuffixSmoother(config)

# Training: (context_tuple, label_id) pairs
smoother.train([
    ((101, 102, 103), 0),
    ((404, 404, 500), 1),
])

# Predict with confidence
label, confidence = smoother.predict((101, 102, 103))

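# Conformal prediction: calibrate on held-out data, then request a prediction set
validation_data = [  # held-out (seq, true_label) pairs; placeholder values shown here
    ((101, 102, 103), 0),
    ((404, 404, 500), 1),
]
smoother.calibrate(validation_data)
result = smoother.predict_set((101, 102), coverage=0.90)
# Label set that contains the true label at least 90% of the time on average,
# assuming calibration and test data are exchangeable
print(result['labels'])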
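The calibrate-then-predict flow above follows the usual pattern of split conformal prediction. As a rough sketch of that general technique (not the package's internals), the helpers `conformal_threshold` and `prediction_set` below are hypothetical and use `1 - p(true label)` as the nonconformity score, which is an assumption.

```python
import math

import numpy as np


def conformal_threshold(cal_probs, cal_labels, coverage=0.90):
    """Score threshold from held-out data (split conformal, hypothetical helper)."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    # Nonconformity score: one minus the probability assigned to the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample correction: take the ceil((n + 1) * coverage)-th smallest score.
    rank = math.ceil((n + 1) * coverage)
    if rank > n:
        return 1.0  # too few calibration points: every label must be included
    return float(np.sort(scores)[rank - 1])


def prediction_set(probs, threshold):
    """All labels whose nonconformity score is within the threshold."""
    probs = np.asarray(probs, dtype=float)
    return [int(y) for y in np.flatnonzero(1.0 - probs <= threshold)]


cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]
cal_labels = [0, 0, 1, 0]
threshold = conformal_threshold(cal_probs, cal_labels, coverage=0.75)
confident = prediction_set([0.95, 0.05], threshold)  # a single label
ambiguous = prediction_set([0.5, 0.5], threshold)    # both labels
```

Uncertain inputs produce larger sets, which is how the coverage target is met without assuming the underlying probabilities are well calibrated.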

API Reference

SuffixConfig

| Parameter | Default | Description |
| max_suffix_length | 5 | Maximum context length |
| smoothing_method | "witten-bell" | "jelinek-mercer", "witten-bell", or "kneser-ney" |
| n_classes | 16 | Number of output labels |
| label_smoothing | 0.0 | ε fraction redistributed across classes |
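As an illustration of what the label_smoothing parameter might do, the sketch below applies the standard uniform label-smoothing formula, p' = (1 - ε)·p + ε/K for K classes. That the package redistributes ε uniformly is an assumption, and `smooth_labels` is a hypothetical helper, not part of the library.

```python
import numpy as np


def smooth_labels(probs, epsilon):
    """Move an epsilon fraction of probability mass to a uniform distribution (assumed scheme)."""
    probs = np.asarray(probs, dtype=float)
    n_classes = probs.shape[-1]
    return (1.0 - epsilon) * probs + epsilon / n_classes


# A one-hot distribution over 4 classes with epsilon = 0.1
smoothed = smooth_labels([1.0, 0.0, 0.0, 0.0], epsilon=0.1)
```

With ε = 0.0 (the default listed above) the distribution is left unchanged.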

SuffixSmoother

| Method | Description |
| train(data) | Batch training on (seq, label) pairs |
| train_one(seq, label) | Online/streaming update |
| predict(seq) | Returns (label_id, confidence) |
| predict_set(seq, coverage) | Returns conformal prediction set |
| calibrate(data) | Calibrates conformal predictor |
| uncertainty(seq) | Shannon entropy in bits |
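For reference, Shannon entropy in bits of a class distribution p is H = -Σ p·log2(p). The sketch below computes it with NumPy; `entropy_bits` is a hypothetical helper showing the formula, not the library's uncertainty() method.

```python
import numpy as np


def entropy_bits(probs):
    """Shannon entropy of a probability vector, in bits."""
    probs = np.asarray(probs, dtype=float)
    nonzero = probs[probs > 0]  # treat 0 * log2(0) as 0
    return float(-(nonzero * np.log2(nonzero)).sum())


coin = entropy_bits([0.5, 0.5])          # maximally uncertain over 2 classes
certain = entropy_bits([1.0, 0.0])       # no uncertainty
uniform16 = entropy_bits([1 / 16] * 16)  # maximally uncertain over 16 classes
```

The value ranges from 0 for a certain prediction up to log2(n_classes) for a uniform one, so with the default of 16 classes the maximum is 4 bits.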

Performance

  • Inference: < 0.15 ms per sequence
  • Training: > 20,000 samples/second
  • Dependencies: numpy only

License

MIT
