Suffix smoothing classifier: three research-backed methods, conformal prediction, streaming.
suffix-smoother
A lightweight, production-ready sequence classifier using recursive suffix smoothing.
Zero neural networks. Zero model files. Zero corpus downloads. Handles any unseen input via progressive backoff.
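To make "progressive backoff" concrete, here is a minimal, self-contained sketch of recursive suffix smoothing with Witten-Bell interpolation. It is an illustration under stated assumptions, not the package's actual implementation: the function names `train_counts` and `predict_proba` are hypothetical, and the real library may store counts and weight suffixes differently.

```python
from collections import Counter, defaultdict

def train_counts(data, max_len):
    """Count labels for every suffix (length 0..max_len) of each context."""
    counts = defaultdict(Counter)
    for seq, label in data:
        for k in range(min(max_len, len(seq)) + 1):
            suffix = tuple(seq[len(seq) - k:])  # k == 0 gives the empty suffix
            counts[suffix][label] += 1
    return counts

def predict_proba(counts, seq, n_classes, max_len):
    """Interpolate from the shortest suffix up to the longest one seen in training."""
    probs = [1.0 / n_classes] * n_classes  # uniform base case
    for k in range(min(max_len, len(seq)) + 1):
        suffix = tuple(seq[len(seq) - k:])
        label_counts = counts.get(suffix)
        if not label_counts:
            break  # every longer suffix is unseen as well
        total = sum(label_counts.values())
        distinct = len(label_counts)
        lam = total / (total + distinct)  # Witten-Bell weight on observed counts
        probs = [
            lam * label_counts.get(y, 0) / total + (1.0 - lam) * probs[y]
            for y in range(n_classes)
        ]
    return probs

data = [((101, 102, 103), 0), ((404, 404, 500), 1)]
counts = train_counts(data, max_len=5)
print(predict_proba(counts, (101, 102, 103), n_classes=2, max_len=5))  # [0.9375, 0.0625]
print(predict_proba(counts, (999,), n_classes=2, max_len=5))           # [0.5, 0.5]
```

A context that was never seen falls back to the class prior (here uniform, because the two training labels are balanced), which is why any input yields a valid distribution.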
What's New in v0.2.0
- 3 Research-Backed Smoothing Methods: Jelinek-Mercer, Witten-Bell, and Kneser-Ney.
- Conformal Prediction: `calibrate()` + `predict_set()` providing mathematical coverage guarantees.
- Streaming Training: `train_one()` for real-time adaptation without full retraining.
- Optimized Core: Vectorized NumPy-based inference (6,000+ queries/sec).
- Improved Calibration: Low ECE (Expected Calibration Error) on sparse datasets.
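For readers unfamiliar with ECE, the sketch below shows the usual binned definition: predictions are grouped by confidence, and the gap between average confidence and accuracy in each bin is weighted by the bin's share of samples. The helper name `expected_calibration_error` and the choice of 10 equal-width bins are assumptions for illustration; the package does not document how its ECE figures were computed.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin share) * |accuracy - mean confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece
```

In practice, `confidences` would be the second element returned by `predict()` on held-out sequences, and `correct` would record whether each predicted label matched the true one.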
Install
```bash
pip install suffix-smoother
```
Quick Start
```python
from suffix_smoother import SuffixSmoother, SuffixConfig

# Configure with Witten-Bell (default) for robust calibration
config = SuffixConfig(max_suffix_length=5, n_classes=2, smoothing_method="witten-bell")
smoother = SuffixSmoother(config)

# Training: (context_tuple, label_id) pairs
smoother.train([
    ((101, 102, 103), 0),
    ((404, 404, 500), 1),
])

# Predict with confidence
label, confidence = smoother.predict((101, 102, 103))

# Statistical coverage guarantee via conformal prediction.
# validation_data is a held-out list of (seq, true_label) pairs that you supply.
smoother.calibrate(validation_data)
result = smoother.predict_set((101, 102), coverage=0.90)
print(result['labels'])  # Label set that contains the true label at least 90% of the time
```
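The coverage guarantee comes from the calibration step. As a rough picture of what a split-conformal calibrate/predict-set pair can look like, the sketch below uses the common nonconformity score `1 - p(true label)`. This is an assumption for illustration only: `conformal_threshold` and `prediction_set` are hypothetical names, and the library's internal score and quantile rule are not documented here.

```python
import math

def conformal_threshold(cal_probs, cal_labels, coverage):
    """Return the score cutoff from held-out (probability vector, true label) pairs."""
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * coverage)  # finite-sample corrected quantile rank
    if rank > n:
        return 1.0  # too few calibration points: every label must be included
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """Keep every label whose nonconformity score is within the cutoff."""
    return [y for y, p in enumerate(probs) if 1.0 - p <= threshold]
```

The guarantee is marginal: it holds on average over calibration and test data drawn exchangeably, not for each individual input. Higher requested coverage or a small calibration set yields larger sets.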
API Reference
SuffixConfig
| Parameter | Default | Description |
|---|---|---|
| `max_suffix_length` | `5` | Maximum context length |
| `smoothing_method` | `"witten-bell"` | `"jelinek-mercer"`, `"witten-bell"`, or `"kneser-ney"` |
| `n_classes` | `16` | Number of output labels |
| `label_smoothing` | `0.0` | ε fraction redistributed across classes |
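As a worked example of the `label_smoothing` description, the standard formulation moves a fraction ε of the probability mass to a uniform distribution over the classes. The helper `smooth_labels` below is hypothetical, and whether the library applies this to counts or to output probabilities is an assumption not confirmed by the table.

```python
def smooth_labels(probs, epsilon):
    """Mix a distribution with the uniform one: (1 - eps) * p + eps / K."""
    k = len(probs)
    return [(1.0 - epsilon) * p + epsilon / k for p in probs]

# With eps = 0.1 and 4 classes, a one-hot vector becomes roughly
# 0.925 for the observed class and 0.025 for each of the others.
print(smooth_labels([1.0, 0.0, 0.0, 0.0], epsilon=0.1))
```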
SuffixSmoother
| Method | Description |
|---|---|
| `train(data)` | Batch training on (seq, label) pairs |
| `train_one(seq, label)` | Online/streaming update |
| `predict(seq)` | Returns (label_id, confidence) |
| `predict_set(seq, coverage)` | Returns conformal prediction set |
| `calibrate(data)` | Calibrates conformal predictor |
| `uncertainty(seq)` | Shannon entropy in bits |
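To help interpret the value returned by `uncertainty(seq)`, here is how Shannon entropy in bits is computed from a class distribution. The standalone helper `entropy_bits` is a hypothetical illustration and does not call the library.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), skipping zero-probability classes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))   # 1.0 bit: maximally uncertain between two classes
print(entropy_bits([0.25] * 4))   # 2.0 bits: uniform over four classes
```

The value ranges from 0 (a single class holds all the probability) up to log2 of the number of classes, so with the default `n_classes` of 16 the maximum is 4 bits.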
Performance
- Inference: < 0.15ms per sequence
- Training: > 20,000 samples/second
- Dependencies: `numpy` only
License
MIT