A library for uncertainty quantification in machine learning

incerto

incerto is a comprehensive Python library for uncertainty quantification in machine learning. It provides state-of-the-art methods for calibration, out-of-distribution detection, conformal prediction, selective prediction, and uncertainty estimation in deep learning and LLMs.

Latin incerto = "uncertain, doubtful, unsure".

Warning: This is a v0.1 alpha release. The API may change without notice before v1.0. Tested with PyTorch ≥ 2.0, NumPy ≥ 1.24, scikit-learn ≥ 1.3, scipy ≥ 1.11. Please report any issues on GitHub.

🎯 Key Features

incerto provides a unified interface for:

Calibration

  • Post-hoc calibration: Temperature scaling, Platt scaling, isotonic regression, histogram binning
  • Training-time methods: Label smoothing, focal loss, confidence penalty, evidential deep learning
  • Metrics: ECE, MCE, Brier score, NLL, reliability diagrams

Out-of-Distribution (OOD) Detection

  • Score-based methods: MSP, MaxLogit, Energy, ODIN
  • Distance-based methods: Mahalanobis distance, KNN
  • Training methods: Mixup, CutMix, Outlier Exposure, Energy regularization

Conformal Prediction

  • Classification: Inductive CP, APS, RAPS, Mondrian CP
  • Regression: Jackknife+, CV+
  • Distribution-free uncertainty quantification with coverage guarantees

Selective Prediction

  • Confidence thresholding (Softmax Threshold)
  • Self-Adaptive Training (SAT)
  • Deep Gambler, SelectiveNet
  • Risk-coverage tradeoffs

Bayesian Deep Learning

  • MC Dropout: Uncertainty via dropout at test time
  • Deep Ensembles: Train multiple models for robust predictions
  • SWAG: Stochastic Weight Averaging - Gaussian
  • Laplace Approximation: Gaussian posterior around MAP estimate
  • Variational Inference: Bayes by Backprop
  • Uncertainty decomposition: Separate epistemic & aleatoric uncertainty

Distribution Shift Detection

  • Statistical tests: MMD, Energy distance, Kolmogorov-Smirnov
  • Classifier-based: Black-Box Shift Detection (BBSD)
  • Label shift: Detect and correct label distribution changes
  • Importance weighting: Covariate shift adaptation

LLM Uncertainty

  • Token-level: Entropy, confidence, perplexity, surprisal
  • Sequence-level: Sequence probability, average log-prob
  • Sampling-based: Self-consistency, semantic entropy, predictive entropy
  • Generation methods: Beam search uncertainty, nucleus sampling, contrastive decoding

Active Learning

  • Acquisition functions: Entropy, BALD, margin, variance ratio
  • Query strategies: Uncertainty sampling, diversity sampling, Core-Set, BADGE
  • Batch selection: BatchBALD for efficient batch queries
  • Committee methods: Query by Committee (QBC)

Data & Utilities

  • Built-in datasets (MNIST, CIFAR-10/100, SVHN)
  • OOD benchmark datasets
  • Visualization utilities
  • Common architectures (ConvNet, ResNet)

🚀 Installation

From PyPI

pip install incerto

With optional extras:

pip install incerto[vision]   # + torchvision for vision datasets
pip install incerto[llm]      # + transformers, accelerate, sentence-transformers
pip install incerto[all]      # all optional dependencies

From source

git clone https://github.com/steverab/incerto.git
cd incerto
pip install -e .

📖 Quick Start

Calibration

import torch
from torch.utils.data import DataLoader
from incerto.calibration import TemperatureScaling, ece_score

# Assume you have a trained model
model = ...  # Your trained classifier
model.eval()

# Collect validation predictions for calibration
val_logits, val_labels = [], []
with torch.no_grad():
    for x, y in val_loader:
        logits = model(x)
        val_logits.append(logits)
        val_labels.append(y)

val_logits = torch.cat(val_logits)
val_labels = torch.cat(val_labels)

# Fit temperature scaling on validation set
calibrator = TemperatureScaling()
calibrator.fit(val_logits, val_labels)
print(f"Learned temperature: {calibrator.temperature.item():.4f}")

# Apply calibration to test set
test_logits, test_labels = [], []
with torch.no_grad():
    for x, y in test_loader:
        logits = model(x)
        test_logits.append(logits)
        test_labels.append(y)

test_logits = torch.cat(test_logits)
test_labels = torch.cat(test_labels)

# Get calibrated logits
calibrated_logits = calibrator(test_logits)  # Applies temperature scaling

# Measure calibration improvement
ece_before = ece_score(test_logits, test_labels, n_bins=15)
ece_after = ece_score(calibrated_logits, test_labels, n_bins=15)
print(f"ECE before: {ece_before:.4f} | ECE after: {ece_after:.4f}")
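
What ece_score reports can be illustrated without any dependencies. The following is a minimal sketch of Expected Calibration Error with equal-width confidence bins, plus temperature scaling applied by hand. It is not incerto's implementation, and the helper names (softmax, expected_calibration_error, apply_temperature) are chosen for this example only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_calibration_error(logits, labels, n_bins=15):
    """ECE = sum over bins of (bin size / n) * |accuracy - mean confidence|."""
    n = len(labels)
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins
    for row, label in zip(logits, labels):
        probs = softmax(row)
        conf = max(probs)
        pred = probs.index(conf)
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        conf_sum[b] += conf
        hit_sum[b] += float(pred == label)
        count[b] += 1
    ece = 0.0
    for b in range(n_bins):
        if count[b]:
            acc = hit_sum[b] / count[b]
            avg_conf = conf_sum[b] / count[b]
            ece += (count[b] / n) * abs(acc - avg_conf)
    return ece

def apply_temperature(logits, temperature):
    """Divide every logit by a scalar temperature (argmax is unchanged)."""
    return [[z / temperature for z in row] for row in logits]

# Two overconfident predictions, only one of which is correct.
logits = [[10.0, 0.0], [10.0, 0.0]]
labels = [0, 1]
print(expected_calibration_error(logits, labels))
print(expected_calibration_error(apply_temperature(logits, 100.0), labels))
```

Dividing the logits by a temperature above 1 softens the softmax without changing the argmax, so accuracy stays fixed while overconfidence, and with it ECE in this toy case, drops.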

OOD Detection

import torch
from torch.utils.data import DataLoader
from incerto.ood import Energy, auroc

# Load in-distribution and OOD datasets
id_loader = DataLoader(cifar10_test, batch_size=128)
ood_loader = DataLoader(svhn_test, batch_size=128)

# Create Energy-based OOD detector
detector = Energy(model, temperature=1.0)

# Compute scores (higher = more OOD); energy scores need no gradients
with torch.no_grad():
    id_scores = torch.cat([detector.score(x) for x, _ in id_loader])
    ood_scores = torch.cat([detector.score(x) for x, _ in ood_loader])

# Evaluate detection performance — auroc takes the two score tensors directly
auc = auroc(id_scores, ood_scores)
print(f"OOD Detection AUROC: {auc:.4f}")

# Use detector with threshold
test_batch = next(iter(id_loader))[0]
predictions = detector.predict(test_batch, threshold=-10.0)
print(f"Detected {predictions.sum()} OOD samples")
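
For orientation, the energy score is commonly defined as the negative temperature-scaled log-sum-exp of the logits, E(x) = -T * logsumexp(f(x) / T). The sketch below computes it in plain Python together with a pairwise AUROC. That incerto's Energy.score uses exactly this sign convention is an assumption based on the "higher = more OOD" comment above, and the function names here are illustrative rather than part of the library.

```python
import math

def energy_score(logits, temperature=1.0):
    """Energy E(x) = -T * logsumexp(logits / T); higher means more OOD-like."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp

def pairwise_auroc(id_scores, ood_scores):
    """Probability that a random OOD score exceeds a random ID score (ties count 0.5)."""
    wins = 0.0
    for ood in ood_scores:
        for ind in id_scores:
            if ood > ind:
                wins += 1.0
            elif ood == ind:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))

# A confident prediction has one large logit and therefore low energy;
# a flat logit vector has higher energy.
confident = energy_score([20.0, 0.0, 0.0])
flat = energy_score([1.0, 1.0, 1.0])
print(confident, flat)
print(pairwise_auroc([confident], [flat]))
```

Under this convention a threshold such as -10.0 flags every input whose energy lies above it, which is why the threshold in the example above is negative.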

Conformal Prediction

import torch
from torch.utils.data import DataLoader
from incerto.conformal import aps

# Calibrate conformal predictor (typically on held-out calibration set)
alpha = 0.1  # Miscoverage rate (1 - alpha = 90% coverage)
predictor = aps(model, calib_loader, alpha=alpha)

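# Generate prediction sets on test data, keeping the labels for evaluation
# (assumes each prediction set holds plain integer class indices)
prediction_sets, test_labels = [], []
for x, y in test_loader:
    sets = predictor(x)  # List of sets, one per sample
    prediction_sets.extend(sets)
    test_labels.extend(y.tolist())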

# Compute coverage and average set size
coverage = sum(y_true in pred_set
               for y_true, pred_set in zip(test_labels, prediction_sets))
coverage /= len(test_labels)

avg_size = sum(len(s) for s in prediction_sets) / len(prediction_sets)
print(f"Empirical coverage: {coverage:.3f} (target: {1-alpha:.3f})")
print(f"Average set size: {avg_size:.2f}")
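
The calibration step behind a call like aps(...) can be sketched in a few lines. The version below is a simplified, deterministic APS (no randomization) with the usual split-conformal quantile; it is an illustration under those assumptions, not incerto's code, and aps_score, conformal_quantile, and aps_prediction_set are names invented for this example.

```python
import math

def aps_score(probs, label):
    """Cumulative probability mass of classes ranked at or above the true label."""
    order = sorted(range(len(probs)), key=lambda k: -probs[k])
    total = 0.0
    for k in order:
        total += probs[k]
        if k == label:
            return total
    raise ValueError("label is not a valid class index")

def conformal_quantile(scores, alpha):
    """The ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: sets contain every class
    return sorted(scores)[rank - 1]

def aps_prediction_set(probs, qhat):
    """All classes whose APS score is at most qhat."""
    order = sorted(range(len(probs)), key=lambda k: -probs[k])
    total = 0.0
    chosen = set()
    for k in order:
        total += probs[k]
        if total > qhat:
            break
        chosen.add(k)
    return chosen

# Toy calibration set: softmax outputs and true labels.
calib_probs = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
calib_labels = [0, 1, 1, 2]
scores = [aps_score(p, y) for p, y in zip(calib_probs, calib_labels)]
qhat = conformal_quantile(scores, alpha=0.25)
print(qhat, aps_prediction_set([0.6, 0.3, 0.1], qhat))
```

A label lands in the set exactly when its score is at most qhat, which is what ties empirical coverage to the 1 - alpha target under exchangeability. Practical APS implementations often add randomization or always include the top class to avoid empty sets.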

Selective Prediction

import torch
from incerto.sp import SoftmaxThreshold

# Create selective predictor (wraps your trained model)
selector = SoftmaxThreshold(model)
selector.eval()

# Get logits, confidence scores, and labels for test data
all_logits, all_confidences, all_labels = [], [], []
with torch.no_grad():
    for x, y in test_loader:
        logits, conf = selector(x, return_confidence=True)
        all_logits.append(logits)
        all_confidences.append(conf)
        all_labels.append(y)

all_logits = torch.cat(all_logits)
all_confidences = torch.cat(all_confidences)
test_labels = torch.cat(all_labels)
predictions = all_logits.argmax(dim=-1)

# Set confidence threshold (e.g., top 80% most confident)
threshold = all_confidences.quantile(0.2)  # Reject bottom 20%

# Evaluate selective accuracy
selected_mask = all_confidences >= threshold
selected_acc = (predictions[selected_mask] == test_labels[selected_mask]).float().mean()
coverage = selected_mask.float().mean()

print(f"Confidence threshold: {threshold:.4f}")
print(f"Coverage: {coverage:.2%}")
print(f"Selective accuracy: {selected_acc:.4f}")

# Reject high-uncertainty samples
rejected = selector.reject(all_confidences, threshold)
print(f"Rejected samples: {rejected.sum()}/{len(predictions)}")
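
A single threshold gives one point on the risk-coverage tradeoff. Sweeping the threshold over every confidence value traces the whole curve. The sketch below does this in plain Python; the function names and the simple mean-risk summary are illustrative assumptions, not incerto functions.

```python
def risk_coverage_curve(confidences, correct):
    """Return (coverage, risk) pairs as samples are accepted from most to least confident."""
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    curve = []
    errors = 0
    for accepted, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        coverage = accepted / len(order)
        risk = errors / accepted  # error rate among the accepted samples
        curve.append((coverage, risk))
    return curve

def mean_risk(curve):
    """Average risk over all coverage levels, a simple area-under-curve summary."""
    return sum(risk for _, risk in curve) / len(curve)

confidences = [0.9, 0.8, 0.6, 0.4]
correct = [True, True, False, True]
curve = risk_coverage_curve(confidences, correct)
for coverage, risk in curve:
    print(f"coverage={coverage:.2f} risk={risk:.3f}")
print(f"mean risk: {mean_risk(curve):.4f}")
```

A useful confidence score keeps risk low at small coverage and lets it rise only as the least confident samples are admitted.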

Bayesian Neural Networks

import torch
from incerto.bayesian import VariationalBayesNN

# Create Variational Bayesian NN
# Specify architecture: input_dim, [hidden_sizes], output_dim
vbnn = VariationalBayesNN(
    in_features=784,
    hidden_sizes=[512, 256],
    out_features=10,
    prior_std=1.0
)

# Train with variational loss (likelihood + KL divergence)
optimizer = torch.optim.Adam(vbnn.parameters(), lr=0.001)

for epoch in range(10):
    vbnn.train()
    for batch_x, batch_y in train_loader:
        optimizer.zero_grad()
        # Variational loss with Monte Carlo sampling
        loss = vbnn.variational_loss(batch_x, batch_y, num_samples=10)
        loss.backward()
        optimizer.step()

# Get predictions with variance estimates
# (test_x is assumed to be a tensor of flattened test inputs with 784 features each)
vbnn.eval()
with torch.no_grad():
    mean_pred, variance = vbnn.predict(test_x)

print(f"Average predictive variance: {variance.mean():.4f}")

# Identify high-uncertainty samples
high_unc_mask = variance > variance.quantile(0.9)
print(f"High uncertainty samples: {high_unc_mask.sum()}/{len(test_x)}")
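
Predictive variance is one summary of uncertainty. For classifiers, a common alternative splits the entropy of the averaged prediction into an aleatoric part (mean entropy of the individual stochastic predictions) and an epistemic part (their difference, the mutual information). The sketch below shows that decomposition in plain Python, assuming you already have class probabilities from several stochastic forward passes; it does not use or mirror incerto's API.

```python
import math

def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability classes."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def decompose_uncertainty(sampled_probs):
    """Split total predictive entropy into aleatoric and epistemic parts.

    sampled_probs: list of probability vectors for ONE input, one vector per
    posterior sample (e.g. per MC Dropout pass or ensemble member).
    """
    n_samples = len(sampled_probs)
    n_classes = len(sampled_probs[0])
    mean_probs = [
        sum(sample[c] for sample in sampled_probs) / n_samples
        for c in range(n_classes)
    ]
    total = entropy(mean_probs)                                   # H[E[p]]
    aleatoric = sum(entropy(s) for s in sampled_probs) / n_samples  # E[H[p]]
    return {"total": total, "aleatoric": aleatoric, "epistemic": total - aleatoric}

# Samples that disagree confidently: uncertainty is almost entirely epistemic.
print(decompose_uncertainty([[0.99, 0.01], [0.01, 0.99]]))
# Samples that agree on a flat prediction: uncertainty is entirely aleatoric.
print(decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]]))
```

The epistemic term is the quantity BALD uses as an acquisition score, so the same function can rank unlabeled samples for active learning.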

Distribution Shift Detection

import torch
from torch.utils.data import DataLoader
from incerto.shift import MMDShiftDetector

# Load reference (training) data
reference_loader = DataLoader(train_dataset, batch_size=128)

# Load production data (potentially shifted)
production_loader = DataLoader(production_dataset, batch_size=128)

# Create MMD shift detector with Gaussian kernel
mmd_detector = MMDShiftDetector(sigma=1.0)

# Fit on reference distribution
mmd_detector.fit(reference_loader)

# Compute shift score on production data
shift_score = mmd_detector.score(production_loader)
baseline_score = mmd_detector.score(reference_loader)  # Self-test; a held-out reference split gives a more meaningful baseline

# Calculate shift ratio
shift_ratio = shift_score / (baseline_score + 1e-10)
print(f"MMD shift score: {shift_score:.6f}")
print(f"Shift ratio: {shift_ratio:.2f}x")

# Alert based on shift magnitude
if shift_ratio > 2.0:
    print("⚠️  CRITICAL: Significant distribution shift detected!")
    print("   Recommendation: Retrain model immediately")
elif shift_ratio > 1.5:
    print("⚠️  WARNING: Moderate shift detected")
    print("   Recommendation: Monitor closely, consider retraining")
else:
    print("✓ No significant shift detected")
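
The statistic behind an MMD detector is short enough to write out. Below is a minimal sketch of the biased squared-MMD estimate with a Gaussian kernel, plus a permutation test as one alternative to fixed ratio thresholds such as 1.5x or 2x. It is plain Python for illustration, with invented function names, and makes no claim about how MMDShiftDetector is implemented.

```python
import math
import random

def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd2_biased(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD: mean k(x,x') + mean k(y,y') - 2 * mean k(x,y)."""
    def mean_kernel(a, b):
        return sum(gaussian_kernel(p, q, sigma) for p in a for q in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

def permutation_p_value(xs, ys, sigma=1.0, n_permutations=200, seed=0):
    """Fraction of random re-splits of the pooled data with MMD at least as large."""
    rng = random.Random(seed)
    observed = mmd2_biased(xs, ys, sigma)
    pooled = list(xs) + list(ys)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        stat = mmd2_biased(pooled[:len(xs)], pooled[len(xs):], sigma)
        if stat >= observed:
            count += 1
    return (count + 1) / (n_permutations + 1)

reference = [[0.0], [0.1], [0.2], [0.3], [0.4]]
production = [[5.0], [5.1], [5.2], [5.3], [5.4]]
print(f"MMD^2: {mmd2_biased(reference, production):.4f}")
print(f"p-value: {permutation_p_value(reference, production):.4f}")
```

With the biased estimate, comparing a sample against itself yields exactly zero, so a ratio against a self-test baseline can blow up. A p-value from permutations, or a baseline computed on a held-out reference split, avoids that.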

LLM Uncertainty

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sentence_transformers import SentenceTransformer
from incerto.llm import SemanticEntropy, TokenEntropy

# Load language model and embedding model
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
model.eval()

# Example prompt
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

# --- Token-level uncertainty ---
with torch.no_grad():
    outputs = model(**inputs, return_dict=True)
    logits = outputs.logits

token_entropy = TokenEntropy.compute(logits)
print(f"Average token entropy: {token_entropy.mean():.4f}")

# --- Semantic Entropy: cluster semantically equivalent responses ---
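num_samples = 10
prompt_len = inputs["input_ids"].shape[1]
responses = []
for _ in range(num_samples):
    output_ids = model.generate(
        **inputs,
        max_length=50,
        do_sample=True,
        temperature=0.8,
        top_p=0.9,
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
    )
    # Decode only the continuation so the shared prompt does not inflate similarity
    response = tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
    responses.append(response)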

# Compute semantic entropy with embedding model
semantic_unc = SemanticEntropy.compute(
    responses,
    similarity_threshold=0.85,
    embedding_model=embedding_model
)

print(f"Semantic entropy: {semantic_unc['semantic_entropy']:.4f}")
print(f"Number of semantic clusters: {semantic_unc['num_clusters']}")

# High semantic entropy indicates uncertainty
if semantic_unc['semantic_entropy'] > 1.5:
    print("⚠️  High uncertainty: Model gives diverse semantic answers")
else:
    print("✓ Low uncertainty: Responses are semantically consistent")
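
To make the clustering idea concrete, here is a small frequency-based sketch of semantic entropy: responses are grouped greedily by an equivalence check, and the entropy is taken over cluster frequencies. The equivalence function used here is a toy string match standing in for an embedding-similarity or entailment check, and none of the names below come from incerto.

```python
import math

def cluster_responses(responses, same_meaning):
    """Greedy clustering: each response joins the first cluster whose representative matches."""
    clusters = []
    for response in responses:
        for cluster in clusters:
            if same_meaning(cluster[0], response):
                cluster.append(response)
                break
        else:
            clusters.append([response])
    return clusters

def semantic_entropy(responses, same_meaning):
    """Entropy (in nats) of the empirical distribution over meaning clusters."""
    clusters = cluster_responses(responses, same_meaning)
    n = len(responses)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

def normalized_match(a, b):
    """Toy equivalence check: case- and punctuation-insensitive string equality."""
    def clean(s):
        return "".join(ch for ch in s.lower() if ch.isalnum() or ch.isspace()).strip()
    return clean(a) == clean(b)

answers = ["Paris", "paris", "Lyon", "Paris."]
print(len(cluster_responses(answers, normalized_match)))
print(f"{semantic_entropy(answers, normalized_match):.4f}")
```

With N sampled responses the entropy is at most ln(N), about 2.30 nats for 10 samples, which helps put a cutoff such as 1.5 in context.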

📚 Examples

The examples/ directory contains Jupyter notebook tutorials covering all major features:

| Notebook | Description |
|---|---|
| 01_calibration.ipynb | Post-hoc and training-time calibration methods |
| 02_ood_detection.ipynb | Out-of-distribution detection techniques |
| 03_selective_prediction.ipynb | Selective classification with reject option |
| 04_conformal_prediction.ipynb | Distribution-free prediction sets |
| 05_bayesian_uncertainty.ipynb | Bayesian neural networks and uncertainty |
| 06_active_learning.ipynb | Query strategies and acquisition functions |
| 07_shift_detection.ipynb | Distribution shift detection methods |
| 08_llm_uncertainty.ipynb | LLM uncertainty quantification |

🧪 Testing

incerto has comprehensive test coverage (982 tests, 100% passing):

# Run all tests
pytest

# Run specific module tests
pytest tests/test_calibration/
pytest tests/test_ood/
pytest tests/test_conformal/
pytest tests/test_shift/
pytest tests/test_bayesian/
pytest tests/test_active/

# Run with coverage
pytest --cov=incerto --cov-report=term-missing

📊 Supported Methods

Calibration Methods

Post-hoc:

  • Temperature Scaling
  • Vector Scaling
  • Matrix Scaling
  • Platt Scaling
  • Isotonic Regression
  • Histogram Binning
  • Dirichlet Calibration
  • Beta Calibration

Training-time:

  • Label Smoothing
  • Focal Loss
  • Confidence Penalty
  • Evidential Deep Learning
  • Temperature-Aware Training

Metrics:

  • Expected Calibration Error (ECE)
  • Maximum Calibration Error (MCE)
  • Classwise ECE
  • Brier Score
  • Negative Log-Likelihood (NLL)

OOD Detection Methods

Score-based:

  • Maximum Softmax Probability (MSP)
  • MaxLogit
  • Energy Score
  • ODIN

Distance-based:

  • Mahalanobis Distance
  • K-Nearest Neighbors (KNN)

Training-time:

  • Mixup
  • CutMix
  • Outlier Exposure
  • Energy Regularization

Conformal Prediction Methods

Classification:

  • Inductive Conformal Prediction (ICP)
  • Adaptive Prediction Sets (APS)
  • Regularized APS (RAPS)
  • Mondrian Conformal Prediction

Regression:

  • Jackknife+
  • CV+
  • Conformalized Quantile Regression

LLM Uncertainty Methods

Token-level:

  • Token Entropy
  • Token Confidence
  • Perplexity
  • Surprisal Score
  • Top-K Confidence

Sequence-level:

  • Sequence Probability
  • Average Log-Probability
  • Sequence Entropy

Sampling-based:

  • Self-Consistency
  • Semantic Entropy
  • Predictive Entropy
  • Mutual Information

Generation:

  • Beam Search Uncertainty
  • Nucleus Sampling Uncertainty
  • I Don't Know Detection
  • Contrastive Decoding

Selective Prediction Methods

  • Softmax Threshold (confidence thresholding)
  • Deep Gambler
  • SelectiveNet
  • Self-Adaptive Training (SAT)

Bayesian Methods

  • MC Dropout
  • Deep Ensembles
  • SWAG (Stochastic Weight Averaging - Gaussian)
  • Laplace Approximation
  • Variational Bayes (Bayes by Backprop)

Shift Detection Methods

Statistical:

  • MMD (Maximum Mean Discrepancy)
  • Energy Distance
  • Kolmogorov-Smirnov Test

Classifier-based:

  • Black-Box Shift Detection (BBSD)
  • Label Shift Detection
  • Importance Weighting

Active Learning Methods

Acquisition Functions:

  • Entropy Sampling
  • BALD (Bayesian Active Learning by Disagreement)
  • Least Confidence
  • Margin Sampling
  • Variance Ratio
  • Mean STD
  • BatchBALD

Query Strategies:

  • Uncertainty Sampling
  • Diversity Sampling
  • Core-Set Selection
  • BADGE
  • Query by Committee
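
The acquisition functions listed above all turn a model's predictive distribution into a per-sample score, and uncertainty sampling then queries the highest-scoring pool samples. As a rough illustration in plain Python (the function names are made up for this sketch and are not incerto's API), entropy, margin, and least-confidence scores with a top-k query could look like this:

```python
import math

def entropy_score(probs):
    """Shannon entropy in nats; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def margin_score(probs):
    """Negative gap between the two largest probabilities; higher means more uncertain."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return -(top1 - top2)

def least_confidence_score(probs):
    """One minus the top probability; higher means more uncertain."""
    return 1.0 - max(probs)

def select_queries(pool_probs, k, score_fn=entropy_score):
    """Indices of the k pool samples with the highest acquisition score."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: score_fn(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]

# Predicted class probabilities for three unlabeled pool samples.
pool = [[0.98, 0.01, 0.01], [0.40, 0.35, 0.25], [0.70, 0.20, 0.10]]
print(select_queries(pool, k=2))
print(select_queries(pool, k=1, score_fn=margin_score))
```

Because every score follows the "higher is more uncertain" convention, the strategies can be swapped through the score_fn argument without touching the selection logic.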

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📖 Citation

If you use incerto in your research, please cite:

@software{incerto2025,
  author = {Rabanser, Stephan},
  title = {incerto: Uncertainty Quantification for Machine Learning},
  year = {2025},
  url = {https://github.com/steverab/incerto},
  version = {0.1.0}
}

Status: Active development | Version: 0.1.0 | Python: 3.10+
