
A lightweight package for computing confidence intervals for classification tasks using conformal prediction and Pearson residuals.


💡 Pearsonify

Probabilistic Classification with Conformalized Intervals

Pearsonify is a lightweight 🐍 Python package for generating classification intervals around predicted probabilities in binary classification tasks.

It uses Pearson residuals and principles of conformal prediction to quantify uncertainty without making strong distributional assumptions.
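These two ingredients can be combined in a few lines. The sketch below is a minimal illustration that assumes a standard split-conformal recipe. It scores each calibration point by its absolute Pearson residual |y - p| / sqrt(p(1 - p)) and takes a finite-sample-corrected quantile q of those scores. For each new point it then reports p ± q·sqrt(p(1 - p)), clipped to [0, 1]. The function names are hypothetical and this is not Pearsonify's source code, so the package's exact rule may differ.

```python
import numpy as np


def pearson_residuals(y, p, eps=1e-12):
    """Pearson residuals for binary outcomes: (y - p) / sqrt(p * (1 - p))."""
    y = np.asarray(y, dtype=float)
    # Clip so the denominator is never zero when a model outputs exactly 0 or 1.
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return (y - p) / np.sqrt(p * (1 - p))


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return scores[k - 1]


def pearson_intervals(p_cal, y_cal, p_test, alpha=0.05):
    """Hypothetical helper: intervals p +/- q * sqrt(p * (1 - p)), clipped to [0, 1]."""
    q = conformal_quantile(np.abs(pearson_residuals(y_cal, p_cal)), alpha)
    p_test = np.asarray(p_test, dtype=float)
    if not np.isfinite(q):
        # An infinite threshold means the trivial interval [0, 1].
        return np.zeros_like(p_test), np.ones_like(p_test)
    half_width = q * np.sqrt(p_test * (1 - p_test))
    lower = np.clip(p_test - half_width, 0.0, 1.0)
    upper = np.clip(p_test + half_width, 0.0, 1.0)
    return lower, upper


if __name__ == "__main__":
    # Simulated, well-calibrated probabilities stand in for any model's predict_proba output.
    rng = np.random.default_rng(0)
    p_cal = rng.uniform(0.05, 0.95, size=500)
    y_cal = rng.binomial(1, p_cal)
    p_test = rng.uniform(0.05, 0.95, size=500)
    y_test = rng.binomial(1, p_test)
    lower, upper = pearson_intervals(p_cal, y_cal, p_test, alpha=0.1)
    print(f"Empirical coverage: {np.mean((y_test >= lower) & (y_test <= upper)):.2%}")
```

The half-width scales with sqrt(p(1 - p)), which is the Bernoulli standard deviation. In this sketch, intervals are therefore widest near p = 0.5 and narrow toward 0 and 1.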


🚀 Why Pearsonify?

  • 📊 Intuitive Classification Intervals: Get an interval around each predicted probability, calibrated to a chosen coverage level (for example, 95% with alpha=0.05).
  • 🧠 Statistically Grounded: Uses Pearson residuals, a well-established metric from classical statistics.
  • Model-Agnostic: Works with any model that provides probability estimates.
  • 🛠️ Lightweight: Minimal dependencies, easy to integrate into existing projects.
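The "reliable" and "statistically grounded" claims can be stated precisely with the standard split-conformal coverage guarantee. This assumes Pearsonify scores n calibration points by their absolute Pearson residuals. The README does not state its exact quantile rule, so treat the following as the textbook version rather than the package's documented behavior:

```latex
r_i = \frac{y_i - \hat p_i}{\sqrt{\hat p_i\,(1 - \hat p_i)}}, \qquad
\hat q = \text{the } \lceil (n+1)(1-\alpha) \rceil\text{-th smallest of } |r_1|, \dots, |r_n|

\Pr\!\left( \hat p_{n+1} - \hat q\,\hat s_{n+1} \;\le\; y_{n+1} \;\le\; \hat p_{n+1} + \hat q\,\hat s_{n+1} \right) \ge 1 - \alpha,
\qquad \hat s_{n+1} = \sqrt{\hat p_{n+1}\,(1 - \hat p_{n+1})}
```

The guarantee is marginal, meaning it is averaged over calibration and test draws. It also requires calibration and test points to be exchangeable. It does not promise 95% coverage for every individual prediction. The usage example below uses a 200-point calibration split. With α = 0.05, k = ⌈201 × 0.95⌉ = ⌈190.95⌉ = 191, so q̂ is the 191st smallest of the 200 absolute residuals.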

📦 How to install?

Use pip to install the package from PyPI, or directly from GitHub:

pip install pearsonify
# or from GitHub:
pip install git+https://github.com/xRiskLab/pearsonify.git

💻 How to use?

import numpy as np
from pearsonify import Pearsonify
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate synthetic classification data
np.random.seed(42)
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=10, n_classes=2, random_state=42
)

# Split data into train, calibration, and test sets
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_cal, X_test, y_cal, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# Initialize Pearsonify with an SVC model
clf = SVC(probability=True, random_state=42)
model = Pearsonify(estimator=clf, alpha=0.05)

# Fit the model on training and calibration sets
model.fit(X_train, y_train, X_cal, y_cal)

# Generate prediction intervals for the test set
y_test_pred_proba, lower_bounds, upper_bounds = model.predict_intervals(X_test)

# Calculate coverage
coverage = model.evaluate_coverage(y_test, lower_bounds, upper_bounds)
print(f"Coverage: {coverage:.2%}")

# Plot the intervals
model.plot_intervals(y_test_pred_proba, lower_bounds, upper_bounds)
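The coverage printed above is an empirical rate: the share of test labels that fall inside their interval. The README does not document how evaluate_coverage computes it. The helper below (empirical_coverage, a hypothetical name) therefore sketches that standard definition and is not Pearsonify's code.

```python
import numpy as np


def empirical_coverage(y_true, lower, upper):
    """Hypothetical helper: fraction of labels with lower_i <= y_i <= upper_i."""
    y_true = np.asarray(y_true, dtype=float)
    inside = (y_true >= np.asarray(lower, dtype=float)) & (y_true <= np.asarray(upper, dtype=float))
    return float(np.mean(inside))


y = np.array([0, 1, 1, 0])
lower = np.array([0.0, 0.4, 0.7, 0.1])
upper = np.array([0.3, 1.0, 0.95, 0.5])
print(f"Coverage: {empirical_coverage(y, lower, upper):.2%}")  # prints "Coverage: 50.00%"
```

Suppose the labels are 0/1 and the bounds lie within [0, 1]. A positive is then covered only when its upper bound reaches 1, and a negative only when its lower bound reaches 0. An interval that stops short of both endpoints covers neither label.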

Running example.py will generate the following plot:


This plot shows predicted probabilities with 95% confidence intervals, sorted by prediction score.
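A figure of this kind can be approximated by ordering test points by predicted probability and shading the band between each point's bounds. The sketch below assumes matplotlib is available for the plotting step. sort_by_score and plot_sorted_intervals are hypothetical helpers, and this is not necessarily how plot_intervals draws its figure.

```python
import numpy as np


def sort_by_score(proba, lower, upper):
    """Reorder probabilities and their bounds by ascending predicted probability."""
    order = np.argsort(proba, kind="stable")
    return np.asarray(proba)[order], np.asarray(lower)[order], np.asarray(upper)[order]


def plot_sorted_intervals(proba, lower, upper, ax=None):
    """Draw sorted scores as a line with a shaded interval band."""
    # Imported lazily so sort_by_score works even without matplotlib installed.
    import matplotlib.pyplot as plt

    p, lo, hi = sort_by_score(proba, lower, upper)
    x = np.arange(p.size)
    if ax is None:
        _, ax = plt.subplots()
    ax.fill_between(x, lo, hi, alpha=0.3, label="interval")
    ax.plot(x, p, label="predicted probability")
    ax.set_xlabel("test points (sorted by score)")
    ax.set_ylabel("probability")
    ax.legend()
    return ax


if __name__ == "__main__":
    p, lo, hi = sort_by_score([0.8, 0.2, 0.5], [0.6, 0.0, 0.3], [1.0, 0.4, 0.7])
    print(p, lo, hi)
    # With matplotlib installed: plot_sorted_intervals(p, lo, hi), then plt.show().
```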

📖 References

Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic Regression. John Wiley & Sons.

Tibshirani, R. (2023). Conformal Prediction. Advanced Topics in Statistical Learning, Spring 2023.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.


Download files


Source Distribution

pearsonify-0.1.0.tar.gz (5.3 kB)

Built Distribution

pearsonify-0.1.0-py3-none-any.whl (5.4 kB)

File details

Details for the file pearsonify-0.1.0.tar.gz.

File metadata

  • Download URL: pearsonify-0.1.0.tar.gz
  • Upload date:
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pearsonify-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        21ddb3ed0c7bad050ee5f03a2c467441328feae742d1ca45e7dc6297bcaa55d7
MD5           04d82dbfa1925dbb3618535902a7395a
BLAKE2b-256   d42e02de300756b98ed92657e433552e7c25b24785d5362c55717a53d96ed2c8


File details

Details for the file pearsonify-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pearsonify-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 5.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pearsonify-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        af714d2d4d8b22fc503a622a0a1a1b192e5ea725b5662f27b51d02d0fcecf61a
MD5           e162876feb4bf0e49afda4440eb1749d
BLAKE2b-256   25cc4c6d775694d8a138988e8c8c6f7e01890ffaa710befc41484e0d0781c062

