A Python library for estimating confidence intervals around accuracy and sample sizes for classification experiments.

confidence-planner

The confidence-planner package provides Python implementations of estimation procedures for confidence intervals around classification accuracy. The package currently features approximations for holdout, bootstrap, cross-validation, and progressive validation experiments. For information on how to install and use the package, read on or watch our demonstration video. To experiment with different estimation procedures, go to the accompanying web application at https://prediction-confidence-planner.herokuapp.com/.

Installing confidence-planner

To install confidence-planner, just execute:

pip install confidence-planner

Afterwards you can import confidence_planner and use all its functions.

Quickstart

from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split
import confidence_planner as cp

# example dataset
X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=23
)

# training the classifier and calculating accuracy
clf = svm.SVC(gamma=0.001)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
acc = metrics.accuracy_score(y_test, y_pred)

# confidence interval and sample size estimation
ci = cp.estimate_confidence_interval(y_test.shape[0], acc, confidence_level=0.90)
sample = cp.estimate_sample_size(interval_radius=0.05, confidence_level=0.90)
print(f"90% CI: {ci}")
print(f"Samples needed for a 0.05 radius 90% CI: {sample}")

More code examples (including cross-validation and bootstrapping) can be found in the examples folder.
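
To give a feel for why a sample size can be estimated from only an interval radius and a confidence level, the sketch below uses Hoeffding's inequality, a standard accuracy-independent bound for holdout experiments. It is an illustration only: the function names `hoeffding_radius`, `hoeffding_sample_size`, and `hoeffding_interval` are hypothetical, and whether confidence-planner uses this exact bound internally is an assumption, so its results may differ from those of `cp.estimate_confidence_interval` and `cp.estimate_sample_size`.

```python
import math


def hoeffding_radius(n_samples: int, confidence_level: float) -> float:
    """Half-width r such that |true acc - observed acc| <= r with the given confidence.

    Hoeffding's inequality gives P(|error| >= r) <= 2 * exp(-2 * n * r^2);
    setting the right-hand side to delta = 1 - confidence_level and solving for r.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be in (0, 1)")
    delta = 1.0 - confidence_level
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n_samples))


def hoeffding_sample_size(interval_radius: float, confidence_level: float) -> int:
    """Smallest holdout size whose Hoeffding radius does not exceed interval_radius."""
    if interval_radius <= 0.0:
        raise ValueError("interval_radius must be positive")
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be in (0, 1)")
    delta = 1.0 - confidence_level
    return math.ceil(math.log(2.0 / delta) / (2.0 * interval_radius ** 2))


def hoeffding_interval(n_samples: int, accuracy: float, confidence_level: float):
    """Interval around the observed accuracy, clipped to the valid range [0, 1]."""
    radius = hoeffding_radius(n_samples, confidence_level)
    return max(0.0, accuracy - radius), min(1.0, accuracy + radius)


if __name__ == "__main__":
    # Hypothetical numbers: 171 test samples with an observed accuracy of 0.90.
    print(hoeffding_interval(171, 0.90, confidence_level=0.90))
    print(hoeffding_sample_size(interval_radius=0.05, confidence_level=0.90))
```

Because the bound does not depend on the observed accuracy, the required sample size can be planned before any model is trained. The trade-off is that such a bound is conservative, so tighter intervals are usually possible when the accuracy is far from 0.5.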

References

Confidence-planner methods belong to the field of frequentist statistics.

[1] Langford, J.: Tutorial on practical prediction theory for classification. Journal of Machine Learning Research 6, 273–306 (2005).

[2] Blum, A., Kalai, A., Langford, J.: Beating the hold-out: Bounds for k-fold and progressive cross-validation. Proceedings of the Twelfth Annual Conference on Computational Learning Theory, COLT (1999).

[3] Puth, M.T., Neuhäuser, M., Ruxton, G.: On the variety of methods for calculating confidence intervals by bootstrapping. Journal of Animal Ecology 84 (2015).

License

Confidence-planner is free and open-source software licensed under the MIT license.

Contact

The best way to ask questions is via the GitHub Discussions channel. If you encounter bugs, please don't hesitate to use GitHub's issue tracker directly.
