A Python library for estimating confidence intervals around accuracy and sample sizes for classification experiments.

confidence-planner

The confidence-planner package provides Python implementations of estimation procedures for confidence intervals around classification accuracy. The package currently features approximations for holdout, bootstrap, cross-validation, and progressive validation experiments. For information on how to install and use the package, read on or take a look at our demonstration video. To experiment with different estimation procedures, go to the accompanying web application at https://prediction-confidence-planner.herokuapp.com/.

Installing confidence-planner

To install confidence-planner, just execute:

pip install confidence-planner

Afterwards you can import confidence_planner and use all its functions.

Quickstart

from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split
import confidence_planner as cp

# example dataset
X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=23
)

# training the classifier and calculating accuracy
clf = svm.SVC(gamma=0.001)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
acc = metrics.accuracy_score(y_test, y_pred)

# confidence interval and sample size estimation
ci = cp.estimate_confidence_interval(y_test.shape[0], acc, confidence_level=0.90)
sample = cp.estimate_sample_size(interval_radius=0.05, confidence_level=0.90)
print(f"90% CI: {ci}")
print(f"Samples needed for a 0.05 radius 90% CI: {sample}")
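
The link between test-set size, interval radius, and confidence level used in the quickstart can be illustrated with Hoeffding's inequality. The sketch below is an assumption made for illustration only: the function names are hypothetical, and this is not necessarily the bound that confidence-planner implements, so its numbers may differ from the package's output.

```python
import math


def hoeffding_radius(n_samples, confidence_level=0.90):
    """Half-width of a two-sided Hoeffding interval for accuracy on n_samples test examples."""
    delta = 1.0 - confidence_level
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n_samples))


def hoeffding_interval(n_samples, accuracy, confidence_level=0.90):
    """Interval around the observed accuracy, clipped to the valid range [0, 1]."""
    radius = hoeffding_radius(n_samples, confidence_level)
    return max(0.0, accuracy - radius), min(1.0, accuracy + radius)


def hoeffding_sample_size(interval_radius, confidence_level=0.90):
    """Smallest test-set size whose Hoeffding radius does not exceed interval_radius."""
    delta = 1.0 - confidence_level
    return math.ceil(math.log(2.0 / delta) / (2.0 * interval_radius ** 2))


# Hypothetical usage mirroring the quickstart parameters
print(hoeffding_sample_size(interval_radius=0.05, confidence_level=0.90))  # 600
print(hoeffding_interval(171, 0.90, confidence_level=0.90))
```

Hoeffding's bound makes no assumption about the true accuracy, so it tends to be conservative; tighter procedures generally give narrower intervals for the same number of samples.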

More code examples (including cross-validation and bootstrapping) can be found in the examples folder.
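
For intuition about bootstrap-based intervals, here is a minimal standard-library sketch of a percentile bootstrap over the predictions of an already trained classifier. It is an illustration under stated assumptions: `bootstrap_accuracy_ci` is a hypothetical helper, not part of confidence-planner, and resampling test predictions may differ from the bootstrap experiment design that the package approximates.

```python
import random


def bootstrap_accuracy_ci(y_true, y_pred, confidence_level=0.90,
                          n_resamples=2000, seed=0):
    """Percentile-bootstrap interval for accuracy (illustrative helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # 1 where the prediction is correct, 0 otherwise
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    n = len(correct)
    rng = random.Random(seed)

    # Accuracy on each resample, drawn with replacement from the test set
    accuracies = sorted(
        sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples)
    )

    # Nearest-rank percentiles of the bootstrap distribution
    alpha = 1.0 - confidence_level
    lower_idx = int((alpha / 2.0) * (n_resamples - 1))
    upper_idx = int(round((1.0 - alpha / 2.0) * (n_resamples - 1)))
    return accuracies[lower_idx], accuracies[upper_idx]


# Hypothetical usage: 90 correct predictions out of 100
y_true = [1] * 100
y_pred = [1] * 90 + [0] * 10
print(bootstrap_accuracy_ci(y_true, y_pred, confidence_level=0.90))
```

The fixed seed keeps the result reproducible; in the quickstart above, `y_test` and `y_pred` could be passed in the same way.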

References

Confidence-planner methods belong to the field of frequentist statistics.

[1] Langford, J.: Tutorial on practical prediction theory for classification. Journal of Machine Learning Research 6, 273–306 (2005).

[2] Blum, A., Kalai, A., Langford, J.: Beating the hold-out: Bounds for k-fold and progressive cross-validation. Proceedings of the Twelfth Annual Conference on Computational Learning Theory, COLT (1999).

[3] Puth, M.T., Neuhäuser, M., Ruxton, G.: On the variety of methods for calculating confidence intervals by bootstrapping. Journal of Animal Ecology 84 (2015).

License

Confidence-planner is free and open-source software licensed under the MIT license.

Contact

The best way to ask questions is via the GitHub Discussions channel. If you encounter bugs, please don't hesitate to use GitHub's issue tracker directly.
