A wrapper for easy plots of learning and validation curves

Project description

sk-modelcurves

A Python wrapper that helps software engineers and researchers create learning and validation curve plots with scikit-learn.

The module is meant to complement your workflow in scikit-learn and ease the process of evaluating your models.

The module includes several quality-of-life features that save time whenever you plot a learning curve to check for bias/variance, or plot a validation curve to see the effect of tuning a hyperparameter.
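
For context, here is a minimal sketch of the numbers behind a validation curve, computed with scikit-learn's own validation_curve function rather than with this package. The synthetic dataset, the k-NN estimator, and the n_neighbors values are illustrative assumptions, and the sketch assumes a scikit-learn version that provides sklearn.model_selection.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (a stand-in for real data).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hyperparameter values to sweep: the number of neighbors used by k-NN.
param_range = np.array([1, 5, 10, 20, 40])

# Each returned array has shape (len(param_range), number of CV folds).
train_scores, valid_scores = validation_curve(
    KNeighborsClassifier(), X, y,
    param_name="n_neighbors", param_range=param_range,
    cv=5, scoring="accuracy",
)

# Average over folds to get one point per hyperparameter value.
train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)

for k, tr, va in zip(param_range, train_mean, valid_mean):
    print("n_neighbors=%2d  train=%.3f  validation=%.3f" % (k, tr, va))
```

Plotting train_mean and valid_mean against param_range gives the validation curve: a large gap between the two lines suggests overfitting at that setting, while two low, close lines suggest underfitting.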

Background

For those not familiar with learning curves, check out Andrew Ng’s excellent discussion of their use at http://cs229.stanford.edu/materials/ML-advice.pdf
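
As a quick illustration of what a learning curve measures, the sketch below uses scikit-learn's learning_curve function directly (not this package) to score a model on growing training subsets. The dataset, estimator, and training sizes are assumptions chosen only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (a stand-in for real data).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Absolute training-set sizes; with 5-fold CV on 300 samples, at most
# 240 samples are available for training in each split.
train_sizes = [48, 96, 144, 192, 240]

sizes, train_scores, valid_scores = learning_curve(
    KNeighborsClassifier(n_neighbors=5), X, y,
    train_sizes=train_sizes, cv=5, scoring="accuracy",
)

# Mean score across folds for each training-set size.
train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)

for n, tr, va in zip(sizes, train_mean, valid_mean):
    # The train/validation gap is a rough indicator of variance.
    print("n=%3d  train=%.3f  validation=%.3f  gap=%.3f" % (n, tr, va, tr - va))
```

Roughly speaking, a gap that stays large as the training set grows points to high variance, while training and validation scores that converge at a low value point to high bias.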

In the course of writing many research papers and building many models, I found myself copying and pasting the same boilerplate code into almost every project whenever I wanted to plot a learning curve or validation curve to evaluate a model.

Hopefully, this module will save you a few minutes each time you need to plot a learning or validation curve so you can focus on other things.

Install

Python’s pip is the recommended method of installation. From the terminal:

$ pip install sk_modelcurves

Example Usage

Generate a learning curve using accuracy as a metric and 5-fold cross validation.

This assumes a scikit-learn estimator named knn, a training data matrix named X, and training labels named y:

>>> import matplotlib.pyplot as plt
>>> from sk_modelcurves.learning_curve import draw_learning_curve
>>> draw_learning_curve(knn, X, y, scoring='accuracy', cv=5)
>>> plt.show()
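
The example above takes knn, X, and y as given. A minimal sketch of one way to create them is shown here; the synthetic dataset and the choice of n_neighbors=5 are assumptions for illustration and do not come from the project.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic training data: X is the feature matrix, y holds the labels.
# Replace these with your own arrays in a real project.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# An unfitted k-nearest-neighbors classifier; n_neighbors=5 is arbitrary.
knn = KNeighborsClassifier(n_neighbors=5)

print(X.shape, y.shape)
```

These three objects can then be passed to draw_learning_curve exactly as in the example above.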

Generate multiple learning curves for several estimators using F1 score as a metric, 5-fold cross validation, and names for each of the estimators.

This assumes three scikit-learn estimators named knn2, knn20, and knn40, a training data matrix named X, and training labels named y:

>>> import matplotlib.pyplot as plt
>>> from sk_modelcurves.learning_curve import draw_learning_curve
>>> draw_learning_curve([knn2, knn20, knn40], X, y, scoring='f1', cv=5,
...     estimator_titles=['2 Neighbors', '20 Neighbors', '40 Neighbors'])
>>> plt.show()
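
The three estimators in this example could be set up along the following lines. This is a sketch under assumptions: the data is synthetic, and the cross_val_score call is included only as a sanity check that the 'f1' scorer works with these labels. scikit-learn's plain 'f1' scorer expects binary labels, so the data here has two classes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Binary labels, because the plain 'f1' scorer is defined for two classes.
X, y = make_classification(n_samples=300, n_features=8, n_classes=2,
                           random_state=42)

# Three k-NN classifiers that differ only in the number of neighbors.
knn2 = KNeighborsClassifier(n_neighbors=2)
knn20 = KNeighborsClassifier(n_neighbors=20)
knn40 = KNeighborsClassifier(n_neighbors=40)

# Sanity check: 5-fold cross-validated F1 score for each estimator.
results = {}
for name, est in [("2 Neighbors", knn2),
                  ("20 Neighbors", knn20),
                  ("40 Neighbors", knn40)]:
    scores = cross_val_score(est, X, y, scoring="f1", cv=5)
    results[name] = scores
    print("%-12s mean F1 = %.3f" % (name, scores.mean()))
```

Note that a k-NN model with 40 neighbors needs at least 40 training samples at prediction time, so very small training subsets on a learning curve may fail for knn40.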

Many other options are available. Check out the source code docstrings or the upcoming documentation.

Dependencies

sk-modelcurves is tested on Python 2.6 and Python 2.7. Python 3.3+ has not been tested, so assume it does not work there until it has been.

The required dependencies include scikit-learn (of course!), numpy >= 1.6.1, and matplotlib >= 1.1.1.

To run tests, you will need nose >= 1.1.2.

Contributing

Anyone is welcome!

If you find a bug or would like to discuss a potential feature, please file an issue first.

Testing

After installation, you can launch the test suite from outside the source directory (you will need to have the nose package installed):

$ nosetests -v sk_modelcurves

Download files

Source Distribution

sk_modelcurves-0.4.tar.gz (5.6 kB)

Uploaded Source

Built Distribution

sk_modelcurves-0.4-py2-none-any.whl (7.9 kB)

Uploaded Python 2

File details

Details for the file sk_modelcurves-0.4.tar.gz.

File metadata

  • Download URL: sk_modelcurves-0.4.tar.gz
  • Upload date:
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for sk_modelcurves-0.4.tar.gz
Algorithm     Hash digest
SHA256        13d1bb6322e6b626fd22ca1ec96701ee786ae9bdcd0bd4b2a426fc57bad4ab1e
MD5           c9788efa2168e75c1162e7ddeb825fc8
BLAKE2b-256   b7bc9b1b2ce082ee62263f057599bba6a368b1c2379040c3bf7cf4cc1934a043

File details

Details for the file sk_modelcurves-0.4-py2-none-any.whl.

File hashes

Hashes for sk_modelcurves-0.4-py2-none-any.whl
Algorithm     Hash digest
SHA256        72826de4b6b9ef969cc5040f4e54270dc9b5db43ddb14ee628b2dcc087d66c23
MD5           23a4c5a6e8daf91b3d24e80675b7e574
BLAKE2b-256   5e932853444dcc4cb7bb52c08c45ff743e3dab8ae9b5f01d02bcf65b8c9d552e
