A wrapper for easy plots of learning and validation curves
sk-modelcurves
A Python wrapper built for software engineers and researchers to facilitate easy creation of learning and validation curve plots from scikit-learn.
The module is meant to complement your workflow in scikit-learn and ease the process of evaluating your models.
The module includes many quality-of-life features that should save you time whenever you plot a learning curve to check for bias/variance, or a validation curve to see the effect of tuning a hyperparameter.
Background
For those not familiar with learning curves, check out Andrew Ng’s excellent discussion of their use at http://cs229.stanford.edu/materials/ML-advice.pdf
In the course of writing many research papers and building many models, I found myself copying and pasting the same boilerplate code into almost every project whenever I wanted to plot a learning curve or validation curve to evaluate a model.
Hopefully, this module will save you a few minutes each time you need to plot a learning or validation curve so you can focus on other things.
Install
Python’s pip is the recommended method of installation. From the terminal:
$ pip install sk_modelcurves
Example Usage
Generate a learning curve using accuracy as a metric and 5-fold cross validation.
Assumes a sklearn estimator called knn, training data matrix called X and training labels called y:
>>> import matplotlib.pyplot as plt
>>> from sk_modelcurves.learning_curve import draw_learning_curve
>>> draw_learning_curve(knn, X, y, scoring='accuracy', cv=5)
>>> plt.show()
Generate multiple learning curves for several estimators using F1 score as a metric, 5-fold cross validation, and names for each of the estimators.
Assumes 3 sklearn estimators called knn2, knn20, knn40, training data matrix called X and training labels called y:
>>> import matplotlib.pyplot as plt
>>> from sk_modelcurves.learning_curve import draw_learning_curve
>>> draw_learning_curve([knn2, knn20, knn40], X, y, scoring='f1', cv=5, estimator_titles=['2 Neighbors', '20 Neighbors', '40 Neighbors'])
>>> plt.show()
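For comparison, here is a rough sketch of the kind of plain scikit-learn and matplotlib boilerplate that a wrapper like this is meant to replace. It is not the implementation of draw_learning_curve. It assumes a recent scikit-learn (where learning_curve lives in sklearn.model_selection), and the helper name plot_learning_curve and the synthetic data are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsClassifier


def plot_learning_curve(estimator, X, y, scoring="accuracy", cv=5):
    """Hypothetical helper: plot mean train/CV score against training set size."""
    sizes, train_scores, cv_scores = learning_curve(
        estimator, X, y, scoring=scoring, cv=cv,
        train_sizes=np.linspace(0.1, 1.0, 5),
    )
    # Each score array has shape (n_sizes, n_folds); average over the folds.
    train_mean, train_std = train_scores.mean(axis=1), train_scores.std(axis=1)
    cv_mean, cv_std = cv_scores.mean(axis=1), cv_scores.std(axis=1)

    fig, ax = plt.subplots()
    for mean, std, label in (
        (train_mean, train_std, "Training score"),
        (cv_mean, cv_std, "Cross-validation score"),
    ):
        ax.plot(sizes, mean, "o-", label=label)
        # Shade one standard deviation around the mean.
        ax.fill_between(sizes, mean - std, mean + std, alpha=0.2)
    ax.set_xlabel("Training examples")
    ax.set_ylabel(scoring)
    ax.legend(loc="best")
    return fig, sizes, train_mean, cv_mean


# Synthetic stand-ins for the X, y and knn assumed in the examples above.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5)
fig, sizes, train_mean, cv_mean = plot_learning_curve(knn, X, y)
```

Call plt.show() afterwards to display the figure. A large, persistent gap between the two curves suggests high variance, while two low curves that converge suggest high bias.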
Many other options are available. Check out the source code docstrings or the upcoming documentation.
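The examples above cover learning curves only. As a hedged illustration of what a validation curve measures, the sketch below calls scikit-learn's own validation_curve directly rather than this package, since the package's validation curve API is not shown here. It assumes a recent scikit-learn, and the data and parameter range are invented for the example.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for a real training set (illustrative only).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

# Hyperparameter values to sweep: the number of neighbors for k-NN.
param_range = [1, 2, 5, 10, 20, 40]

train_scores, cv_scores = validation_curve(
    KNeighborsClassifier(), X_demo, y_demo,
    param_name="n_neighbors", param_range=param_range,
    scoring="accuracy", cv=5,
)

# Score arrays have shape (n_param_values, n_folds); average over the folds.
train_curve = train_scores.mean(axis=1)
cv_curve = cv_scores.mean(axis=1)

fig_vc, ax_vc = plt.subplots()
ax_vc.plot(param_range, train_curve, "o-", label="Training score")
ax_vc.plot(param_range, cv_curve, "o-", label="Cross-validation score")
ax_vc.set_xlabel("n_neighbors")
ax_vc.set_ylabel("accuracy")
ax_vc.legend(loc="best")
```

Reading the plot, the hyperparameter value where the cross-validation curve peaks is usually a sensible starting point for tuning.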
Important Links
Official source code repo: https://github.com/MasonGallo/sk-modelcurve
HTML documentation: coming soon!
Issue tracker: https://github.com/MasonGallo/sk-modelcurve/issues
Dependencies
sk-modelcurves is tested on Python 2.6 and Python 2.7. Python 3.3+ has not been tested and should be assumed not to work until it has been.
The required dependencies include scikit-learn (of course!), numpy >= 1.6.1, and matplotlib >= 1.1.1.
To run tests, you will need nose >= 1.1.2.
Contributing
Anyone is welcome!
If you find a bug or would like to discuss a potential feature, please file an issue first.
Testing
After installation, you can launch the test suite from outside the source directory (you will need to have the nose package installed):
$ nosetests -v sk_modelcurves
Hashes for sk_modelcurves-0.4-py2-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 72826de4b6b9ef969cc5040f4e54270dc9b5db43ddb14ee628b2dcc087d66c23
MD5 | 23a4c5a6e8daf91b3d24e80675b7e574
BLAKE2b-256 | 5e932853444dcc4cb7bb52c08c45ff743e3dab8ae9b5f01d02bcf65b8c9d552e