Helper code that makes working with sklearn easier. https://github.com/aras7/scikit-learn-helper
scikit-learn-helper
============
scikit-learn-helper is a lightweight library of utility functions that make working with scikit-learn even easier, letting us focus on solving the problem instead of writing boilerplate code.
### Installation
#### Dependencies
scikit-learn-helper requires:
scikit-learn (>= 0.20.2) Of course :)
#### User installation
pip install scikit-learn-helper
#### Source code
https://github.com/aras7/scikit-learn-helper
### Examples
#### How to use it?
```python
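from sklearn import datasets
from sklearn.dummy import DummyRegressor
from sklearn_helper.model.evaluator import Evaluator

# Map a display name to the estimator to evaluate
models = {
    "DummyRegressor": {
        "model": DummyRegressor()
    }
}

evaluator = Evaluator(models)
# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release
dataset = datasets.load_boston()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)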
```
###### Output
```
Model: DummyRegressor | cleaner:DummyCleaner | mean_squared_error:111.1508 |Time:0.00 sec
```
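For readers who want to relate this one-line report to plain scikit-learn, here is a minimal sketch. It assumes the evaluator scores each model by cross-validation (the ToDo list below mentions `cross_val_score`); the synthetic data from `make_regression`, the 5 folds, and the variable names are illustrative assumptions, not the library's internals.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data stands in for a real dataset (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# scikit-learn scorers follow "higher is better", so MSE is exposed negated.
neg_mse = cross_val_score(DummyRegressor(), X, y,
                          scoring="neg_mean_squared_error", cv=5)
mse = -neg_mse.mean()
print(f"Model: DummyRegressor | mean_squared_error:{mse:.4f}")
```

The helper wraps this kind of loop so that several models can be compared from a single `models` dictionary.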
#### How to use another metric for evaluation?
```python
from sklearn import datasets
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score
from sklearn_helper.model.evaluator import Evaluator

models = {
    "DummyRegressor": {
        "model": DummyRegressor()
    }
}

# Any metric function can be passed as the main metric
evaluator = Evaluator(models, main_metric=r2_score)
dataset = datasets.load_boston()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```
###### Output
```
Model: DummyRegressor | cleaner:DummyCleaner | r2_score:-0.6891 |Time:0.00 sec
```
#### How to compare 2+ models?
```python
from sklearn import datasets, linear_model
from sklearn.dummy import DummyRegressor
from sklearn_helper.model.evaluator import Evaluator

# Each entry is evaluated and reported on its own line
models = {
    "DummyRegressor": {
        "model": DummyRegressor()
    },
    "LinearRegression": {
        "model": linear_model.LinearRegression()
    }
}

evaluator = Evaluator(models)
dataset = datasets.load_boston()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```
###### Output
```
Model: DummyRegressor | cleaner:DummyCleaner | mean_squared_error:111.1508 |Time:0.00 sec
Model: LinearRegression | cleaner:DummyCleaner | mean_squared_error:169.0083 |Time:0.00 sec
```
#### How to maximize instead of minimize?
```python
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn_helper.model.evaluator import Evaluator

models = {
    "DummyClassifier": {
        "model": DummyClassifier()
    },
    "SVC": {
        "model": svm.SVC()
    }
}

# accuracy_score is a "higher is better" metric, so ask the evaluator to maximize it
evaluator = Evaluator(models, main_metric=accuracy_score, maximize_metric=True)
digits = datasets.load_digits()
X, y = digits.data, digits.target
model = evaluator.evaluate(X, y)
```
###### Output
```
Model: SVC | cleaner:DummyCleaner | accuracy_score:0.4179 |Time:0.89 sec
Model: DummyClassifier | cleaner:DummyCleaner | accuracy_score:0.0896 |Time:0.00 sec
```
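The ordering of the reports in this README (lowest error first by default, highest accuracy first with `maximize_metric=True`) can be illustrated with a small stand-alone sketch. The `rank_models` helper below is hypothetical and is not part of scikit-learn-helper; it only reproduces the best-first ordering seen in the outputs, using the scores printed there.

```python
def rank_models(scores, maximize=False):
    """Return (name, score) pairs ordered best-first.

    `scores` maps a model name to its metric value. This helper is
    hypothetical and only illustrates the ordering; it is not part of
    scikit-learn-helper.
    """
    return sorted(scores.items(), key=lambda item: item[1], reverse=maximize)


# Accuracy: higher is better, so rank in descending order.
accuracy = {"DummyClassifier": 0.0896, "SVC": 0.4179}
print(rank_models(accuracy, maximize=True)[0])   # ('SVC', 0.4179)

# Mean squared error: lower is better, so the default ascending order applies.
mse = {"LinearRegression": 169.0083, "DummyRegressor": 111.1508}
print(rank_models(mse)[0])                       # ('DummyRegressor', 111.1508)
```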
#### How to compare and tune models?
```python
import numpy as np
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn_helper.model.evaluator import Evaluator

models = {
    "SVC": {
        "model": svm.SVC()
    },
    "Improved SVC": {
        "model": svm.SVC(),
        # Candidate hyperparameter values to search over
        "params": {
            "gamma": np.linspace(0, 0.1, num=10),
            "C": range(1, 10)
        }
    }
}

evaluator = Evaluator(models, main_metric=accuracy_score, maximize_metric=True)
digits = datasets.load_digits()
X, y = digits.data, digits.target
model = evaluator.evaluate(X, y)
# Prints the selected estimator with its tuned parameters
print(model)
```
###### Output
```
Model: Improved SVC | cleaner:DummyCleaner | accuracy_score:0.6511 |Time:0.86 sec
Model: SVC | cleaner:DummyCleaner | accuracy_score:0.4179 |Time:0.89 sec
SVC(C=2, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=0.011111111111111112,
kernel='rbf', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False)
```
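For comparison, a hyperparameter search over `gamma` and `C` can be written directly with scikit-learn's `GridSearchCV`. This is a sketch under stated assumptions, not a description of how the `"params"` entry is handled internally: the reduced grid, the 3 folds, and the `"accuracy"` scorer are choices made here to keep the run short.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()
X, y = digits.data, digits.target

# A deliberately small grid so the sketch runs quickly (assumption);
# gamma starts above 0, where the RBF kernel would be constant.
param_grid = {
    "gamma": np.linspace(0.001, 0.01, num=3),
    "C": [1, 2],
}

search = GridSearchCV(svm.SVC(), param_grid, scoring="accuracy", cv=3)
search.fit(X, y)

print(search.best_params_)
print(f"accuracy_score:{search.best_score_:.4f}")
```

After fitting, `search.best_estimator_` holds the refitted model, which plays the same role as the tuned `SVC` printed in the output above.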
#### How to get multiple metrics?
```python
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import roc_auc_score, f1_score, accuracy_score
from sklearn_helper.model.evaluator import Evaluator

models = {
    "DummyClassifier": {
        "model": DummyClassifier()
    },
    "SVC": {
        "model": svm.SVC()
    }
}

# Models are ranked by main_metric; additional_metrics are reported alongside it
evaluator = Evaluator(models, main_metric=roc_auc_score,
                      additional_metrics=[f1_score, accuracy_score],
                      maximize_metric=True)
dataset = datasets.load_breast_cancer()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```
###### Output
```
Model: SVC | cleaner:DummyCleaner | roc_auc_score:0.5000 |f1_score:0.7650 |accuracy_score:0.6277 |Time:0.11 sec
Model: DummyClassifier | cleaner:DummyCleaner | roc_auc_score:0.4961 |f1_score:0.6063 |accuracy_score:0.5308 |Time:0.01 sec
```
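A comparable multi-metric report can be produced in plain scikit-learn with `cross_validate`. The sketch below is an illustration under assumptions (5 folds, built-in scorer names) rather than the library's implementation. Note that scikit-learn's built-in `"roc_auc"` scorer ranks samples by `decision_function` instead of hard labels, so its value may differ from the `roc_auc_score` figure shown above.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_validate

dataset = datasets.load_breast_cancer()
X, y = dataset.data, dataset.target

# Scorer names are scikit-learn's built-in strings for the three metrics.
results = cross_validate(svm.SVC(), X, y,
                         scoring=["roc_auc", "f1", "accuracy"], cv=5)

# Each metric is stored per fold under the key "test_<scorer name>".
for name in ("roc_auc", "f1", "accuracy"):
    print(f"{name}:{results['test_' + name].mean():.4f}")
```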
#### Can I compare data engineering processes? Yes :)
```python
import numpy as np
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn_helper.model.evaluator import Evaluator
from sklearn_helper.data.DataCleaner import DataCleaner
from sklearn_helper.data.DummyCleaner import DummyCleaner  # No transformation is applied


class Thresholding(DataCleaner):
    THRESHOLD = 3

    def clean_training_data(self, x, y):
        # Apply the same thresholding to the training features; labels are unchanged
        return self.clean_testing_data(x), y

    def clean_testing_data(self, x):
        # Binarize a copy: values <= THRESHOLD become 0, the rest become 1
        _x = np.copy(x)
        _x[_x <= self.THRESHOLD] = 0
        _x[_x > self.THRESHOLD] = 1
        return _x


models = {
    "DummyClassifier": {
        "model": DummyClassifier()
    },
    "SVC": {
        "model": svm.SVC(C=2, gamma=0.0111)
    }
}

# Every model is evaluated once per data cleaner
evaluator = Evaluator(models, data_cleaners=[Thresholding(), DummyCleaner()], main_metric=accuracy_score)
digits = datasets.load_digits()
X, y = digits.data, digits.target
model = evaluator.evaluate(X, y)
```
###### Output
```
Model: DummyClassifier | cleaner:DummyCleaner | accuracy_score:0.0940 |Time:0.00 sec
Model: DummyClassifier | cleaner:Thresholding | accuracy_score:0.1035 |Time:0.00 sec
Model: SVC | cleaner:DummyCleaner | accuracy_score:0.6516 |Time:0.87 sec
Model: SVC | cleaner:Thresholding | accuracy_score:0.8848 |Time:0.28 sec
```
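To see what the thresholding step does to the data on its own, the transform can be tried on a tiny array without any of the library classes. The stand-alone `threshold` function and the sample `pixels` array are illustrative names introduced here, not part of scikit-learn-helper.

```python
import numpy as np

THRESHOLD = 3


def threshold(x):
    """Binarize a copy of x: values <= THRESHOLD become 0, the rest 1."""
    _x = np.copy(x)
    mask = _x > THRESHOLD  # computed once, before any value is overwritten
    _x[~mask] = 0
    _x[mask] = 1
    return _x


pixels = np.array([[0.0, 3.0, 4.0], [16.0, 2.0, 5.0]])
print(threshold(pixels))
# [[0. 0. 1.]
#  [1. 0. 1.]]
```

Because the function works on a copy, the input array is left untouched, which matters when the same data is passed to several cleaners in turn.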
#### For more examples refer to:
https://github.com/aras7/scikit-learn-helper/tree/master/examples
### ToDo:
* Add unit tests
* Add prediction time to printed results
* Improve documentation
* Splitter
* Add functionality to test on a different dataset as an alternative to `from sklearn.model_selection import cross_val_score`. It might be useful when resampling inside DataCleaners
* Add codecov.io support
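Until the held-out evaluation item above is available, a separate test set can be scored by hand with plain scikit-learn. The sketch below assumes synthetic data and a 25% test split, both chosen here for illustration; any resampling would then be applied to the training split only.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hold out a quarter of the samples; the model never sees them during fit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"mean_squared_error:{mse:.4f}")
```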
### File details
#### Source Distribution: scikit_learn_helper-0.0.10.tar.gz
- Download URL: scikit_learn_helper-0.0.10.tar.gz
- Upload date:
- Size: 9.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/39.1.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

Algorithm | Hash digest
---|---
SHA256 | 9dd9bc261c32b697223d1db128cae555e13b00bbedd687d4b37e30f35bc86a77
MD5 | a3bab50ecd52c4bf4946312ce8ff24ed
BLAKE2b-256 | c65caed45386892dcd428198c686b1619d0bbb9b7f215ee579787dd70cc4dd04

#### Built Distribution: scikit_learn_helper-0.0.10-py3-none-any.whl
- Download URL: scikit_learn_helper-0.0.10-py3-none-any.whl
- Upload date:
- Size: 22.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/39.1.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

Algorithm | Hash digest
---|---
SHA256 | 2b89a60003782370c44da81a5b4fd1636eda8144bdbdeb98522b05785754b8db
MD5 | 258dc0017755009850f7d1decf60627f
BLAKE2b-256 | 6dacb4da3f513f6a901b51f5d45a633e9814682f20d26abae5e3b327757ff5c5