Helper code to make working with sklearn easier. https://github.com/aras7/scikit-learn-helper
scikit-learn-helper
============
scikit-learn-helper is a lightweight library that provides utility functions to make working with scikit-learn even easier, letting us focus on solving the problem instead of writing boilerplate code.
### Installation
#### Dependencies
scikit-learn-helper requires:
* scikit-learn (>= 0.20.2). Of course :)
#### User installation
```
pip install scikit-learn-helper
```
#### Source code
https://github.com/aras7/scikit-learn-helper
### Examples
#### How to use it?
```python
from sklearn import datasets
from sklearn.dummy import DummyRegressor
from sklearn_helper.model.evaluator import Evaluator
models = {
"DummyRegressor": {
"model": DummyRegressor()
}
}
evaluator = Evaluator(models)
dataset = datasets.load_boston()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```
###### Output
```
Model: DummyRegressor | cleaner:DummyCleaner | mean_squared_error:111.1508 |Time:0.00 sec
```
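For context, here is a rough sketch of the boilerplate that `Evaluator` saves you from writing, using plain scikit-learn to cross-validate each model and report its mean squared error (illustrative only; this loop is our own approximation, not Evaluator's actual internals):

```python
# Hand-rolled model comparison with plain scikit-learn: roughly the
# boilerplate that Evaluator wraps. Uses a synthetic dataset so the
# sketch is self-contained.
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

models = {
    "DummyRegressor": DummyRegressor(),
    "LinearRegression": LinearRegression(),
}
scores = {}
for name, model in models.items():
    # scikit-learn maximizes scores, so MSE is exposed negated
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    scores[name] = mse
    print("Model: %s | mean_squared_error:%.4f" % (name, mse))
```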
#### How to use another metric for evaluation?
```python
from sklearn import datasets
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score
from sklearn_helper.model.evaluator import Evaluator
models = {
"DummyRegressor": {
"model": DummyRegressor()
}
}
evaluator = Evaluator(models, main_metric=r2_score)
dataset = datasets.load_boston()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```
###### Output
```
Model: DummyRegressor | cleaner:DummyCleaner | r2_score:-0.6891 |Time:0.00 sec
```
#### How to compare 2+ models?
```python
from sklearn import datasets, linear_model
from sklearn.dummy import DummyRegressor
from sklearn_helper.model.evaluator import Evaluator
models = {
"DummyRegressor": {
"model": DummyRegressor()
},
"LinearRegression": {
"model": linear_model.LinearRegression()
}
}
evaluator = Evaluator(models)
dataset = datasets.load_boston()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```
###### Output
```
Model: DummyRegressor | cleaner:DummyCleaner | mean_squared_error:111.1508 |Time:0.00 sec
Model: LinearRegression | cleaner:DummyCleaner | mean_squared_error:169.0083 |Time:0.00 sec
```
#### How to maximize instead of minimize?
```python
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn_helper.model.evaluator import Evaluator
models = {
"DummyClassifier": {
"model": DummyClassifier()
},
"SVC": {
"model": svm.SVC()
}
}
evaluator = Evaluator(models, main_metric=accuracy_score, maximize_metric=True)
digits = datasets.load_digits()
X, y = digits.data, digits.target
model = evaluator.evaluate(X, y)
```
###### Output
```
Model: SVC | cleaner:DummyCleaner | accuracy_score:0.4179 |Time:0.89 sec
Model: DummyClassifier | cleaner:DummyCleaner | accuracy_score:0.0896 |Time:0.00 sec
```
#### How to compare and tune models?
```python
import numpy as np
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn_helper.model.evaluator import Evaluator
models = {
"SVC": {
"model": svm.SVC()
},
"Improved SVC": {
"model": svm.SVC(),
"params": {
"gamma": np.linspace(0, 0.1, num=10),
"C": range(1, 10)
}
}
}
evaluator = Evaluator(models, main_metric=accuracy_score, maximize_metric=True)
digits = datasets.load_digits()
X, y = digits.data, digits.target
model = evaluator.evaluate(X, y)
print(model)
```
###### Output
```
Model: Improved SVC | cleaner:DummyCleaner | accuracy_score:0.6511 |Time:0.86 sec
Model: SVC | cleaner:DummyCleaner | accuracy_score:0.4179 |Time:0.89 sec
SVC(C=2, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=0.011111111111111112,
kernel='rbf', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False)
```
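The `"params"` grid above plays a role similar to scikit-learn's own `GridSearchCV`. As a point of comparison (this is plain scikit-learn, not part of scikit-learn-helper), a minimal grid search over the same SVC hyperparameters might look like:

```python
# Minimal grid search with scikit-learn's GridSearchCV, sweeping the same
# SVC hyperparameters as the "params" grid above. A small subset of the
# digits data keeps the sketch fast.
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()
X, y = digits.data[:400], digits.target[:400]

param_grid = {"gamma": np.linspace(0.001, 0.1, num=3), "C": [1, 2]}
search = GridSearchCV(svm.SVC(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)  # the best hyperparameter combination found
```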
#### How to get multiple metrics?
```python
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import roc_auc_score, f1_score, accuracy_score
from sklearn_helper.model.evaluator import Evaluator
models = {
"DummyClassifier": {
"model": DummyClassifier()
},
"SVC": {
"model": svm.SVC()
}
}
evaluator = Evaluator(models, main_metric=roc_auc_score,
additional_metrics=[f1_score, accuracy_score],
maximize_metric=True)
dataset = datasets.load_breast_cancer()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```
###### Output
```
Model: SVC | cleaner:DummyCleaner | roc_auc_score:0.5000 |f1_score:0.7650 |accuracy_score:0.6277 |Time:0.11 sec
Model: DummyClassifier | cleaner:DummyCleaner | roc_auc_score:0.4961 |f1_score:0.6063 |accuracy_score:0.5308 |Time:0.01 sec
```
#### Can I compare data engineering processes? Yes :)
```python
import numpy as np
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn_helper.model.evaluator import Evaluator
from sklearn_helper.data.DataCleaner import DataCleaner
from sklearn_helper.data.DummyCleaner import DummyCleaner  # No transformation is applied

class Thresholding(DataCleaner):
    # Binarize features: values above THRESHOLD become 1, the rest 0
    THRESHOLD = 3

    def clean_training_data(self, x, y):
        return self.clean_x(x), y

    def clean_testing_data(self, x):
        _x = np.copy(x)
        _x[_x <= self.THRESHOLD] = 0
        _x[_x > self.THRESHOLD] = 1
        return _x
models = {
"DummyClassifier": {
"model": DummyClassifier()
},
"SVC": {
"model": svm.SVC(C=2, gamma=0.0111)
}
}
evaluator = Evaluator(models, data_cleaners=[Thresholding(), DummyCleaner()], main_metric=accuracy_score)
digits = datasets.load_digits()
X, y = digits.data, digits.target
model = evaluator.evaluate(X, y)
```
###### Output
```
Model: DummyClassifier | cleaner:DummyCleaner | accuracy_score:0.0940 |Time:0.00 sec
Model: DummyClassifier | cleaner:Thresholding | accuracy_score:0.1035 |Time:0.00 sec
Model: SVC | cleaner:DummyCleaner | accuracy_score:0.6516 |Time:0.87 sec
Model: SVC | cleaner:Thresholding | accuracy_score:0.8848 |Time:0.28 sec
```
#### For more examples refer to:
https://github.com/aras7/scikit-learn-helper/tree/master/examples
### ToDo:
* Add unit tests
* Add prediction time to printed results
* Improve documentation
* Splitter
* Add functionality to evaluate on a separate test dataset as an alternative to `sklearn.model_selection.cross_val_score`. It might be useful when resampling inside DataCleaners
* Add codecov.io support