Helper code that makes working with sklearn easier. https://github.com/aras7/scikit-learn-helper

scikit-learn-helper
============

scikit-learn-helper is a light library that provides utility functions to make working with scikit-learn even easier, letting us focus on solving the problem instead of writing boilerplate code.

### Installation

#### Dependencies
scikit-learn-helper requires:

scikit-learn (>= 0.20.2) Of course :)

#### User installation


pip install scikit-learn-helper


#### Source code


https://github.com/aras7/scikit-learn-helper

### Examples

#### How to use it?

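```python
from sklearn import datasets
from sklearn.dummy import DummyRegressor
from sklearn_helper.model.evaluator import Evaluator

# Map a display name to the estimator to evaluate
models = {
    "DummyRegressor": {
        "model": DummyRegressor()
    }
}
evaluator = Evaluator(models)

# Note: load_boston was removed in scikit-learn 1.2
dataset = datasets.load_boston()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```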

###### Output

```
Model: DummyRegressor | cleaner:DummyCleaner | mean_squared_error:111.1508 |Time:0.00 sec
```

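#### How to use another metric for evaluation?
```python
from sklearn import datasets
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score
from sklearn_helper.model.evaluator import Evaluator

models = {
    "DummyRegressor": {
        "model": DummyRegressor()
    }
}
# main_metric replaces the default metric (mean_squared_error)
evaluator = Evaluator(models, main_metric=r2_score)

# Note: load_boston was removed in scikit-learn 1.2
dataset = datasets.load_boston()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```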

###### Output
```
Model: DummyRegressor | cleaner:DummyCleaner | r2_score:-0.6891 |Time:0.00 sec
```
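
A negative score like the one above is possible because R² compares a model against always predicting the mean of the evaluated targets. The sketch below is a plain-Python illustration of that definition, not the library's code: the `r2` helper and the toy numbers are made up for illustration, and it assumes the score is computed on held-out data, where a constant fitted on other samples can do worse than the held-out mean.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_held_out = [1.0, 2.0, 3.0, 4.0]

# A constant prediction equal to the held-out mean (2.5) scores exactly 0
score_at_mean = r2(y_held_out, [2.5] * len(y_held_out))

# A constant learned from different data (here 4.0) scores below 0
score_off_mean = r2(y_held_out, [4.0] * len(y_held_out))

print(score_at_mean, score_off_mean)
```

A score of 0 therefore means "no better than the mean", and anything below 0 means worse than that baseline on the evaluated samples.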

#### How to compare 2+ models?

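```python
from sklearn import datasets, linear_model
from sklearn.dummy import DummyRegressor
from sklearn_helper.model.evaluator import Evaluator

# Each entry is evaluated and reported on its own line
models = {
    "DummyRegressor": {
        "model": DummyRegressor()
    },
    "LinearRegression": {
        "model": linear_model.LinearRegression()
    }
}
evaluator = Evaluator(models)

# Note: load_boston was removed in scikit-learn 1.2
dataset = datasets.load_boston()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```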

###### Output

```
Model: DummyRegressor | cleaner:DummyCleaner | mean_squared_error:111.1508 |Time:0.00 sec
Model: LinearRegression | cleaner:DummyCleaner | mean_squared_error:169.0083 |Time:0.00 sec
```

#### How to maximize instead of minimize?

```python
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn_helper.model.evaluator import Evaluator

models = {
    "DummyClassifier": {
        "model": DummyClassifier()
    },
    "SVC": {
        "model": svm.SVC()
    }
}
# maximize_metric=True treats higher scores as better
evaluator = Evaluator(models, main_metric=accuracy_score, maximize_metric=True)

digits = datasets.load_digits()
X, y = digits.data, digits.target
model = evaluator.evaluate(X, y)
```

###### Output

```
Model: SVC | cleaner:DummyCleaner | accuracy_score:0.4179 |Time:0.89 sec
Model: DummyClassifier | cleaner:DummyCleaner | accuracy_score:0.0896 |Time:0.00 sec
```
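
The order of the output lines in these examples is consistent with sorting results by the main metric, ascending by default and descending when `maximize_metric=True`. A minimal sketch of that idea, using scores printed in the outputs above (the `rank` helper is hypothetical and not part of sklearn_helper):

```python
def rank(results, maximize=False):
    """Order (name, score) pairs so that the best result comes first."""
    return sorted(results, key=lambda item: item[1], reverse=maximize)


# Scores copied from the example outputs
mse_results = [("LinearRegression", 169.0083), ("DummyRegressor", 111.1508)]
accuracy_results = [("DummyClassifier", 0.0896), ("SVC", 0.4179)]

# Lower is better for an error metric, higher is better for accuracy
best_by_mse = rank(mse_results)[0]
best_by_accuracy = rank(accuracy_results, maximize=True)[0]

print(best_by_mse[0], best_by_accuracy[0])
```

Forgetting to flip the direction for a score such as accuracy would rank the weakest model first, which is why the flag matters.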

#### How to compare and tune models?
```python
import numpy as np
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn_helper.model.evaluator import Evaluator

models = {
    "SVC": {
        "model": svm.SVC()
    },
    "Improved SVC": {
        "model": svm.SVC(),
        # Candidate hyperparameter values to search over
        "params": {
            "gamma": np.linspace(0, 0.1, num=10),
            "C": range(1, 10)
        }
    }
}
evaluator = Evaluator(models, main_metric=accuracy_score, maximize_metric=True)

digits = datasets.load_digits()
X, y = digits.data, digits.target
model = evaluator.evaluate(X, y)
# Prints the estimator returned by evaluate (here the tuned SVC)
print(model)
```

###### Output
```
Model: Improved SVC | cleaner:DummyCleaner | accuracy_score:0.6511 |Time:0.86 sec
Model: SVC | cleaner:DummyCleaner | accuracy_score:0.4179 |Time:0.89 sec
SVC(C=2, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=0.011111111111111112,
kernel='rbf', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False)
```
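
To get a feel for the size of the search described by the `params` entry above, the sketch below enumerates the candidate combinations in plain Python. It assumes every combination is tried (an exhaustive grid); the source does not say which search strategy the library uses. `gamma_values` mimics `np.linspace(0, 0.1, num=10)` without NumPy.

```python
from itertools import product

# Ten evenly spaced values from 0 to 0.1, like np.linspace(0, 0.1, num=10)
step = 0.1 / 9
gamma_values = [i * step for i in range(10)]

# Same C candidates as range(1, 10) in the example
c_values = list(range(1, 10))

# Cartesian product of both candidate lists (assumed exhaustive search)
grid = list(product(gamma_values, c_values))

print(len(grid))
print(gamma_values[1])
```

The second gamma candidate, 0.1 / 9, is the value that appears in the printed `SVC(...)` above, alongside `C=2`.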


#### How to get multiple metrics?
```python
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import roc_auc_score, f1_score, accuracy_score
from sklearn_helper.model.evaluator import Evaluator

models = {
    "DummyClassifier": {
        "model": DummyClassifier()
    },
    "SVC": {
        "model": svm.SVC()
    }
}
# additional_metrics are reported next to the main metric
evaluator = Evaluator(models, main_metric=roc_auc_score,
                      additional_metrics=[f1_score, accuracy_score],
                      maximize_metric=True)

dataset = datasets.load_breast_cancer()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```

###### Output
```
Model: SVC | cleaner:DummyCleaner | roc_auc_score:0.5000 |f1_score:0.7650 |accuracy_score:0.6277 |Time:0.11 sec
Model: DummyClassifier | cleaner:DummyCleaner | roc_auc_score:0.4961 |f1_score:0.6063 |accuracy_score:0.5308 |Time:0.01 sec
```
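
The metrics passed to `Evaluator` are ordinary functions with the `(y_true, y_pred)` signature used in `sklearn.metrics`. The sketch below shows, in plain Python, how several such functions can be applied to the same predictions and reported by function name, which resembles the labels in the output above. The two metric functions are simplified stand-ins written for this illustration, not scikit-learn's implementations, and how the library actually builds its labels is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def f1_binary(y_true, y_pred):
    """F1 for binary 0/1 labels; assumes at least one positive is present."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn)


# Toy labels and predictions, made up for the illustration
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# Apply every metric to the same predictions, keyed by function name
report = {metric.__name__: metric(y_true, y_pred) for metric in (f1_binary, accuracy)}
line = " |".join(f"{name}:{value:.4f}" for name, value in report.items())

print(line)
```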

#### Can I compare data engineering processes? Yes :)
```python
import numpy as np
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn_helper.model.evaluator import Evaluator
from sklearn_helper.data.DataCleaner import DataCleaner
from sklearn_helper.data.DummyCleaner import DummyCleaner  # No transformation is applied


class Thresholding(DataCleaner):
    THRESHOLD = 3

    def clean_training_data(self, x, y):
        # Apply the same binarization used at test time; labels are unchanged
        return self.clean_testing_data(x), y

    def clean_testing_data(self, x):
        # Work on a copy: values <= THRESHOLD become 0, the rest become 1
        _x = np.copy(x)
        _x[_x <= self.THRESHOLD] = 0
        _x[_x > self.THRESHOLD] = 1
        return _x


models = {
    "DummyClassifier": {
        "model": DummyClassifier()
    },
    "SVC": {
        "model": svm.SVC(C=2, gamma=0.0111)
    }
}
# Every model is evaluated once per data cleaner
evaluator = Evaluator(models, data_cleaners=[Thresholding(), DummyCleaner()], main_metric=accuracy_score)

digits = datasets.load_digits()
X, y = digits.data, digits.target
model = evaluator.evaluate(X, y)
```

###### Output
```
Model: DummyClassifier | cleaner:DummyCleaner | accuracy_score:0.0940 |Time:0.00 sec
Model: DummyClassifier | cleaner:Thresholding | accuracy_score:0.1035 |Time:0.00 sec
Model: SVC | cleaner:DummyCleaner | accuracy_score:0.6516 |Time:0.87 sec
Model: SVC | cleaner:Thresholding | accuracy_score:0.8848 |Time:0.28 sec
```
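
To make the effect of the `Thresholding` cleaner concrete, here is a plain-Python sketch of the same binarization applied to a made-up row of pixel intensities (the digits images use integer intensities from 0 to 16). The `binarize` helper is hypothetical and only mirrors the logic of `clean_testing_data` above.

```python
THRESHOLD = 3


def binarize(row, threshold=THRESHOLD):
    """Map values <= threshold to 0 and all larger values to 1."""
    return [0 if value <= threshold else 1 for value in row]


# Made-up pixel intensities in the 0..16 range of the digits dataset
row = [0, 2, 3, 4, 13, 16, 1, 8]
binary_row = binarize(row)

print(binary_row)
```

Because the cleaner returns a transformed copy, the original `X` stays untouched and the same data can be passed through several cleaners for comparison.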

#### For more examples refer to:

https://github.com/aras7/scikit-learn-helper/tree/master/examples


### ToDo:

* Add unit tests
* Add prediction time to printed results
* Improve documentation
* Splitter
* Add functionality to test on a different dataset as an alternative to `from sklearn.model_selection import cross_val_score`. This might be useful when resampling inside DataCleaners
* Add codecov.io support


Files for scikit-learn-helper, version 0.0.10:

* scikit_learn_helper-0.0.10-py3-none-any.whl (22.8 kB), Wheel, Python version py3
* scikit_learn_helper-0.0.10.tar.gz (9.6 kB), Source
