
Helper code to make working with sklearn easier. https://github.com/aras7/scikit-learn-helper


scikit-learn-helper
============

scikit-learn-helper is a lightweight library that provides utility functions to make working with scikit-learn even easier, letting us focus on solving the problem instead of writing boilerplate code.
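
For context, this is roughly the kind of boilerplate the library aims to replace: split the data into folds, fit each model on the training folds, score it on the held-out fold, average the scores and time the run. The sketch below is plain Python and is not the library's implementation; `MeanRegressor`, `k_fold_indices` and `cross_val_mean` are hypothetical names, and the models only need scikit-learn-style `fit`/`predict` methods.

```python
import time
from statistics import mean


class MeanRegressor:
    """Hypothetical stand-in for DummyRegressor: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def mean_squared_error(y_true, y_pred):
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs of contiguous, unshuffled folds."""
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_mean(model, X, y, metric, n_splits=3):
    """Fit on each training split and average `metric` over the held-out folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), n_splits):
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = model.predict([X[i] for i in test_idx])
        scores.append(metric([y[i] for i in test_idx], y_pred))
    return mean(scores)


# Made-up toy data: one feature, target equal to the feature.
models = {"MeanRegressor": MeanRegressor()}
X = [[float(i)] for i in range(9)]
y = [float(i) for i in range(9)]

results = {}
for name, model in models.items():
    start = time.perf_counter()
    results[name] = cross_val_mean(model, X, y, mean_squared_error)
    elapsed = time.perf_counter() - start
    print(f"Model: {name} | mean_squared_error:{results[name]:.4f} |Time:{elapsed:.2f} sec")
```

Every extra model, metric or preprocessing variant multiplies this loop, which is the repetition the `Evaluator` examples below avoid.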

### Installation

#### Dependencies
scikit-learn-helper requires:

* scikit-learn (>= 0.20.2), of course :)
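
If you are unsure whether an existing environment meets this requirement, a small sketch like the following can check the installed version. It uses only the standard library (`importlib.metadata`, Python 3.8+); `parse_version` and `meets_minimum` are hypothetical helpers that handle plain numeric versions such as `0.20.2` or `1.3.0`, not every PEP 440 form.

```python
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (0, 20, 2)  # scikit-learn (>= 0.20.2)


def parse_version(text):
    """Turn '1.3.0' (or '1.4.0rc1') into (1, 3, 0), keeping the leading digits of each part."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # treat '1.2' as '1.2.0'
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM):
    """Compare version tuples element by element."""
    return parse_version(installed) >= minimum


try:
    installed = version("scikit-learn")
    print("scikit-learn", installed, "OK" if meets_minimum(installed) else "too old")
except PackageNotFoundError:
    print("scikit-learn is not installed")
```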

#### User installation


```
pip install scikit-learn-helper
```


#### Source code


https://github.com/aras7/scikit-learn-helper

### Examples

#### How to use it?

```python
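from sklearn import datasets
from sklearn.dummy import DummyRegressor
from sklearn_helper.model.evaluator import Evaluator

# Each entry maps a display name to a dict holding the estimator.
models = {
    "DummyRegressor": {
        "model": DummyRegressor()
    }
}
evaluator = Evaluator(models)

# Note: load_boston was removed in scikit-learn 1.2; use an older version to reproduce this output.
dataset = datasets.load_boston()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)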
```

###### Output

```
Model: DummyRegressor | cleaner:DummyCleaner | mean_squared_error:111.1508 |Time:0.00 sec
```
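`DummyRegressor()` uses `strategy="mean"` by default, so it always predicts the mean of the training targets, which makes its score a useful floor for the other models. A quick pure-Python check of what that baseline measures: scored on the same targets it was fit on, a mean predictor's MSE equals the population variance of `y`; on held-out data the error is the test variance plus the squared gap between the training mean and the test mean. The numbers are made up for illustration.

```python
from statistics import mean, pvariance

y_train = [22.0, 24.0, 21.0, 35.0, 18.0]  # made-up targets
y_test = [30.0, 16.0, 27.0]

prediction = mean(y_train)  # what a mean-strategy dummy predicts for every sample


def mse(y_true, constant):
    """Mean squared error of a constant prediction."""
    return mean((t - constant) ** 2 for t in y_true)


# Scored on its own training data, the mean predictor's MSE is the variance.
train_mse = mse(y_train, prediction)

# On held-out data, the error splits into the test variance plus the squared
# gap between the training mean and the test mean.
test_mse = mse(y_test, prediction)
decomposed = pvariance(y_test) + (prediction - mean(y_test)) ** 2

print(round(train_mse, 4), round(pvariance(y_train), 4))
print(round(test_mse, 4), round(decomposed, 4))
```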

#### How to use another metric for evaluation?
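```python
from sklearn import datasets
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score
from sklearn_helper.model.evaluator import Evaluator

models = {
    "DummyRegressor": {
        "model": DummyRegressor()
    }
}
# Score with R² instead of the default mean squared error.
evaluator = Evaluator(models, main_metric=r2_score)

dataset = datasets.load_boston()  # removed in scikit-learn 1.2
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)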
```

###### Output
```
Model: DummyRegressor | cleaner:DummyCleaner | r2_score:-0.6891 |Time:0.00 sec
```
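A negative R² is not a bug: R² = 1 - SS_res / SS_tot compares the model against predicting the mean of the *evaluated* targets, so a constant prediction taken from other data (such as the training-fold mean a dummy model uses) can score below zero. The pure-Python `r2` below is written only for illustration; it takes `(y_true, y_pred)` in the same order as `r2_score`. Every metric passed to `main_metric` in these examples has that shape, so, assuming the evaluator calls it that way, a custom function like this could presumably be passed too.

```python
from statistics import mean


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (same formula as r2_score).

    Unlike r2_score, this sketch does not handle constant y_true (SS_tot == 0).
    """
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_test = [10.0, 12.0, 14.0]

perfect = r2(y_test, y_test)            # 1.0
own_mean = r2(y_test, [12.0] * 3)       # 0.0: predicting the test mean itself
training_mean = r2(y_test, [15.0] * 3)  # negative: constant taken from other data
print(perfect, own_mean, training_mean)
```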

#### How to compare 2+ models?

```python
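from sklearn import datasets, linear_model
from sklearn.dummy import DummyRegressor
from sklearn_helper.model.evaluator import Evaluator

# Every entry in the dictionary is evaluated and reported separately.
models = {
    "DummyRegressor": {
        "model": DummyRegressor()
    },
    "LinearRegression": {
        "model": linear_model.LinearRegression()
    }
}
evaluator = Evaluator(models)

dataset = datasets.load_boston()  # removed in scikit-learn 1.2
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)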
```

###### Output

```
Model: DummyRegressor | cleaner:DummyCleaner | mean_squared_error:111.1508 |Time:0.00 sec
Model: LinearRegression | cleaner:DummyCleaner | mean_squared_error:169.0083 |Time:0.00 sec
```

#### How to maximize instead of minimize?

```python
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn_helper.model.evaluator import Evaluator

models = {
    "DummyClassifier": {
        "model": DummyClassifier()
    },
    "SVC": {
        "model": svm.SVC()
    }
}
# Higher accuracy is better, so ask the evaluator to maximize it.
evaluator = Evaluator(models, main_metric=accuracy_score, maximize_metric=True)

digits = datasets.load_digits()
X, y = digits.data, digits.target
model = evaluator.evaluate(X, y)
```

###### Output

```
Model: SVC | cleaner:DummyCleaner | accuracy_score:0.4179 |Time:0.89 sec
Model: DummyClassifier | cleaner:DummyCleaner | accuracy_score:0.0896 |Time:0.00 sec
```
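Judging from the outputs in this README, results are printed best first: ascending by default (suited to error metrics) and descending once `maximize_metric=True` is set. A minimal sketch of that ordering and of picking the top entry, assuming the library does something equivalent internally; `rank_results` and the `(name, score)` tuples are hypothetical, and the scores are the ones printed above.

```python
# Scores copied from the output above.
results = [
    ("DummyClassifier", 0.0896),
    ("SVC", 0.4179),
]


def rank_results(results, maximize_metric=False):
    """Sort (name, score) pairs best-first: ascending for errors, descending for scores."""
    return sorted(results, key=lambda item: item[1], reverse=maximize_metric)


ranked = rank_results(results, maximize_metric=True)
best_name, best_score = ranked[0]
for name, score in ranked:
    print(f"Model: {name} | accuracy_score:{score:.4f}")
print("best:", best_name)
```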

#### How to compare and tune models?
```python
import numpy as np
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn_helper.model.evaluator import Evaluator

models = {
    "SVC": {
        "model": svm.SVC()
    },
    "Improved SVC": {
        "model": svm.SVC(),
        # Candidate values to search over for this model.
        "params": {
            "gamma": np.linspace(0, 0.1, num=10),
            "C": range(1, 10)
        }
    }
}
evaluator = Evaluator(models, main_metric=accuracy_score, maximize_metric=True)

digits = datasets.load_digits()
X, y = digits.data, digits.target
model = evaluator.evaluate(X, y)
print(model)  # the best-scoring estimator, with its tuned parameters
```

###### Output
```
Model: Improved SVC | cleaner:DummyCleaner | accuracy_score:0.6511 |Time:0.86 sec
Model: SVC | cleaner:DummyCleaner | accuracy_score:0.4179 |Time:0.89 sec
SVC(C=2, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=0.011111111111111112,
kernel='rbf', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False)
```
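The `params` dictionary reads like a scikit-learn `GridSearchCV` parameter grid, in which case every combination is tried. Assuming that, the grid above holds 10 × 9 = 90 candidates, and the selected `gamma=0.011111111111111112` is the second value of `np.linspace(0, 0.1, num=10)`, i.e. 0.1 / 9. Note that the first `gamma` candidate is `0.0`. The stdlib sketch below enumerates the same grid; `linspace` here is a hypothetical pure-Python stand-in for the NumPy function.

```python
from itertools import product


def linspace(start, stop, num):
    """Pure-Python stand-in for np.linspace (endpoint included; last value may differ in the final bit)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


params = {
    "gamma": linspace(0, 0.1, num=10),
    "C": range(1, 10),
}

# Every combination of parameter values, as an exhaustive grid search would try.
names = list(params)
grid = [dict(zip(names, values)) for values in product(*params.values())]

print(len(grid))          # 90 candidate settings
print(grid[0], grid[-1])  # first and last candidates
```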


#### How to get multiple metrics?
```python
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn_helper.model.evaluator import Evaluator

models = {
    "DummyClassifier": {
        "model": DummyClassifier()
    },
    "SVC": {
        "model": svm.SVC()
    }
}
# main_metric decides the ranking; additional_metrics are printed alongside it.
evaluator = Evaluator(models, main_metric=roc_auc_score,
                      additional_metrics=[f1_score, accuracy_score],
                      maximize_metric=True)
dataset = datasets.load_breast_cancer()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```

###### Output
```
Model: SVC | cleaner:DummyCleaner | roc_auc_score:0.5000 |f1_score:0.7650 |accuracy_score:0.6277 |Time:0.11 sec
Model: DummyClassifier | cleaner:DummyCleaner | roc_auc_score:0.4961 |f1_score:0.6063 |accuracy_score:0.5308 |Time:0.01 sec
```
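The SVC's `roc_auc_score` of exactly 0.5000 deserves a note. Assuming the evaluator passes the output of `predict` (class labels, which `f1_score` and `accuracy_score` need) to every metric, ROC AUC is computed from hard 0/1 labels, and in that case it reduces to the mean of the true-positive and true-negative rates. A classifier that predicts a single class for every sample therefore scores 0.5 regardless of its accuracy. The pure-Python sketch below computes AUC with the pairwise (Mann-Whitney) definition, counting ties as half; the labels are made up.

```python
def roc_auc_from_predictions(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 1, 1, 1, 1, 0, 0, 0]  # made-up labels: 5 positives, 3 negatives
always_one = [1] * len(y_true)     # predicts the majority class everywhere
mixed = [1, 1, 1, 0, 0, 0, 0, 1]   # some mistakes on both classes

for name, y_pred in [("always_one", always_one), ("mixed", mixed)]:
    auc = roc_auc_from_predictions(y_true, y_pred)
    print(name, "auc:", auc, "accuracy:", accuracy(y_true, y_pred))
```

For `mixed`, the true-positive rate is 3/5 and the true-negative rate is 2/3, and the pairwise AUC equals their average.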

#### Can I compare data engineering processes? Yes :)
```python
import numpy as np
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn_helper.model.evaluator import Evaluator
from sklearn_helper.data.DataCleaner import DataCleaner
from sklearn_helper.data.DummyCleaner import DummyCleaner  # No transformation is applied


class Thresholding(DataCleaner):
    """Binarize pixel values: <= THRESHOLD becomes 0, > THRESHOLD becomes 1."""
    THRESHOLD = 3

    def clean_training_data(self, x, y):
        # Apply the same transformation to the training features; targets are unchanged.
        return self.clean_testing_data(x), y

    def clean_testing_data(self, x):
        _x = np.copy(x)
        _x[_x <= self.THRESHOLD] = 0
        _x[_x > self.THRESHOLD] = 1
        return _x


models = {
    "DummyClassifier": {
        "model": DummyClassifier()
    },
    "SVC": {
        "model": svm.SVC(C=2, gamma=0.0111)
    }
}
# Every model is evaluated once per data cleaner.
evaluator = Evaluator(models, data_cleaners=[Thresholding(), DummyCleaner()], main_metric=accuracy_score)

digits = datasets.load_digits()
X, y = digits.data, digits.target
model = evaluator.evaluate(X, y)
```

###### Output
```
Model: DummyClassifier | cleaner:DummyCleaner | accuracy_score:0.0940 |Time:0.00 sec
Model: DummyClassifier | cleaner:Thresholding | accuracy_score:0.1035 |Time:0.00 sec
Model: SVC | cleaner:DummyCleaner | accuracy_score:0.6516 |Time:0.87 sec
Model: SVC | cleaner:Thresholding | accuracy_score:0.8848 |Time:0.28 sec
```
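The two hooks in `Thresholding` suggest how a cleaner is applied: `clean_training_data(x, y)` receives the training split and may change both features and targets, while `clean_testing_data(x)` only transforms features before prediction. The sketch below mimics that contract in plain Python so it runs without the library; `BinarizeCleaner`, `NearestNeighbor` and `evaluate_split` are hypothetical names, not part of scikit-learn-helper, and the exact way the library calls these hooks is an assumption.

```python
class BinarizeCleaner:
    """Hypothetical cleaner with the same two hooks as the Thresholding example."""

    THRESHOLD = 3

    def clean_training_data(self, x, y):
        # Training data may be transformed (or resampled); targets are returned too.
        return self.clean_testing_data(x), y

    def clean_testing_data(self, x):
        # Pixels above the threshold become 1, everything else 0.
        return [[1 if value > self.THRESHOLD else 0 for value in row] for row in x]


class NearestNeighbor:
    """Tiny 1-nearest-neighbour classifier so the sketch has something to fit."""

    def fit(self, X, y):
        self.X_, self.y_ = X, y
        return self

    def predict(self, X):
        def distance(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [self.y_[min(range(len(self.X_)), key=lambda i: distance(row, self.X_[i]))]
                for row in X]


def evaluate_split(cleaner, model, X_train, y_train, X_test, y_test):
    """Clean each split with its own hook, fit on train, return test accuracy."""
    X_train, y_train = cleaner.clean_training_data(X_train, y_train)
    X_test = cleaner.clean_testing_data(X_test)
    y_pred = model.fit(X_train, y_train).predict(X_test)
    return sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)


# Made-up 4-pixel "images" with values in 0..16, like load_digits.
X_train = [[0, 16, 2, 15], [14, 1, 13, 0]]
y_train = ["a", "b"]
X_test = [[3, 4, 1, 9], [10, 2, 12, 3]]
y_test = ["a", "b"]

print(BinarizeCleaner().clean_testing_data(X_test))
print(evaluate_split(BinarizeCleaner(), NearestNeighbor(), X_train, y_train, X_test, y_test))
```

Keeping the training hook separate is what would let a cleaner resample only the training split while the test split is transformed but never resampled.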

#### For more examples refer to:

https://github.com/aras7/scikit-learn-helper/tree/master/examples


### ToDo:

* Add unit tests
* Add prediction time to printed results
* Improve documentation
* Splitter
* Add functionality to evaluate on a separate test dataset as an alternative to cross-validation (`from sklearn.model_selection import cross_val_score`). This might be useful when resampling inside DataCleaners
* Add codecov.io support




### Download files

Source distribution: scikit_learn_helper-0.0.10.tar.gz (9.6 kB)

Built distribution: scikit_learn_helper-0.0.10-py3-none-any.whl (22.8 kB, Python 3)
