
Helper code to make working with sklearn easier. https://github.com/aras7/scikit-learn-helper

Project description

scikit-learn-helper
============

scikit-learn-helper is a lightweight library that provides utility functions to make working with scikit-learn even easier, letting you focus on solving the problem instead of writing boilerplate code.

### Installation

#### Dependencies
scikit-learn-helper requires:

* scikit-learn (>= 0.20.2), of course :)

#### User installation


```
pip install scikit-learn-helper
```


#### Source code


https://github.com/aras7/scikit-learn-helper

### Examples

#### How to use it?

```python
from sklearn import datasets
from sklearn.dummy import DummyRegressor

from sklearn_helper.model.evaluator import Evaluator

models = {
    "DummyRegressor": {
        "model": DummyRegressor()
    }
}
evaluator = Evaluator(models)

dataset = datasets.load_boston()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```

###### Output

```
Model: DummyRegressor | cleaner:DummyCleaner | mean_squared_error:111.1508 |Time:0.00 sec
```
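Note: the regression examples on this page use `datasets.load_boston()`, which has been removed from recent scikit-learn releases. If your scikit-learn no longer ships it, any other regression dataset can be substituted; for example (a minimal adaptation of the example above, not part of the original README):

```python
from sklearn import datasets
from sklearn.dummy import DummyRegressor

from sklearn_helper.model.evaluator import Evaluator

# load_diabetes() is bundled with scikit-learn and needs no download,
# so it works as a drop-in replacement for the removed Boston dataset.
dataset = datasets.load_diabetes()
X, y = dataset.data, dataset.target

models = {
    "DummyRegressor": {
        "model": DummyRegressor()
    }
}
evaluator = Evaluator(models)
model = evaluator.evaluate(X, y)
```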

#### How to use another metric for evaluation?

```python
from sklearn import datasets
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score

from sklearn_helper.model.evaluator import Evaluator

models = {
    "DummyRegressor": {
        "model": DummyRegressor()
    }
}
evaluator = Evaluator(models, main_metric=r2_score)

dataset = datasets.load_boston()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```

###### Output
```
Model: DummyRegressor | cleaner:DummyCleaner | r2_score:-0.6891 |Time:0.00 sec
```

#### How to compare 2+ models?

```python
from sklearn import datasets, linear_model
from sklearn.dummy import DummyRegressor

from sklearn_helper.model.evaluator import Evaluator

models = {
    "DummyRegressor": {
        "model": DummyRegressor()
    },
    "LinearRegression": {
        "model": linear_model.LinearRegression()
    }
}
evaluator = Evaluator(models)

dataset = datasets.load_boston()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```

###### Output

```
Model: DummyRegressor | cleaner:DummyCleaner | mean_squared_error:111.1508 |Time:0.00 sec
Model: LinearRegression | cleaner:DummyCleaner | mean_squared_error:169.0083 |Time:0.00 sec
```

#### How to maximize a metric instead of minimizing it?

```python
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

from sklearn_helper.model.evaluator import Evaluator

models = {
    "DummyClassifier": {
        "model": DummyClassifier()
    },
    "SVC": {
        "model": svm.SVC()
    }
}
evaluator = Evaluator(models, main_metric=accuracy_score, maximize_metric=True)

digits = datasets.load_digits()
X, y = digits.data, digits.target
model = evaluator.evaluate(X, y)
```

###### Output

```
Model: SVC | cleaner:DummyCleaner | accuracy_score:0.4179 |Time:0.89 sec
Model: DummyClassifier | cleaner:DummyCleaner | accuracy_score:0.0896 |Time:0.00 sec
```
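Because `maximize_metric=True`, higher scores are treated as better; that is why the SVC (accuracy 0.4179) is ranked above the DummyClassifier (0.0896), and it is the model `evaluate` returns (the tuning example below shows this by printing it).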

#### How to compare and tune models?
```python
import numpy as np
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score

from sklearn_helper.model.evaluator import Evaluator

models = {
    "SVC": {
        "model": svm.SVC()
    },
    "Improved SVC": {
        "model": svm.SVC(),
        "params": {
            "gamma": np.linspace(0, 0.1, num=10),
            "C": range(1, 10)
        }
    }
}
evaluator = Evaluator(models, main_metric=accuracy_score, maximize_metric=True)

digits = datasets.load_digits()
X, y = digits.data, digits.target
model = evaluator.evaluate(X, y)
print(model)
```

###### Output
```
Model: Improved SVC | cleaner:DummyCleaner | accuracy_score:0.6511 |Time:0.86 sec
Model: SVC | cleaner:DummyCleaner | accuracy_score:0.4179 |Time:0.89 sec
SVC(C=2, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=0.011111111111111112,
kernel='rbf', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False)
```
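The `params` entry appears to act as a hyperparameter grid: the evaluator tries combinations of `gamma` and `C` and reports the best estimator it finds (here `C=2`, `gamma≈0.0111`). As a point of reference only (this is not how the library is implemented, just the plain-scikit-learn equivalent of the same idea), a comparable search with `GridSearchCV` looks like:

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()
X, y = digits.data, digits.target

# Roughly the grid declared for "Improved SVC" above, shifted off zero
# because SVC rejects gamma=0.
param_grid = {
    "gamma": np.linspace(0.001, 0.1, num=10),
    "C": list(range(1, 10)),
}
search = GridSearchCV(svm.SVC(), param_grid, scoring="accuracy", cv=3)
search.fit(X, y)
print(search.best_estimator_)
```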


#### How to get multiple metrics?

```python
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import roc_auc_score, f1_score, accuracy_score

from sklearn_helper.model.evaluator import Evaluator

models = {
    "DummyClassifier": {
        "model": DummyClassifier()
    },
    "SVC": {
        "model": svm.SVC()
    }
}
evaluator = Evaluator(models, main_metric=roc_auc_score,
                      additional_metrics=[f1_score, accuracy_score],
                      maximize_metric=True)

dataset = datasets.load_breast_cancer()
X, y = dataset.data, dataset.target
model = evaluator.evaluate(X, y)
```

###### Output
```
Model: SVC | cleaner:DummyCleaner | roc_auc_score:0.5000 |f1_score:0.7650 |accuracy_score:0.6277 |Time:0.11 sec
Model: DummyClassifier | cleaner:DummyCleaner | roc_auc_score:0.4961 |f1_score:0.6063 |accuracy_score:0.5308 |Time:0.01 sec
```

#### Can I compare data engineering processes? Yes :)

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

from sklearn_helper.model.evaluator import Evaluator
from sklearn_helper.data.DataCleaner import DataCleaner
from sklearn_helper.data.DummyCleaner import DummyCleaner  # No transformation is applied


class Thresholding(DataCleaner):
    THRESHOLD = 3

    def clean_training_data(self, x, y):
        return self.clean_testing_data(x), y

    def clean_testing_data(self, x):
        # Binarize the features: values above THRESHOLD become 1, the rest 0
        _x = np.copy(x)
        _x[_x <= self.THRESHOLD] = 0
        _x[_x > self.THRESHOLD] = 1
        return _x


models = {
    "DummyClassifier": {
        "model": DummyClassifier()
    },
    "SVC": {
        "model": svm.SVC(C=2, gamma=0.0111)
    }
}
evaluator = Evaluator(models, data_cleaners=[Thresholding(), DummyCleaner()], main_metric=accuracy_score)

digits = datasets.load_digits()
X, y = digits.data, digits.target
model = evaluator.evaluate(X, y)
```

###### Output
```
Model: DummyClassifier | cleaner:DummyCleaner | accuracy_score:0.0940 |Time:0.00 sec
Model: DummyClassifier | cleaner:Thresholding | accuracy_score:0.1035 |Time:0.00 sec
Model: SVC | cleaner:DummyCleaner | accuracy_score:0.6516 |Time:0.87 sec
Model: SVC | cleaner:Thresholding | accuracy_score:0.8848 |Time:0.28 sec
```
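Note that every model is evaluated with every cleaner, so two models and two cleaners yield the four result rows above. Here the thresholding step clearly helps the SVC (0.6516 vs. 0.8848) while leaving the DummyClassifier essentially unchanged.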

#### For more examples, refer to:

https://github.com/aras7/scikit-learn-helper/tree/master/examples


### ToDo:

* Add unit tests
* Add prediction time to printed results
* Improve documentation
* Splitter
  * Add functionality to test different datasets as an alternative to `from sklearn.model_selection import cross_val_score`; this might be useful when resampling inside DataCleaners (see the sketch after this list)
* Add codecov.io support
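
For context, here is a minimal sketch of the `cross_val_score` baseline mentioned in the Splitter item above, using the tuned SVC settings from the earlier examples; it is only the standard scikit-learn call, not the planned feature:

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

digits = datasets.load_digits()
X, y = digits.data, digits.target

# Plain scikit-learn cross-validation: every fold sees the same, untouched
# dataset, which is the limitation the ToDo item above wants to address.
scores = cross_val_score(svm.SVC(C=2, gamma=0.0111), X, y, cv=5, scoring="accuracy")
print(scores.mean())
```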




Download files


Source Distribution

scikit_learn_helper-0.0.10.tar.gz (9.6 kB, Source)

Built Distribution

scikit_learn_helper-0.0.10-py3-none-any.whl (22.8 kB, Python 3)

File details

Details for the file scikit_learn_helper-0.0.10.tar.gz.

File metadata

  • Download URL: scikit_learn_helper-0.0.10.tar.gz
  • Upload date:
  • Size: 9.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/39.1.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

File hashes

Hashes for scikit_learn_helper-0.0.10.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 9dd9bc261c32b697223d1db128cae555e13b00bbedd687d4b37e30f35bc86a77 |
| MD5 | a3bab50ecd52c4bf4946312ce8ff24ed |
| BLAKE2b-256 | c65caed45386892dcd428198c686b1619d0bbb9b7f215ee579787dd70cc4dd04 |


File details

Details for the file scikit_learn_helper-0.0.10-py3-none-any.whl.

File metadata

  • Download URL: scikit_learn_helper-0.0.10-py3-none-any.whl
  • Upload date:
  • Size: 22.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/39.1.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

File hashes

Hashes for scikit_learn_helper-0.0.10-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 2b89a60003782370c44da81a5b4fd1636eda8144bdbdeb98522b05785754b8db |
| MD5 | 258dc0017755009850f7d1decf60627f |
| BLAKE2b-256 | 6dacb4da3f513f6a901b51f5d45a633e9814682f20d26abae5e3b327757ff5c5 |

