# HyANOVA

HyANOVA is a pure Python implementation of the functional ANOVA algorithm, which can be used to analyze the importance of hyperparameters in machine learning algorithms.

Quick Start

Install the package with pip:

pip install hyanova

Here is a short example of usage. You can download the data from the example folder.

import hyanova

path = './iris[GridSearchCV]Model1.csv'   # grid search results generated by sklearn
metric = 'mean_test_score'                # metric for model performance
df, params = hyanova.read_csv(path, metric)
# df, params = hyanova.read_df(df, metric)  # alternatively, load from a pandas.DataFrame
importance = hyanova.analyze(df)

The metric is the column you choose to evaluate model performance; it must appear as a column in the .csv file or the pandas.DataFrame object. The result will look similar to the output below; see the ANOVA section for more details.

print(importance)
>>>              u       v_u  F_u(v_u/v_all)
0           (alpha,)  0.056885        0.892057
1        (l1_ratio,)  0.002489        0.039030
2  (alpha, l1_ratio)  0.004394        0.068912
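Since the metric must be a column of the input, it can be worth checking the file header before running the analysis. The following is a minimal sketch using only pandas; `metric_in_header` is a hypothetical helper for illustration and is not part of hyanova.

```python
import io
import pandas as pd


def metric_in_header(csv_source, metric):
    """Return True if `metric` is a column name in the CSV header.

    Hypothetical helper, not part of hyanova.
    """
    # nrows=0 parses only the header row, so large result files stay cheap to check.
    header = pd.read_csv(csv_source, nrows=0)
    return metric in header.columns


# A tiny in-memory stand-in for a grid search results file.
sample = io.StringIO(
    "param_alpha,param_l1_ratio,mean_test_score\n"
    "0.000977,0.0,0.923810\n"
)
print(metric_in_header(sample, "mean_test_score"))
```

The same function accepts a file path instead of an in-memory buffer, because `pandas.read_csv` handles both.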

APIs

Load Data

HyANOVA is designed to analyze the grid search results generated by sklearn. It provides two ways to load the data.

read_df(df,metric)

You can use read_df(df,metric) to load data from a <class 'pandas.core.frame.DataFrame'> object.

Parameters:

df: <class 'pandas.core.frame.DataFrame'>, the DataFrame you want to analyze.

metric: string, the name of the metric column you choose.

Returns:

result_df: <class 'pandas.core.frame.DataFrame'>, a DataFrame containing the values of all hyperparameters and the values of the chosen metric.

params_list: list, a list of the names of all hyperparameters.

read_csv(path,metric)

Use hyanova.read_csv(path,metric) to load data from a .csv file. It is equivalent to hyanova.read_df(pandas.read_csv(path),metric).

Parameters:

path: string, the path to the .csv file you want to analyze.

metric: string, the name of the metric column you choose.

Returns:

result_df: <class 'pandas.core.frame.DataFrame'>, a DataFrame containing the values of all hyperparameters and the values of the chosen metric.

params_list: list, a list of the names of all hyperparameters.

Example

The template file can be found in the example folder. Here is an example.

print(df.head())

>>> mean_fit_time  std_fit_time  mean_score_time  std_score_time  param_alpha  \
0       0.003899      0.000194         0.048513        0.007621     0.000977
1       0.003401      0.000584         0.042454        0.011295     0.000977
2       0.002706      0.000502         0.048544        0.009059     0.000977
3       0.003304      0.000531         0.040709        0.003031     0.000977
4       0.001801      0.000116         0.000289        0.000014     0.000977

   param_l1_ratio                                     params  \
0            0.00   {'alpha': 0.0009765625, 'l1_ratio': 0.0}
1            0.25  {'alpha': 0.0009765625, 'l1_ratio': 0.25}
2            0.50   {'alpha': 0.0009765625, 'l1_ratio': 0.5}
3            0.75  {'alpha': 0.0009765625, 'l1_ratio': 0.75}
4            1.00   {'alpha': 0.0009765625, 'l1_ratio': 1.0}

   split0_test_score  split1_test_score  split2_test_score  mean_test_score  \
0           0.828571           0.971429           0.971429         0.923810
1           0.885714           0.971429           0.942857         0.933333
2           0.885714           1.000000           0.942857         0.942857
3           0.885714           0.914286           0.914286         0.904762
4           0.885714           1.000000           0.942857         0.942857

   std_test_score  rank_test_score
0        0.067344                4
1        0.035635                3
2        0.046657                1
3        0.013469                5
4        0.046657                1

df, params = hyanova.read_df(df, 'mean_test_score')
print(df.head())
>>>  alpha  l1_ratio  mean_test_score
0  0.000977      0.00         0.923810
1  0.000977      0.25         0.933333
2  0.000977      0.50         0.942857
3  0.000977      0.75         0.904762
4  0.000977      1.00         0.942857
print(params)
>>> ['alpha', 'l1_ratio']
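The transformation shown in this example can be approximated with plain pandas. The sketch below assumes that hyperparameter columns are exactly those prefixed with `param_`, as in sklearn's `cv_results_`; `select_params_and_metric` is a hypothetical name, and this is not necessarily how hyanova implements `read_df`.

```python
import pandas as pd


def select_params_and_metric(cv_results, metric):
    """Keep the hyperparameter columns and the metric, with the metric last.

    Hypothetical helper, not part of hyanova.
    """
    prefix = "param_"
    # 'params' (the dict column) does not match, because the prefix includes the underscore.
    param_cols = [c for c in cv_results.columns if c.startswith(prefix)]
    names = [c[len(prefix):] for c in param_cols]
    out = cv_results[param_cols + [metric]].copy()
    out.columns = names + [metric]
    return out, names


raw = pd.DataFrame({
    "mean_fit_time": [0.003899, 0.003401],
    "param_alpha": [0.000977, 0.000977],
    "param_l1_ratio": [0.00, 0.25],
    "params": [{"alpha": 0.000977, "l1_ratio": 0.0},
               {"alpha": 0.000977, "l1_ratio": 0.25}],
    "mean_test_score": [0.923810, 0.933333],
})
df_small, params_small = select_params_and_metric(raw, "mean_test_score")
print(list(df_small.columns))
print(params_small)
```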

ANOVA

analyze(df,max_iter=-1)

Use hyanova.analyze(df,max_iter=-1) to do the functional ANOVA decomposition.

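Parameters:

df: <class 'pandas.core.frame.DataFrame'>, the DataFrame you want to analyze.

max_iter: int, defaults to -1. The maximum number of hyperparameter subsets to evaluate; with the default of -1, all subsets are evaluated.

Returns:

result_df: <class 'pandas.core.frame.DataFrame'>, a DataFrame with one row per hyperparameter subset u, its variance v_u and its importance F_u.

The df parameter expects a pandas.DataFrame object formatted like the following table. The loading functions that HyANOVA provides produce this format.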

	alpha	l1_ratio	mean_test_score
0	0.00977	0.00	0.923810
1	0.00977	0.25	0.933333
2	0.00977	0.50	0.942857
3	0.00977	0.75	0.904762

Note: The metric (mean_test_score) should always be in the last column.
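If you build the DataFrame by hand instead of using the loading functions, the metric column may not be last. A minimal pandas sketch for reordering the columns follows; `metric_last` is a hypothetical helper, not part of hyanova.

```python
import pandas as pd


def metric_last(df, metric):
    """Return a view of `df` with the `metric` column moved to the end.

    Hypothetical helper, not part of hyanova.
    """
    ordered = [c for c in df.columns if c != metric] + [metric]
    return df[ordered]


unordered = pd.DataFrame({
    "mean_test_score": [0.923810, 0.933333],
    "alpha": [0.00977, 0.00977],
    "l1_ratio": [0.00, 0.25],
})
print(list(metric_last(unordered, "mean_test_score").columns))
```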

Example

hyanova.analyze(df) returns a DataFrame with the hyperparameter names (u), the variance (v_u) and the importance (F_u).

importance = hyanova.analyze(df)
>>> 100%|██████████████████████████████████| 3/3 [00:00<00:00, 11.32 it/s]
print(importance)
>>>              u       v_u  F_u(v_u/v_all)
0           (alpha,)  0.056885        0.892057
1        (l1_ratio,)  0.002489        0.039030
2  (alpha, l1_ratio)  0.004394        0.068912

Note: F_u is the ratio of the variance attributable to the hyperparameter subset u (v_u) to the variance of all trials (v_all), so the F_u values of all subsets always sum to 1. See the references for more details.
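To make the ratio concrete, here is a sketch of the decomposition for the simplest case: two hyperparameters evaluated on a complete grid, with every grid point weighted equally. This is an illustration under those assumptions using numpy and pandas, not hyanova's actual implementation; `two_param_fanova` is a hypothetical name and the scores are made up.

```python
import numpy as np
import pandas as pd


def two_param_fanova(df):
    """Variance fractions F_u for a complete two-parameter grid.

    Expects exactly three columns: two hyperparameters, then the metric.
    Hypothetical helper, not part of hyanova.
    """
    p1, p2, metric = df.columns
    # Rows are values of p1, columns are values of p2.
    grid = df.pivot(index=p1, columns=p2, values=metric).to_numpy()
    f0 = grid.mean()                       # overall mean
    f1 = grid.mean(axis=1) - f0            # main effect of p1
    f2 = grid.mean(axis=0) - f0            # main effect of p2
    f12 = grid - f0 - f1[:, None] - f2[None, :]  # interaction effect
    v_all = np.mean((grid - f0) ** 2)      # variance of all trials
    return {
        (p1,): np.mean(f1 ** 2) / v_all,
        (p2,): np.mean(f2 ** 2) / v_all,
        (p1, p2): np.mean(f12 ** 2) / v_all,
    }


toy = pd.DataFrame({
    "a": [0, 0, 1, 1],
    "b": [0, 1, 0, 1],
    "score": [1.0, 2.0, 3.0, 5.0],
})
fractions = two_param_fanova(toy)
print(fractions)
print(sum(fractions.values()))
```

On a complete grid the three effects are orthogonal, which is why their variances add up to v_all and the fractions sum to 1.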

Because of Python's performance limitations, the functional ANOVA becomes very slow when the number of hyperparameters is high (more than 5). You can end the analysis early by setting the max_iter parameter. In practice, the univariate importances are usually all that is needed, so setting max_iter to the number of hyperparameters shortens the runtime.

importance = hyanova.analyze(df,max_iter=2)
>>> 100%|██████████████████████████████████| 2/2 [00:00<00:00, 8.12 it/s]
print(importance)
>>>              u       v_u  F_u(v_u/v_all)
0           (alpha,)  0.056885        0.892057
1        (l1_ratio,)  0.002489        0.039030
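The cost grows quickly because n hyperparameters have 2^n - 1 non-empty subsets. The sketch below enumerates them in order of increasing size, which matches the order of the rows in the outputs above; that hyanova visits subsets in this order is an assumption based on those outputs, and `subsets_by_size` is a hypothetical name.

```python
from itertools import combinations


def subsets_by_size(params):
    """List all non-empty subsets of `params`, smallest subsets first.

    Hypothetical helper, not part of hyanova.
    """
    return [u
            for size in range(1, len(params) + 1)
            for u in combinations(params, size)]


print(subsets_by_size(["alpha", "l1_ratio"]))
# → [('alpha',), ('l1_ratio',), ('alpha', 'l1_ratio')]

# With five hyperparameters there are already 2**5 - 1 = 31 subsets,
# but only the first five are univariate.
print(len(subsets_by_size(["p1", "p2", "p3", "p4", "p5"])))
# → 31
```

Under this ordering, stopping after the first n subsets leaves exactly the univariate terms.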

Example usage

You can use sklearn to run a hyperparameter search and then use hyanova to analyze the importance of the hyperparameters.

import numpy as np
import pandas as pd
import sklearn.datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
import hyanova

# Load the iris data set
iris = sklearn.datasets.load_iris()
X = iris.data
y = iris.target

# Grid search over C and kernel for a support vector classifier
model = SVC()
grid = {'C': np.linspace(1e-9, 128, 10000),
        'kernel': ('rbf', 'linear', 'poly', 'sigmoid')}
grid_search = GridSearchCV(model, grid)
result = grid_search.fit(X, y)

# Analyze the importance of the hyperparameters
df = pd.DataFrame(result.cv_results_)
metric = 'mean_test_score'
df, params = hyanova.read_df(df, metric)
importance = hyanova.analyze(df)
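Once the importance table is available, it is an ordinary DataFrame and can be ranked or filtered with pandas. The sketch below rebuilds the table printed in the ANOVA section by hand as a stand-in, so it runs without hyanova; it assumes the column names and tuple-valued u column shown in that output.

```python
import pandas as pd

# Stand-in for the result of hyanova.analyze(df), copied from the printed output.
importance = pd.DataFrame({
    "u": [("alpha",), ("l1_ratio",), ("alpha", "l1_ratio")],
    "v_u": [0.056885, 0.002489, 0.004394],
    "F_u(v_u/v_all)": [0.892057, 0.039030, 0.068912],
})

# Rank subsets from most to least important.
ranked = importance.sort_values("F_u(v_u/v_all)", ascending=False).reset_index(drop=True)

# Keep only the univariate terms (subsets with a single hyperparameter).
univariate = ranked[ranked["u"].map(len) == 1]

print(ranked)
print(univariate)
```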

Dependencies

numpy
pandas
tqdm

Why was HyANOVA created?

I am completing my undergraduate thesis. To better understand the models used in it, I looked at many algorithms that measure the importance of hyperparameters. Among them, functional ANOVA seems to be the most effective. However, the original authors' implementation is based on Java and calls Java files from Python, which I found confusing. I wanted a module that is easier to understand and implemented entirely in Python to help me with the ANOVA decomposition, so I created HyANOVA. I hope it helps you too!

References

Hutter, F., Hoos, H. & Leyton-Brown, K. (2014). An Efficient Approach for Assessing Hyperparameter Importance. Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):754-762.
https://github.com/frank-hutter/fanova

Project details

This version

1.1.2

Apr 3, 2020
