
A pure Python implementation of the functional ANOVA algorithm.

Project description

HyANOVA

HyANOVA is a pure Python implementation of the functional ANOVA algorithm. It can help you analyze the importance of hyperparameters in machine learning algorithms.

Quick Start

To install the package, use pip as follows:

pip install hyanova

Here is a short usage example. You can download the example data from the example folder.

import hyanova

path = './iris[GridSearchCV]Model1.csv'  # grid search results generated by sklearn
metric = 'mean_test_score'               # metric used to measure model performance
df, params = hyanova.read_csv(path, metric)
# df, params = hyanova.read_df(df, metric)  # you can also load data from a pd.DataFrame
importance = hyanova.analyze(df, params, metric)

The metric is the column used to evaluate model performance; it must appear as a column of the .csv file or the pandas.DataFrame. The result will look similar to the output below; see the next section (ANOVA) for more details.

print(importance)
>>>              u       v_u  F_u(v_u/v_all)
0           (alpha,)  0.056885        0.892057
1        (l1_ratio,)  0.002489        0.039030
2  (alpha, l1_ratio)  0.004394        0.068912

APIs

Load Data

HyANOVA is designed to analyze the grid search results generated by sklearn. It provides two ways to load the data.

  • You can use read_df(df, metric) to load data from a pandas.DataFrame object. It returns two objects:

    • a DataFrame containing all hyperparameter values and the values of the chosen metric
    • a list of all hyperparameter names

    Here is an example.

    print(df.head())
    
    >>> mean_fit_time  std_fit_time  mean_score_time  std_score_time  param_alpha  \
    0       0.003899      0.000194         0.048513        0.007621     0.000977   
    1       0.003401      0.000584         0.042454        0.011295     0.000977   
    2       0.002706      0.000502         0.048544        0.009059     0.000977   
    3       0.003304      0.000531         0.040709        0.003031     0.000977   
    4       0.001801      0.000116         0.000289        0.000014     0.000977   
    
       param_l1_ratio                                     params  \
    0            0.00   {'alpha': 0.0009765625, 'l1_ratio': 0.0}   
    1            0.25  {'alpha': 0.0009765625, 'l1_ratio': 0.25}   
    2            0.50   {'alpha': 0.0009765625, 'l1_ratio': 0.5}   
    3            0.75  {'alpha': 0.0009765625, 'l1_ratio': 0.75}   
    4            1.00   {'alpha': 0.0009765625, 'l1_ratio': 1.0}   
    
       split0_test_score  split1_test_score  split2_test_score  mean_test_score  \
    0           0.828571           0.971429           0.971429         0.923810   
    1           0.885714           0.971429           0.942857         0.933333   
    2           0.885714           1.000000           0.942857         0.942857   
    3           0.885714           0.914286           0.914286         0.904762   
    4           0.885714           1.000000           0.942857         0.942857   
    
       std_test_score  rank_test_score  
    0        0.067344                4  
    1        0.035635                3  
    2        0.046657                1  
    3        0.013469                5  
    4        0.046657                1  
    
    df, params = hyanova.read_df(df, 'mean_test_score')
    print(df.head())
    >>>  alpha  l1_ratio  mean_test_score
    0  0.000977      0.00         0.923810
    1  0.000977      0.25         0.933333
    2  0.000977      0.50         0.942857
    3  0.000977      0.75         0.904762
    4  0.000977      1.00         0.942857
    print(params)
    >>> ['alpha', 'l1_ratio']
    
  • Use hyanova.read_csv(path, metric) to load data from a .csv file. A template file can be found in the example folder. It is equivalent to hyanova.read_df(pandas.read_csv(path), metric), as the sketch below illustrates.
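
    As a quick check (this sketch assumes the example .csv from the example folder is available locally), the two loaders are interchangeable:

    import pandas as pd
    import hyanova

    path = './iris[GridSearchCV]Model1.csv'
    metric = 'mean_test_score'

    # read_csv is a shortcut for reading the file with pandas and
    # passing the resulting DataFrame to read_df.
    df_a, params_a = hyanova.read_csv(path, metric)
    df_b, params_b = hyanova.read_df(pd.read_csv(path), metric)

    assert params_a == params_b
    assert df_a.equals(df_b)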

ANOVA

Use hyanova.analyze(df, params, metric) to perform the functional ANOVA decomposition. It requires a pandas.DataFrame with the format shown below; the loading methods HyANOVA provides produce this format for you.

      alpha  l1_ratio  mean_test_score
0   0.00977      0.00         0.923810
1   0.00977      0.25         0.933333
2   0.00977      0.50         0.942857
3   0.00977      0.75         0.904762

Note: the metric (here mean_test_score) must always be the last column.
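
If you build the DataFrame yourself and the metric is not already last, a standard pandas reordering (using the column names from the example above) moves it there:

metric = 'mean_test_score'
# Keep every hyperparameter column first and append the metric column last.
df = df[[c for c in df.columns if c != metric] + [metric]]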

hyanova.analyze returns a DataFrame with the hyperparameter names (u), the variance contribution of each (v_u), and the importance (F_u).

importance = hyanova.analyze(df, params, 'mean_test_score')
>>> 100%|██████████████████████| 3/3 [00:00<00:00, 11.32it/s]
print(importance)
>>>              u       v_u  F_u(v_u/v_all)
0           (alpha,)  0.056885        0.892057
1        (l1_ratio,)  0.002489        0.039030
2  (alpha, l1_ratio)  0.004394        0.068912

Note: F_u is the ratio of the variance contributed by a hyperparameter (or combination of hyperparameters), v_u, to the total variance across all trials, v_all, so the F_u values always sum to 1. See the references for more details.
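
For intuition, here is a minimal sketch of the decomposition itself on a complete grid of two hyperparameters (an illustrative reimplementation, not HyANOVA's internal code; the name fanova_2d is made up for this example):

import numpy as np
import pandas as pd

def fanova_2d(df, a, b, metric):
    # Pivot to a complete (a, b) grid of metric values; pivot_table averages duplicates.
    table = df.pivot_table(index=a, columns=b, values=metric).values
    grand_mean = table.mean()
    f_a = table.mean(axis=1) - grand_mean                     # main effect of a
    f_b = table.mean(axis=0) - grand_mean                     # main effect of b
    f_ab = table - grand_mean - f_a[:, None] - f_b[None, :]   # interaction term
    v = {(a,): np.mean(f_a ** 2),
         (b,): np.mean(f_b ** 2),
         (a, b): np.mean(f_ab ** 2)}
    v_all = sum(v.values())  # equal to np.var(table) on a complete grid
    return {u: (v_u, v_u / v_all) for u, v_u in v.items()}

print(fanova_2d(df, 'alpha', 'l1_ratio', 'mean_test_score'))

On a complete grid the two main-effect variances and the interaction variance add up exactly to the total variance of the metric, which is why the F_u column always sums to 1.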

Example usage

You can use sklearn to run a hyperparameter search and then use HyANOVA to analyze the importance of the hyperparameters.

import numpy as np
import pandas as pd
import sklearn.datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

import hyanova

iris = sklearn.datasets.load_iris()
X = iris.data
y = iris.target
model = SVC()
grid = {'C': np.linspace(1e-9, 128, 10000),
        'kernel': ('rbf', 'linear', 'poly', 'sigmoid')}
grid_search = GridSearchCV(model, grid)
result = grid_search.fit(X, y)
df = pd.DataFrame(result.cv_results_)
metric = 'mean_test_score'
df, params = hyanova.read_df(df, metric)
importance = hyanova.analyze(df, params, metric)
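
The importance table can then be inspected directly; for example, sorting by the F_u column (named as in the output shown in the ANOVA section) ranks the hyperparameters and their interaction by their share of the total variance:

# Largest share of the total variance first.
print(importance.sort_values('F_u(v_u/v_all)', ascending=False))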

Dependencies

  • numpy
  • pandas
  • tqdm

Why I created HyANOVA

I am completing my undergraduate thesis, and to better understand the model in it I looked for algorithms that can measure the importance of hyperparameters. Among them, functional ANOVA seemed the most effective. However, the original authors' implementation is based on Java and is called from Python through Java files, which I found confusing. I wanted a module that is easier to understand and implemented entirely in Python to help me with the ANOVA decomposition, so I created HyANOVA. I hope it can help you too.

References

  1. Hutter, F., Hoos, H. & Leyton-Brown, K. (2014). An Efficient Approach for Assessing Hyperparameter Importance. Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):754-762.
  2. https://github.com/frank-hutter/fanova

