A pure Python implementation of the functional ANOVA algorithm.

HyANOVA
=======

HyANOVA is a pure Python implementation of the functional ANOVA algorithm. It can be used to analyze the importance of hyperparameters in machine learning algorithms.


Quick Start
-----------

To install the package, use ``pip``:

.. code:: shell

   pip install hyanova

Here is a short usage example. You can download the
`data <./examples/iris%5BGridSearchCV%5DModel1.csv>`__ from the example
folder.

.. code:: python

   import hyanova

   path = './iris[GridSearchCV]Model1.csv'  # grid search results generated by sklearn
   metric = 'mean_test_score'               # metric for model performance
   df, params = hyanova.read_csv(path, metric)
   # Alternatively, load the data from a pandas.DataFrame:
   # df, params = hyanova.read_df(df, metric)
   importance = hyanova.analyze(df)

``metric`` is the name of the column you use to evaluate model
performance; it must appear as a column in the ``.csv`` file or the
``pandas.DataFrame``. The result will look similar to the output below;
see the ANOVA section for more details.

.. code:: python

   >>> print(importance)
                      u       v_u  F_u(v_u/v_all)
   0           (alpha,)  0.056885        0.892057
   1        (l1_ratio,)  0.002489        0.039030
   2  (alpha, l1_ratio)  0.004394        0.068912

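
The object returned by ``hyanova.analyze`` is a ``DataFrame``, so you can
rank hyperparameters with ordinary pandas calls. In this sketch the table
printed above is rebuilt by hand, with values copied from that output, so
the snippet runs without HyANOVA installed; it assumes the importance
column is named ``F_u(v_u/v_all)``, as the printed header suggests.

.. code:: python

   import pandas as pd

   # Hand-built copy of the importance table printed above (assumed layout)
   importance = pd.DataFrame({
       "u": [("alpha",), ("l1_ratio",), ("alpha", "l1_ratio")],
       "v_u": [0.056885, 0.002489, 0.004394],
       "F_u(v_u/v_all)": [0.892057, 0.039030, 0.068912],
   })

   # Most important hyperparameter (or interaction) first
   ranked = importance.sort_values("F_u(v_u/v_all)", ascending=False)
   print(ranked.reset_index(drop=True))

Note that the interaction ``(alpha, l1_ratio)`` is ranked like any single
hyperparameter, so a large interaction term shows up directly in the
ranking.
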

APIs
~~~~

Load Data
'''''''''

HyANOVA is designed to analyze the grid search results generated by
sklearn. It provides two ways to load the data.

- Use ``hyanova.read_df(df, metric)`` to load data from a
  ``pandas.DataFrame`` object. It returns two objects:

  - a ``DataFrame`` with the values of all hyperparameters and of the
    metric you choose
  - a ``list`` of all hyperparameter names

Here is an example:

.. code:: python

   >>> print(df.head())
      mean_fit_time  std_fit_time  mean_score_time  std_score_time  param_alpha  \
   0       0.003899      0.000194         0.048513        0.007621     0.000977
   1       0.003401      0.000584         0.042454        0.011295     0.000977
   2       0.002706      0.000502         0.048544        0.009059     0.000977
   3       0.003304      0.000531         0.040709        0.003031     0.000977
   4       0.001801      0.000116         0.000289        0.000014     0.000977

     param_l1_ratio                                     params  \
   0           0.00   {'alpha': 0.0009765625, 'l1_ratio': 0.0}
   1           0.25  {'alpha': 0.0009765625, 'l1_ratio': 0.25}
   2           0.50   {'alpha': 0.0009765625, 'l1_ratio': 0.5}
   3           0.75  {'alpha': 0.0009765625, 'l1_ratio': 0.75}
   4           1.00   {'alpha': 0.0009765625, 'l1_ratio': 1.0}

      split0_test_score  split1_test_score  split2_test_score  mean_test_score  \
   0           0.828571           0.971429           0.971429         0.923810
   1           0.885714           0.971429           0.942857         0.933333
   2           0.885714           1.000000           0.942857         0.942857
   3           0.885714           0.914286           0.914286         0.904762
   4           0.885714           1.000000           0.942857         0.942857

      std_test_score  rank_test_score
   0        0.067344                4
   1        0.035635                3
   2        0.046657                1
   3        0.013469                5
   4        0.046657                1

.. code:: python

   >>> df, params = hyanova.read_df(df, 'mean_test_score')
   >>> print(df.head())
         alpha  l1_ratio  mean_test_score
   0  0.000977      0.00         0.923810
   1  0.000977      0.25         0.933333
   2  0.000977      0.50         0.942857
   3  0.000977      0.75         0.904762
   4  0.000977      1.00         0.942857
   >>> print(params)
   ['alpha', 'l1_ratio']


- Use ``hyanova.read_csv(path, metric)`` to load data from a ``.csv``
  file. The `template
  file <./examples/iris%5BGridSearchCV%5DModel1.csv>`__ can be found in
  the example folder. It is equivalent to
  ``hyanova.read_df(pandas.read_csv(path), metric)``.

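
The example above shows ``read_df`` keeping the ``param_*`` columns (with
the prefix removed) followed by the metric column. The function below is
a hypothetical reimplementation of that observed behavior, written only
to illustrate the expected input and output; ``sketch_read_df`` is an
illustrative name, not HyANOVA's actual code. The input rows reuse values
from the ``df.head()`` output above.

.. code:: python

   import pandas as pd


   def sketch_read_df(df: pd.DataFrame, metric: str):
       """Hypothetical stand-in for hyanova.read_df (illustration only)."""
       if metric not in df.columns:
           raise KeyError(f"metric column {metric!r} not found")
       # GridSearchCV stores each hyperparameter in a column named param_<name>
       param_cols = [c for c in df.columns if c.startswith("param_")]
       params = [c[len("param_"):] for c in param_cols]
       out = df[param_cols + [metric]].copy()
       out.columns = params + [metric]  # drop the "param_" prefix
       return out, params


   # A tiny frame shaped like sklearn's cv_results_
   raw = pd.DataFrame({
       "mean_fit_time": [0.003899, 0.003401],
       "param_alpha": [0.000977, 0.000977],
       "param_l1_ratio": [0.00, 0.25],
       "params": [
           {"alpha": 0.0009765625, "l1_ratio": 0.0},
           {"alpha": 0.0009765625, "l1_ratio": 0.25},
       ],
       "mean_test_score": [0.923810, 0.933333],
   })
   df, params = sketch_read_df(raw, "mean_test_score")
   print(params)  # ['alpha', 'l1_ratio']

Putting the metric last in the returned frame matches the layout that the
ANOVA section asks for.
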

ANOVA
'''''

Use ``hyanova.analyze(df)`` to perform the functional ANOVA
decomposition. It expects a ``pandas.DataFrame`` in a format similar to
the following table; the loading functions HyANOVA provides produce this
format for you.

== ======== ======== ===============
\  alpha    l1_ratio mean_test_score
== ======== ======== ===============
0  0.000977 0.00     0.923810
1  0.000977 0.25     0.933333
2  0.000977 0.50     0.942857
3  0.000977 0.75     0.904762
== ======== ======== ===============

**Note:** The metric column (here ``mean_test_score``) must always be
the last column.

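
If you build the frame yourself and the metric column is not last, you
can reorder the columns with plain pandas before calling
``hyanova.analyze``. A minimal sketch; ``metric_last`` is a hypothetical
helper, not part of HyANOVA.

.. code:: python

   import pandas as pd


   def metric_last(df: pd.DataFrame, metric: str) -> pd.DataFrame:
       """Return a copy of df with the metric column moved to the end."""
       others = [c for c in df.columns if c != metric]
       return df[others + [metric]]


   df = pd.DataFrame({
       "mean_test_score": [0.923810, 0.933333],
       "alpha": [0.000977, 0.000977],
       "l1_ratio": [0.00, 0.25],
   })
   df = metric_last(df, "mean_test_score")
   print(list(df.columns))  # ['alpha', 'l1_ratio', 'mean_test_score']
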
``hyanova.analyze(df)`` returns a ``DataFrame`` listing each
hyperparameter subset ``u`` (single hyperparameters as well as their
interactions), its variance ``v_u``, and its importance ``F_u``.

.. code:: python

   >>> importance = hyanova.analyze(df)
   100%|██████████████████████████████████| 3/3 [00:00<00:00, 11.32it/s]
   >>> print(importance)
                      u       v_u  F_u(v_u/v_all)
   0           (alpha,)  0.056885        0.892057
   1        (l1_ratio,)  0.002489        0.039030
   2  (alpha, l1_ratio)  0.004394        0.068912

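**Note:** ``F_u`` is the ratio of the variance explained by the
hyperparameter subset ``u`` (``v_u``) to the variance of all trials
(``v_all``). Because the total variance decomposes into the sum of the
``v_u`` over all subsets, the ``F_u`` values always sum to 1. See the
references for more details.
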
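
For a complete grid (every combination evaluated exactly once) with equal
weight on each value, the functional ANOVA decomposition reduces to a
classical two-way ANOVA on the grid, which makes ``v_u`` and ``F_u`` easy
to see. The sketch below handles exactly two hyperparameters under those
assumptions; it illustrates the idea and is not HyANOVA's implementation.
``grid_anova`` and the toy data are hypothetical.

.. code:: python

   import numpy as np
   import pandas as pd


   def grid_anova(df: pd.DataFrame) -> pd.DataFrame:
       """Two-factor functional ANOVA on a complete grid (illustrative sketch).

       Assumes the first two columns are hyperparameters, the last column is
       the metric, and every combination of values appears exactly once.
       """
       a, b, metric = df.columns[0], df.columns[1], df.columns[-1]
       # y[i, j] = metric for the i-th value of a and the j-th value of b
       y = df.pivot(index=a, columns=b, values=metric).to_numpy(dtype=float)
       if np.isnan(y).any():
           raise ValueError("grid is incomplete")
       f0 = y.mean()                                # constant term
       f_a = y.mean(axis=1) - f0                    # main effect of a
       f_b = y.mean(axis=0) - f0                    # main effect of b
       f_ab = y - f0 - f_a[:, None] - f_b[None, :]  # interaction effect
       v_u = {
           (a,): float(np.mean(f_a ** 2)),
           (b,): float(np.mean(f_b ** 2)),
           (a, b): float(np.mean(f_ab ** 2)),
       }
       v_all = float(y.var())                       # variance over all trials
       if v_all == 0.0:
           raise ValueError("metric is constant; importance is undefined")
       return pd.DataFrame({
           "u": list(v_u.keys()),
           "v_u": list(v_u.values()),
           "F_u": [v / v_all for v in v_u.values()],
       })


   # Toy 2 x 2 grid: the score rises by 2 when alpha goes from 1 to 2 and
   # by 1 when l1_ratio goes from 0 to 1, with no interaction
   toy = pd.DataFrame({
       "alpha": [1, 1, 2, 2],
       "l1_ratio": [0, 1, 0, 1],
       "score": [0.0, 1.0, 2.0, 3.0],
   })
   print(grid_anova(toy))

On a complete grid the three effects are orthogonal, so their variances
add up to ``v_all``; this is why the ``F_u`` column sums to 1. Here
``alpha`` explains 80% of the variance, ``l1_ratio`` 20%, and the
interaction 0%.
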

Example usage
-------------

You can use sklearn to run a hyperparameter search and then use HyANOVA
to analyze the importance of the hyperparameters.

.. code:: python

   import numpy as np
   import pandas as pd
   import sklearn.datasets
   from sklearn.model_selection import GridSearchCV
   from sklearn.svm import SVC

   import hyanova

   iris = sklearn.datasets.load_iris()
   X = iris.data
   y = iris.target

   model = SVC()
   # 10,000 values of C x 4 kernels = 40,000 candidates
   grid = {
       'C': np.linspace(1e-9, 128, 10000),
       'kernel': ('rbf', 'linear', 'poly', 'sigmoid'),
   }
   grid_search = GridSearchCV(model, grid)
   result = grid_search.fit(X, y)

   # Convert the search results and analyze hyperparameter importance
   df = pd.DataFrame(result.cv_results_)
   metric = 'mean_test_score'
   df, params = hyanova.read_df(df, metric)
   importance = hyanova.analyze(df)

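
The grid in this example is large: ``GridSearchCV`` fits every candidate
once per cross-validation fold (5 folds by default since scikit-learn
0.22), plus one final refit on the full data when ``refit=True`` (the
default). Counting the fits before launching a search can save time. The
helper below is a hypothetical plain-Python sketch, not a scikit-learn
or HyANOVA function.

.. code:: python

   import math


   def count_fits(grid: dict, n_folds: int = 5, refit: bool = True) -> int:
       """Number of model fits a full grid search performs (sketch)."""
       n_candidates = math.prod(len(values) for values in grid.values())
       return n_candidates * n_folds + (1 if refit else 0)


   grid = {
       "C": range(10000),  # stands in for np.linspace(1e-9, 128, 10000)
       "kernel": ("rbf", "linear", "poly", "sigmoid"),
   }
   print(count_fits(grid))  # 200001

Reducing the number of ``C`` values shrinks this count proportionally.
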

Dependencies
------------

- numpy
- pandas
- tqdm


Why I created HyANOVA
---------------------

I created HyANOVA while working on my undergraduate thesis. To better understand the models used in the thesis, I looked into many algorithms that measure the importance of hyperparameters, and functional ANOVA seemed to be the most effective. However, the original authors' implementation is written in Java and calls the Java code from Python, which I found confusing. I wanted a module that is easier to understand and implemented entirely in Python to help me with the ANOVA decomposition, so I created HyANOVA. I hope it helps you too!


References
----------

1. Hutter, F., Hoos, H., & Leyton-Brown, K. (2014). An Efficient
   Approach for Assessing Hyperparameter Importance. *Proceedings of the
   31st International Conference on Machine Learning*, PMLR
   32(1):754-762.
2. https://github.com/frank-hutter/fanova