A pure Python implementation of the functional ANOVA algorithm.

HyANOVA
=======

HyANOVA is a pure Python implementation of the functional ANOVA algorithm, which can be used to analyze the importance of hyperparameters in machine learning algorithms.

Quick Start
-----------

To install the package, use ``pip``:

.. code:: shell

   pip install hyanova

Here is a short example of usage. You can download the
`data <./examples/iris%5BGridSearchCV%5DModel1.csv>`__ from the example
folder.

.. code:: python

   import hyanova

   path = './iris[GridSearchCV]Model1.csv'         # gridsearch results generated by sklearn
   metric = 'mean_test_score'              # metric for model performance
   df,params = hyanova.read_csv(path,metric)
   # df,params = hyanova.read_df(df,metric)         You can also load data from pd.DataFrame
   importance = hyanova.analyze(df)

``metric`` is the column used to evaluate model performance. It must
appear as a column in the ``.csv`` file or in the ``pandas.DataFrame``
object. The result will look similar to the output below; see the ANOVA
section further down for details.

.. code:: python

   print(importance)
   >>>              u       v_u  F_u(v_u/v_all)
   0           (alpha,)  0.056885        0.892057
   1        (l1_ratio,)  0.002489        0.039030
   2  (alpha, l1_ratio)  0.004394        0.068912
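
To rank the hyperparameter subsets, the result can be sorted by its
importance column. The sketch below rebuilds the table printed above by
hand, so it runs without HyANOVA, and it assumes the column labels are
exactly as printed.

.. code:: python

   import pandas as pd

   # Values copied from the printed result above (assumed column labels).
   importance = pd.DataFrame({
       "u": [("alpha",), ("l1_ratio",), ("alpha", "l1_ratio")],
       "v_u": [0.056885, 0.002489, 0.004394],
       "F_u(v_u/v_all)": [0.892057, 0.039030, 0.068912],
   })

   # Largest share of the total variance first.
   ranked = importance.sort_values("F_u(v_u/v_all)", ascending=False)
   print(ranked["u"].tolist())
   # [('alpha',), ('alpha', 'l1_ratio'), ('l1_ratio',)]

Here ``alpha`` alone accounts for most of the variance, and the
interaction of ``alpha`` and ``l1_ratio`` matters more than ``l1_ratio``
on its own.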

APIs
~~~~

Load Data
'''''''''

HyANOVA is designed to analyze the grid search results generated by
sklearn. It provides two ways to load the data.

-  Use ``hyanova.read_df(df,metric)`` to load data from a
   ``pandas.DataFrame`` object. It returns two objects:

   -  a ``DataFrame`` containing the value of every hyperparameter and the
      value of the chosen metric
   -  a ``list`` of the hyperparameter names

   Here is an example.

   .. code:: python

      print(df.head())

   .. code:: shell

      >>> mean_fit_time  std_fit_time  mean_score_time  std_score_time  param_alpha  \
      0       0.003899      0.000194         0.048513        0.007621     0.000977   
      1       0.003401      0.000584         0.042454        0.011295     0.000977   
      2       0.002706      0.000502         0.048544        0.009059     0.000977   
      3       0.003304      0.000531         0.040709        0.003031     0.000977   
      4       0.001801      0.000116         0.000289        0.000014     0.000977   

         param_l1_ratio                                     params  \
      0            0.00   {'alpha': 0.0009765625, 'l1_ratio': 0.0}   
      1            0.25  {'alpha': 0.0009765625, 'l1_ratio': 0.25}   
      2            0.50   {'alpha': 0.0009765625, 'l1_ratio': 0.5}   
      3            0.75  {'alpha': 0.0009765625, 'l1_ratio': 0.75}   
      4            1.00   {'alpha': 0.0009765625, 'l1_ratio': 1.0}   

         split0_test_score  split1_test_score  split2_test_score  mean_test_score  \
      0           0.828571           0.971429           0.971429         0.923810   
      1           0.885714           0.971429           0.942857         0.933333   
      2           0.885714           1.000000           0.942857         0.942857   
      3           0.885714           0.914286           0.914286         0.904762   
      4           0.885714           1.000000           0.942857         0.942857   

         std_test_score  rank_test_score  
      0        0.067344                4  
      1        0.035635                3  
      2        0.046657                1  
      3        0.013469                5  
      4        0.046657                1  

   .. code:: python

      df,params = hyanova.read_df(df,'mean_test_score')
      print(df.head())
      >>>  alpha  l1_ratio  mean_test_score
      0  0.000977      0.00         0.923810
      1  0.000977      0.25         0.933333
      2  0.000977      0.50         0.942857
      3  0.000977      0.75         0.904762
      4  0.000977      1.00         0.942857
      print(params)
      >>> ['alpha', 'l1_ratio']

-  Use ``hyanova.read_csv(path,metric)`` to load data from a ``.csv``
   file. The `template
   file <./examples/iris%5BGridSearchCV%5DModel1.csv>`__ can be found in
   the example folder. It is equivalent to
   ``hyanova.read_df(pandas.read_csv(path),metric)``.
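
What the loading step produces can be pictured with plain pandas. The
sketch below is not HyANOVA's source code: it assumes that the
hyperparameter columns follow sklearn's ``param_<name>`` naming in
``cv_results_``, and ``select_params_and_metric`` is a hypothetical helper
name used only for illustration.

.. code:: python

   import pandas as pd


   def select_params_and_metric(cv_results, metric):
       """Keep the ``param_*`` columns (prefix stripped) plus the metric, metric last."""
       param_cols = [c for c in cv_results.columns if c.startswith("param_")]
       params = [c[len("param_"):] for c in param_cols]
       out = cv_results[param_cols + [metric]].copy()
       out.columns = params + [metric]
       return out, params


   # Tiny stand-in for pd.DataFrame(grid_search.cv_results_)
   cv_results = pd.DataFrame({
       "mean_fit_time": [0.0039, 0.0034],
       "param_alpha": [0.000977, 0.000977],
       "param_l1_ratio": [0.00, 0.25],
       "mean_test_score": [0.923810, 0.933333],
   })
   df, params = select_params_and_metric(cv_results, "mean_test_score")
   print(params)            # ['alpha', 'l1_ratio']
   print(list(df.columns))  # ['alpha', 'l1_ratio', 'mean_test_score']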

ANOVA
'''''

Use ``hyanova.analyze(df)`` to do the functional ANOVA decomposition. It
needs a ``pandas.DataFrame`` object formatted like the following table.
The loading functions that HyANOVA provides produce this format.

== ======== ======== ===============
\  alpha    l1_ratio mean_test_score
== ======== ======== ===============
0  0.000977 0.00     0.923810
1  0.000977 0.25     0.933333
2  0.000977 0.50     0.942857
3  0.000977 0.75     0.904762
== ======== ======== ===============

**Note:** The metric (``mean_test_score``) should always be in the last
column.
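
If a hand-built ``DataFrame`` does not already have the metric in the last
column, the columns can be reordered with plain pandas before calling
``hyanova.analyze``. A minimal sketch with made-up values:

.. code:: python

   import pandas as pd

   metric = "mean_test_score"
   df = pd.DataFrame({
       "mean_test_score": [0.923810, 0.933333],
       "alpha": [0.000977, 0.000977],
       "l1_ratio": [0.00, 0.25],
   })

   # Put every hyperparameter column first and the metric last.
   df = df[[c for c in df.columns if c != metric] + [metric]]
   print(list(df.columns))  # ['alpha', 'l1_ratio', 'mean_test_score']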

``hyanova.analyze(df)`` returns a ``DataFrame`` with one row per
hyperparameter subset (``u``), its variance (``v_u``) and its importance
(``F_u``).

.. code:: python

   importance = hyanova.analyze(df)
   >>> 100%|██████████████████████████████████| 3/3 [00:00<00:00, 11.32it/s]
   print(importance)
   >>>              u       v_u  F_u(v_u/v_all)
   0           (alpha,)  0.056885        0.892057
   1        (l1_ratio,)  0.002489        0.039030
   2  (alpha, l1_ratio)  0.004394        0.068912

**Note:** F_u is the ratio of the variance attributed to the
hyperparameter subset u (v_u) to the total variance over all trials
(v_all), so the F_u values always sum to 1. See the references for more
details.
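
The decomposition behind these numbers can be sketched for two
hyperparameters with NumPy alone. This is an illustration of the
arithmetic, not HyANOVA's source code. It assumes a complete grid in which
every combination was evaluated once and every grid point has equal
weight, and the scores are made up.

.. code:: python

   import numpy as np

   # Metric values on a full 3 x 2 grid: rows index alpha, columns index l1_ratio.
   scores = np.array([
       [0.90, 0.92],
       [0.80, 0.86],
       [0.60, 0.62],
   ])

   f0 = scores.mean()                                  # overall mean
   f_alpha = scores.mean(axis=1, keepdims=True) - f0   # main effect of alpha
   f_l1 = scores.mean(axis=0, keepdims=True) - f0      # main effect of l1_ratio
   f_both = scores - f0 - f_alpha - f_l1               # interaction term

   v_all = ((scores - f0) ** 2).mean()                 # total variance
   variances = {
       ("alpha",): (f_alpha ** 2).mean(),
       ("l1_ratio",): (f_l1 ** 2).mean(),
       ("alpha", "l1_ratio"): (f_both ** 2).mean(),
   }
   importance = {u: v / v_all for u, v in variances.items()}

   for u, share in importance.items():
       print(u, round(share, 4))
   print(sum(importance.values()))  # 1.0 up to floating-point rounding

Because the three components are orthogonal on a full grid, their
variances add up to ``v_all``, which is why the shares sum to 1.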

Example usage
-------------

You can use sklearn to do hyperparameters search and then use hyanova to analyze the importance of hyperparameters.

.. code:: python

   import numpy as np
   import pandas as pd
   import sklearn.datasets
   from sklearn.model_selection import GridSearchCV
   from sklearn.svm import SVC

   import hyanova

   iris = sklearn.datasets.load_iris()
   X = iris.data
   y = iris.target

   model = SVC()
   # 10000 values of C x 4 kernels is a large grid;
   # reduce the number of C values for a quick test
   grid = {'C': np.linspace(1e-9, 128, 10000),
           'kernel': ('rbf', 'linear', 'poly', 'sigmoid')}
   grid_search = GridSearchCV(model, grid)
   result = grid_search.fit(X, y)

   # Convert the search results to a DataFrame and analyze them
   df = pd.DataFrame(result.cv_results_)
   metric = 'mean_test_score'
   df, params = hyanova.read_df(df, metric)
   importance = hyanova.analyze(df)

Dependencies
------------

-  numpy
-  pandas
-  tqdm

Why was HyANOVA created?
------------------------

I am completing my undergraduate thesis. To better understand the models used in it, I looked at many algorithms that can measure the importance of hyperparameters. Among them, functional ANOVA seems to be the most effective. However, the original authors' implementation is written in Java and called from Python, which I found confusing. I wanted a module that is easier to understand and implemented entirely in Python to help with the ANOVA decomposition, so I created HyANOVA. I hope it helps you too!

References
----------

1. Hutter, F., Hoos, H. & Leyton-Brown, K. (2014). An Efficient
   Approach for Assessing Hyperparameter Importance. Proceedings of the
   31st International Conference on Machine Learning, PMLR
   32(1):754-762.
2. https://github.com/frank-hutter/fanova
