Significance Analysis for HPO-algorithms performing on multiple benchmarks
Significance Analysis
This package is used to analyse datasets of different HPO-algorithms performing on multiple benchmarks, using a Linear Mixed-Effects Model-based approach.
Note
As indicated by the v0.x.x version number, Significance Analysis is early-stage code, and its APIs might change in the future.
Documentation
For an interactive overview, please have a look at our example.
Every dataset should be a pandas dataframe of the following format:
algorithm | benchmark | metric | optional: budget/prior/... |
---|---|---|---|
Algorithm1 | Benchmark1 | 3.141 | 1.0 |
Algorithm1 | Benchmark1 | 6.283 | 2.0 |
Algorithm1 | Benchmark2 | 2.718 | 1.0 |
... | ... | ... | ... |
Algorithm2 | Benchmark2 | 0.621 | 2.0 |
Because the data is used to fit a model, it must not contain missing values; duplicate rows are allowed.
Our function dataframe_validator checks for this format.
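As a minimal sketch of the format above, the following builds a small frame with pandas and checks the stated requirements by hand (required columns present, no missing values, duplicates tolerated). The metric column is named `value` here to match the formulas in the usage examples. The helper `check_format` is hypothetical and is not the package's validator.

```python
import pandas as pd

# Toy data in the long format described above; the numbers are illustrative only.
data = pd.DataFrame(
    {
        "algorithm": ["Algorithm1", "Algorithm1", "Algorithm1", "Algorithm2"],
        "benchmark": ["Benchmark1", "Benchmark1", "Benchmark2", "Benchmark2"],
        "value": [3.141, 6.283, 2.718, 0.621],
        "budget": [1.0, 2.0, 1.0, 2.0],
    }
)


def check_format(df, required=("algorithm", "benchmark", "value")):
    """Hypothetical helper: return a list of problems found (empty if none)."""
    problems = []
    for column in required:
        if column not in df.columns:
            problems.append(f"missing column: {column}")
    # Missing values are not allowed; duplicated rows are fine.
    n_missing = int(df.isna().sum().sum())
    if n_missing > 0:
        problems.append(f"{n_missing} missing value(s)")
    return problems


problems = check_format(data)
```

An empty list means the frame satisfies these basic checks; the package's own validator may check more than this.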
Installation
Using R (>= 4.0.0), install the packages Matrix, emmeans, lmerTest and lme4.
Using pip
pip install significance-analysis
Usage for significance testing
- Generate data by running HPO algorithms on benchmarks, saving it in the format described above.
- Build a model with all factors of interest.
- Run the post-hoc tests.
- Plot the results as a CD diagram.
In code, the usage pattern can look like this:
import pandas as pd
from significance_analysis import dataframe_validator, model, cd_diagram
# 1. Generate/import dataset
data = dataframe_validator(pd.read_parquet("datasets/priorband_data.parquet"))
# 2. Build the model
mod = model("value ~ algorithm + (1|benchmark) + prior", data)
# 3. Conduct the post-hoc analysis
post_hoc_results = mod.post_hoc("algorithm")
# 4. Plot the results
cd_diagram(post_hoc_results)
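For readers unfamiliar with the formula syntax, the model string `value ~ algorithm + (1|benchmark) + prior` can be read, under the usual lme4 conventions, as a linear mixed-effects model with fixed effects for the algorithm and the prior and a random intercept per benchmark. The notation below is an illustrative sketch; the symbols are assumptions introduced here, and whether `prior` is treated as categorical or numeric depends on the data.

```latex
y_{abk} = \beta_0 + \beta_{a} + \gamma_{p(k)} + u_{b} + \varepsilon_{abk},
\qquad u_{b} \sim \mathcal{N}(0, \sigma_{b}^{2}),
\qquad \varepsilon_{abk} \sim \mathcal{N}(0, \sigma^{2})
```

Here $y_{abk}$ is the $k$-th observed value of algorithm $a$ on benchmark $b$, $\beta_a$ is the fixed effect of the algorithm, $\gamma_{p(k)}$ is the fixed effect of the prior used in that observation, and $u_b$ is the benchmark-specific random intercept. The post-hoc step then compares the estimated algorithm effects.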
Usage for hypothesis testing
Use the GLRT implementation or our prepared sanity checks to conduct LMEM-based hypothesis testing.
In code:
import pandas as pd
from significance_analysis import (
    dataframe_validator,
    glrt,
    model,
    seed_dependency_check,
    benchmark_information_check,
    fidelity_check,
)
# 1. Generate/import dataset
data = dataframe_validator(pd.read_parquet("datasets/priorband_data.parquet"))
# 2. Run the preconfigured sanity checks
seed_dependency_check(data)
benchmark_information_check(data)
fidelity_check(data)
# 3. Run a custom hypothesis test, comparing model_1 and model_2
model_1 = model("value ~ algorithm", data)
model_2 = model("value ~ 1", data)
glrt(model_1, model_2)
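To illustrate the general idea behind a likelihood-ratio test between two nested models, here is a minimal standard-library sketch. It is not the package's `glrt` implementation, whose details may differ. It assumes the two models differ by exactly one parameter (for example, `value ~ algorithm` against `value ~ 1` with two algorithms), so the test statistic is compared against a chi-squared distribution with one degree of freedom. The function name and the log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test_df1(loglik_reduced, loglik_full):
    """Return (statistic, p_value) for two nested models differing by ONE parameter."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    # Guard against tiny negative values caused by numerical noise.
    statistic = max(statistic, 0.0)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Illustrative log-likelihoods, not results from any real dataset.
stat, p = likelihood_ratio_test_df1(loglik_reduced=-105.0, loglik_full=-102.0)
```

A small p-value suggests that the additional term in the larger model explains the data better than chance alone would. For models that differ by more than one parameter, the degrees of freedom change and a general chi-squared survival function is needed instead.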
Usage for metafeature impact analysis
Analyze the influence a metafeature has on the performance of two algorithms.
In code:
import pandas as pd
from significance_analysis import dataframe_validator, metafeature_analysis
# 1. Generate/import dataset
data = dataframe_validator(pd.read_parquet("datasets/priorband_data.parquet"))
# 2. Run the metafeature analysis
scores = metafeature_analysis(data, ("HB", "PB"), "prior")
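Before running the model-based analysis, it can help to look at a purely descriptive summary of how two algorithms compare across the levels of a metafeature. The sketch below does this with pandas only. It is not what `metafeature_analysis` computes; the helper name, the toy data and the `prior` levels "good" and "bad" are assumptions made for illustration.

```python
import pandas as pd


def mean_gap_by_metafeature(df, algorithms, metafeature, metric="value"):
    """Hypothetical helper: mean metric per metafeature level and algorithm, plus their gap."""
    first, second = algorithms
    subset = df[df["algorithm"].isin([first, second])]
    table = (
        subset.groupby([metafeature, "algorithm"])[metric]
        .mean()
        .unstack("algorithm")
    )
    # Difference in mean metric between the two algorithms at each level.
    table["gap"] = table[first] - table[second]
    return table


# Toy data; values are illustrative only.
toy = pd.DataFrame(
    {
        "algorithm": ["HB", "PB", "HB", "PB", "HB", "PB"],
        "benchmark": ["B1", "B1", "B2", "B2", "B1", "B1"],
        "prior": ["good", "good", "bad", "bad", "good", "good"],
        "value": [0.30, 0.20, 0.40, 0.50, 0.50, 0.40],
    }
)
summary = mean_gap_by_metafeature(toy, ("HB", "PB"), "prior")
```

If the gap changes sign or size between levels, the metafeature may influence which algorithm performs better, which is the kind of question the model-based analysis addresses with proper uncertainty estimates.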
For more details and features please have a look at our example.
Contributing
We welcome contributions from everyone. Feel free to raise issues or submit pull requests.
To cite the paper or code
@misc{geburek2024lmemsposthocanalysishpo,
title={LMEMs for post-hoc analysis of HPO Benchmarking},
author={Anton Geburek and Neeratyoy Mallik and Danny Stoll and Xavier Bouthillier and Frank Hutter},
year={2024},
eprint={2408.02533},
archivePrefix={arXiv},
primaryClass={cs.LG}
}