ASReview-hyperopt
Hyperparameter optimization extension for ASReview. It uses the hyperopt package to optimize the parameters of the different models. The hyperparameters and their sample spaces are defined in the ASReview package and are used automatically for optimization.
Installation
The easiest way to install the hyperopt extension is via the command line:
pip install asreview-hyperopt
After installation, asreview should detect the extension automatically. Test this with:
asreview --help
It should list three new entry points: hyper-active, hyper-passive, and hyper-cluster.
Basic usage
The three entry points are used in a roughly similar fashion. The main difference between them is the types of models that must be supplied:
- hyper-cluster: feature_extraction
- hyper-passive: model, balance_strategy, feature_extraction
- hyper-active: model, balance_strategy, query_strategy, feature_extraction
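As a sketch, the calls below show which model-type flags each entry point takes. The flag names come from the --help output further down; the specific model names (doc2vec, nb, double, max, tfidf) are examples, not defaults.

```shell
# Sketch: which model-type flags each entry point accepts.
# Model names here are illustrative examples.
asreview hyper-cluster -e doc2vec
asreview hyper-passive -m nb -b double -e tfidf
asreview hyper-active  -m nb -b double -q max -e tfidf
```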
To get help for an entry point, type:
asreview hyper-active --help
Which results in the following options:
usage: asreview [-h] [-m MODEL]
[-q QUERY_STRATEGY]
[-b BALANCE_STRATEGY]
[-e FEATURE_EXTRACTION]
[-n N_ITER] [-d DATASETS]
[--mpi]
optional arguments:
-h, --help show this help message and exit
-m MODEL, --model MODEL
Prediction model for active learning.
-q QUERY_STRATEGY, --query_strategy QUERY_STRATEGY
Query strategy for active learning.
-b BALANCE_STRATEGY, --balance_strategy BALANCE_STRATEGY
Balance strategy for active learning.
-e FEATURE_EXTRACTION, --feature_extraction FEATURE_EXTRACTION
Feature extraction method.
-n N_ITER, --n_iter N_ITER
Number of iterations of Bayesian Optimization.
-d DATASETS, --datasets DATASETS
Datasets to use in the hyper parameter optimization
Separate by commas to use multiple at the same time
[default: all].
--mpi Use the mpi implementation.
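For instance, a hypothetical full invocation combining the options above (the model choices and dataset names are examples, not defaults):

```shell
# Run 300 iterations of Bayesian optimization for a naive Bayes model
# with TF-IDF features on two datasets (dataset names are hypothetical).
asreview hyper-active -m nb -b double -q max -e tfidf -n 300 -d ptsd,ace
```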
Data structure
The extension searches for datasets in the data directory, relative to the current working directory, so put your datasets there. The output of the runs is stored in the output directory, again relative to the current path.
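A minimal sketch of preparing that layout (the dataset file name and its columns are hypothetical examples; use your own dataset files):

```shell
# Create the data directory the extension expects and place a dataset in it.
mkdir -p data
printf 'title,abstract,included\nExample,Example abstract,1\n' > data/mydataset.csv
ls data
```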
An example of a structure that has been created:
output/
├── active_learning
│ ├── nb_max_double_tfidf
│ │ └── depression_hall_ace_ptsd_nagtegaal
│ │ ├── best
│ │ │ ├── ace
│ │ │ ├── depression
│ │ │ ├── hall
│ │ │ ├── nagtegaal
│ │ │ └── ptsd
│ │ ├── current
│ │ │ ├── ace
│ │ │ ├── depression
│ │ │ ├── hall
│ │ │ ├── nagtegaal
│ │ │ └── ptsd
│ │ └── trials.pkl
│ └── nb_max_random_double_tfidf
│ └── nagtegaal
│ ├── best
│ │ └── nagtegaal
│ ├── current
│ │ └── nagtegaal
│ └── trials.pkl
├── cluster
│ └── doc2vec
│ ├── ace
│ │ ├── best
│ │ │ └── ace
│ │ ├── current
│ │ │ └── ace
│ │ └── trials.pkl
│ ├── depression_hall_ace_ptsd_nagtegaal
│ │ └── current
│ │ ├── ace
│ │ ├── depression
│ │ ├── hall
│ │ ├── nagtegaal
│ │ └── ptsd
│ └── nagtegaal
│ └── current
│ └── nagtegaal
└── passive
└── nb_double_tfidf
└── depression
├── best
│ └── depression
├── current
│ └── depression
└── trials.pkl
The files named trials.pkl are special files that contain data on which trials were run. To list these trials, use the following command:
asreview show $SOME_DIRECTORY/trials.pkl
It should give a list of trials sorted by the loss (lower is better). The column names (apart from the loss) are prefixed with the kind of parameter they represent:
- mdl: model parameter
- bal: balance strategy parameter
- qry: query strategy parameter
- fex: feature extraction parameter
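The trials file can also be inspected directly in Python. The sketch below mimics the idea with plain dicts standing in for trial records (the record structure shown is hypothetical, chosen only to keep the example self-contained); the actual trials.pkl stores hyperopt's trial data, so inspect its real structure before relying on specific keys.

```python
import os
import pickle
import tempfile

# Hypothetical trial records: each trial has a loss and sampled parameter
# values, with the "mdl" prefix marking a model parameter.
trials = [
    {"result": {"loss": 0.42}, "misc": {"vals": {"mdl_alpha": [1.0]}}},
    {"result": {"loss": 0.17}, "misc": {"vals": {"mdl_alpha": [3.8]}}},
    {"result": {"loss": 0.29}, "misc": {"vals": {"mdl_alpha": [0.3]}}},
]

# Round-trip through a pickle file, as asreview-hyperopt does with trials.pkl.
path = os.path.join(tempfile.mkdtemp(), "trials.pkl")
with open(path, "wb") as f:
    pickle.dump(trials, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)

# Sort by loss, lower is better -- the same ordering `asreview show` applies.
ranked = sorted(loaded, key=lambda t: t["result"]["loss"])
for t in ranked:
    print(t["result"]["loss"], t["misc"]["vals"])
```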
Options
The default number of iterations is 1, which you'll probably want to increase. The number needed depends on how many hyperparameters are being optimized, but several hundred iterations is a reasonable estimate for most combinations to get close to the optimum. In all cases, use common sense: if the loss is still decreasing quickly, run a few more iterations.
The hyperopt extension has built-in support for MPI, which is used to parallelize runs. On a local PC with an MPI implementation (such as OpenMPI) installed, one can run on 4 cores:
mpirun -n 4 asreview hyper-active
On supercomputers, one should sometimes replace mpirun with srun.
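On a SLURM-based cluster, this might look like the following job script. The resource numbers are hypothetical placeholders; consult your cluster's documentation for the right values.

```shell
#!/bin/bash
#SBATCH --ntasks=16
#SBATCH --time=04:00:00
# Hypothetical SLURM job script; srun replaces mpirun on such systems.
srun asreview hyper-active --mpi -n 300
```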
Time measurements:
- nb, tfidf, double, max -> 53 seconds
- svm, tfidf, double, max -> 1940 seconds
- rf, tfidf, double, max -> 80 seconds
- logistic, tfidf, double, max -> 250 seconds /4
- dense_nn, tfidf, double, max -> ?
- dense_nn, doc2vec, double, max -> 2750 seconds /1, /2
- svm, doc2vec, ...