Benchmarking Likelihood Ratio systems
Project description
LRBenchmark
Repository for benchmarking Likelihood Ratio systems.
Prerequisites
- This repository is developed for Python 3.8.
Dependencies
All dependencies can either be installed by running pip install -r requirements.txt
or pip install .
.
Add new dependencies to the setup.py and always update the requirements by running:
pip-compile --output-file=requirements.txt setup.py
.
Usage
Running the benchmark can be done as follows:
- Specify the parameters for the benchmark in the
lrbenchmark.yaml
- Run
python run.py
The parameters for the benchmark must be provided in the following structure:
experiment:
repeats: 10
scorer:
- 'name scorer 1'
- 'name scorer 2'
calibrator:
- 'name calibrator'
dataset:
- 'name dataset'
preprocessor:
- 'name preprocessor 1'
- 'name preprocessor 2'
At least 1 setting needs to be provided for each parameter, but more settings per parameter can be provided. The pipeline will
create the cartesian product over all parameter settings (except repeats
) and will execute the experiments accordingly.
All possible settings can be found in params.py
. The parameters that need to be set are:
repeats
: Number of repeats for each experiment.scorer
: Scoring models for generating scores.calibrator
: Models for calibrating scores.dataset
: Datasets on which the experiments can be executed.preprocessor
: Data preprocessing steps. You can use the value'dummy'
if no preprocessing is needed.
Example: Benchmark feature rank based LR system
This repository supports several data transformations, such as the possibility to transform X features from values to ranks.
To benchmark these models against models without any transformations on X features, the following experiments (among others) could be
defined in lrbenchmark.yaml
.
experiment:
repeats: 10
scorer:
- 'LR'
- 'XGB'
calibrator:
- 'logit'
dataset:
drugs_xtc:
n_splits: 2
glass:
n_splits: 2
preprocessor:
- 'dummy'
- 'rank_transformer'
When executing python run.py
an experiment for all possible combination of parameters will be executed. The results for each experiment (metrics + plots)
will be stored in separate folders within the output
folder.
Datasets
There are currently two datasets implemented for this project:
- drugs_xtc: will be published on our github soon
- glass: LA-ICPMS measurements of elemental concentration from floatglass. The data will be downloaded automatically from https://github.com/NetherlandsForensicInstitute/elemental_composition_glass when used in the pipeline for the first time.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for lrbenchmark-0.1.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 32bc30934df6f30ba713271b3946310a77173a4ecf1892cd6807800e028b7918 |
|
MD5 | c27d8b6f6d42d013b7cfcbe30c98c435 |
|
BLAKE2b-256 | 13ffe527125024dde30c25d54086fca4099ee38dfc4fd12bfd9a2c944c9c4194 |