Benchmarking Likelihood Ratio systems
Project description
LRBenchmark
Repository for benchmarking Likelihood Ratio systems.
Prerequisites
- This repository is developed for Python 3.8.
Dependencies
All dependencies can be installed by running either pip install . or pip install -r requirements.txt
Add new dependencies to setup.py and always update the requirements by running:
pip-compile --output-file=requirements.txt setup.py
Usage
Running the benchmark can be done as follows:
- Specify the parameters for the benchmark in lrbenchmark.yaml
- Run python run.py
The parameters for the benchmark must be provided in the following structure:
experiment:
  repeats: 10
  scorer:
    - 'name scorer 1'
    - 'name scorer 2'
  calibrator:
    - 'name calibrator'
  dataset:
    - 'name dataset'
  preprocessor:
    - 'name preprocessor 1'
    - 'name preprocessor 2'
At least one setting must be provided for each parameter, and more than one setting per parameter is allowed. The pipeline creates the Cartesian product over all parameter settings (except repeats) and executes the experiments accordingly.
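The expansion into experiments can be pictured with a minimal Python sketch. This is not the repository's own code: the experiment dictionary is a hypothetical in-memory copy of the structure above, and expand_experiments is an illustrative helper name.

```python
from itertools import product

# Hypothetical in-memory copy of the 'experiment' section of lrbenchmark.yaml.
experiment = {
    "repeats": 10,
    "scorer": ["name scorer 1", "name scorer 2"],
    "calibrator": ["name calibrator"],
    "dataset": ["name dataset"],
    "preprocessor": ["name preprocessor 1", "name preprocessor 2"],
}


def expand_experiments(config):
    """Return one dict per combination of settings; 'repeats' is left out of the product."""
    grid = {key: values for key, values in config.items() if key != "repeats"}
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


combinations = expand_experiments(experiment)
print(len(combinations))  # 2 scorers * 1 calibrator * 1 dataset * 2 preprocessors = 4
for combination in combinations:
    print(combination)
```

Under this reading, each of the four combinations would then be run repeats times.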
All possible settings can be found in params.py. The parameters that need to be set are:
- repeats: Number of repeats for each experiment.
- scorer: Scoring models for generating scores.
- calibrator: Models for calibrating scores.
- dataset: Datasets on which the experiments can be executed.
- preprocessor: Data preprocessing steps. You can use the value 'dummy' if no preprocessing is needed.
Example: Benchmark feature rank based LR system
This repository supports several data transformations, such as the possibility to transform X features from values to ranks.
To benchmark these models against models without any transformation of the X features, the following experiments (among others) could be defined in lrbenchmark.yaml:
experiment:
  repeats: 10
  scorer:
    - 'LR'
    - 'XGB'
  calibrator:
    - 'logit'
  dataset:
    drugs_xtc:
      n_splits: 2
    glass:
      n_splits: 2
  preprocessor:
    - 'dummy'
    - 'rank_transformer'
When python run.py is executed, an experiment is run for every possible combination of parameters. The results of each experiment (metrics and plots) are stored in separate folders within the output folder.
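To make the idea of transforming X features from values to ranks concrete, here is a small pure-Python sketch. It is not the repository's rank_transformer: the function names are illustrative, and giving tied values their average rank is an assumption made for this example.

```python
def rank_column(values):
    """Replace each value by its 1-based rank; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def rank_transform(X):
    """Rank-transform every feature (column) of a row-major list of samples."""
    columns = [rank_column(list(column)) for column in zip(*X)]
    return [list(row) for row in zip(*columns)]


# Three samples with two features each; the second feature contains a tie.
X = [
    [0.2, 10.0],
    [1.5, 10.0],
    [0.7, 3.0],
]
print(rank_transform(X))  # [[1.0, 2.5], [3.0, 2.5], [2.0, 1.0]]
```

After the transformation, only the ordering of the values within each feature is kept, which is the difference from the untransformed ('dummy') models that the example experiments compare.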
Datasets
There are currently two datasets implemented for this project:
- drugs_xtc: will be published on our GitHub soon.
- glass: LA-ICPMS measurements of elemental concentrations in float glass. The data will be downloaded automatically from https://github.com/NetherlandsForensicInstitute/elemental_composition_glass when used in the pipeline for the first time.