
lir

Repository for optimising and evaluating Likelihood Ratio (LR) systems. This allows benchmarking of LR systems on different datasets, investigating impact of different sampling schemes or techniques, and doing case-based validation and computation of case LRs.

Prerequisites

  • This repository is developed for Python 3.9.
  • This project uses pdm as a dependency manager.

Installing PDM

For installation instructions, please consult the Installation section on the PDM project website.

Installing project dependencies

With PDM installed, run the following command to install the project dependencies:

pdm install

A .venv directory will be created and used by PDM by default to run the Python code as defined in the PDM run scripts.

(Optional) Note: you can alternatively select a different Python interpreter for PDM using pdm use.

Terminology

  • lr system: An algorithm to calculate likelihood ratios.
  • lr system architecture: A way to compose an LR system, e.g. feature based, specific source, etc.
  • source: Something that can generate instances, or from which instances are derived. This is typically the level relevant to the forensic question and the hypotheses. Examples: a glass pane, a person whose face may be pictured, the speaker of a voice recording, a shoe.
  • instance: A single manifestation of a source. Traces and reference samples are instances of a source. In a feature-based system, instances are used as the building blocks for modeling hypotheses. Examples: the measurements on a fragment of glass, a face image, a voice recording, a shoe print.
  • pair: A combination of two groups of instances. The instance groups may be same source or different source. In a common-source system, pairs are used as the building blocks for modeling hypotheses. A group may consist of only one instance, or of multiple repeated measurements of the same source that are compared as one unit.
  • data set: A set of instances and/or pairs, labeled or unlabeled, that may be used for calculating likelihood ratios.
  • label: The ground-truth value for an instance or a pair. The label may be on the level of the hypothesis (e.g. H1, H2), or on the level of the source (e.g. Speaker1, Speaker2). Hypothesis labels may be derived from source labels. In a labeled data set, the ground truth (i.e. the labels) is known; this will typically be the data used for development, analysis or validation of an LR system. Unlabeled data has no ground truth; this will typically be the application data, or case data in a forensic setting.
  • binary data: A data set with exactly two different labels, for specific source evaluation (e.g. H1, H2).
  • multiclass data: A data set with an arbitrary number of labels, for common source evaluation or for reduction to binary data for specific source evaluation.
  • data strategy: The way in which the data are assigned to different applications (validation, calibration, etc.) within the lr system. Well known strategies are train/test split, cross-validation, leave-one-out.
  • data provider: A method for making a data set available, e.g. by reading from disk.
  • run: Calculations for an lr system on a specific data set, as part of an experiment.
  • experiment: A series of one or more runs to calculate lrs, to measure system performance, to evaluate the effect of varying system parameters, or to optimize system parameters.
  • experiment strategy: A strategy for specifying system parameter values, e.g. single run, grid search, etc.
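To make the core idea behind these terms concrete, here is a minimal, library-agnostic sketch (plain Python, not the lir API) of computing a likelihood ratio for a single measurement under two competing hypotheses, each modeled as a normal distribution:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, std):
    """Density of a normal distribution evaluated at x."""
    return exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * sqrt(2 * pi))

def likelihood_ratio(x, h1_params, h2_params):
    """LR = P(x | H1) / P(x | H2), with each hypothesis modeled
    as a normal distribution given by (mean, std)."""
    return normal_pdf(x, *h1_params) / normal_pdf(x, *h2_params)

# A measurement close to the H1 mean and far from the H2 mean
# yields an LR well above 1 (support for H1).
lr = likelihood_ratio(1.0, h1_params=(1.0, 1.0), h2_params=(4.0, 1.0))
print(lr)  # exp(4.5), roughly 90
```

A real LR system would of course fit these hypothesis models from data and calibrate the resulting scores; this sketch only illustrates what the ratio itself expresses.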

Usage

This repository offers both a Python API and a command-line interface.

Command-line interface

Evaluate an LR system using the command-line interface as follows:

  1. define your data, LR system and experiments in a YAML file;
  2. run pdm run benchmark --setup <yaml file>.

The examples folder may be a good starting point for setting up an experiment.
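As an illustration only — the actual schema is determined by the registry and the files in the examples folder — a setup YAML might combine a dataset, an LR system and a data strategy along these lines (all keys and values below are hypothetical, except the dataset name):

```yaml
# Hypothetical experiment setup; consult the examples folder and
# `pdm run benchmark --list-registry` for the actual element names.
dataset: synthesized_normal_binary
system:
  architecture: specific_source
data_strategy: train_test_split
metrics:
  - cllr
```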

The elements of the experiment configuration YAML are looked up in the registry. The following command lists all available elements in the registry:

pdm run benchmark --list-registry

Datasets

There are currently a number of datasets implemented for this project:

Simulations

It is straightforward to simulate data for experimentation. Currently two very simple simulations, synthesized_normal_binary and synthesized_normal_multiclass, are available, with sources and measurements drawn from normal distributions.
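The idea behind such a simulation can be sketched in plain Python (illustrative only, not the package's implementation): draw a mean per source from a between-source distribution, then draw measurements around that mean from a within-source distribution:

```python
import random

def simulate_normal_binary(n_per_class=100, between_std=2.0, within_std=1.0, seed=42):
    """Simulate a labeled binary dataset: two sources, each with
    normally distributed measurements around its own source mean."""
    rng = random.Random(seed)
    data = []
    for label in ("H1", "H2"):
        source_mean = rng.gauss(0.0, between_std)  # between-source variation
        for _ in range(n_per_class):
            measurement = rng.gauss(source_mean, within_std)  # within-source variation
            data.append((measurement, label))
    return data

dataset = simulate_normal_binary()
print(len(dataset))  # 200 labeled instances
```

Fixing the seed makes the simulation reproducible, which is useful when comparing system variants on identical synthetic data.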

Development

Setting up git pre-commit hook

To run all checks before committing, you can add a git pre-commit hook which ensures all checks pass before a new commit is made.

Copy the pre-commit.example file to the .git/hooks folder within this project and rename it to pre-commit. Next, make sure the pre-commit file is executable. You can run the following shell commands in the (PyCharm) terminal from the root of the project:

cp pre-commit.example .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

Adding new dependencies

New dependencies should be installed through pdm add <dependency_name>.

When developing locally, the following PDM scripts can be employed:

  • Run linting / formatting / static analysis: pdm run check
  • Run tests: pdm run test
  • Run all checks and tests: pdm run all
