A lightweight library to compute Diarization Error Rate (DER).
SimpleDER

Overview
This is a single-file lightweight library to compute Diarization Error Rate (DER).
Features NOT supported in this library:
- A full breakdown of the different components of DER (False Alarm, Miss, Overlap, Confusion).
For more sophisticated metrics with this support, please use pyannote-metrics instead.
To learn more about speaker diarization, here is a curated list of resources: awesome-diarization.
Diarization Error Rate
Diarization Error Rate (DER) is the most commonly used metric for speaker diarization.
Its strict form is:
DER = (False Alarm + Miss + Overlap + Confusion) / Reference Length
The definition of each term:
- Reference Length: The total length of the reference (ground truth).
- False Alarm: Length of segments which are considered as speech in hypothesis, but not in reference.
- Miss: Length of segments which are considered as speech in reference, but not in hypothesis.
- Overlap: Length of segments which are considered as overlapped speech in hypothesis, but not in reference.
- Confusion: Length of segments which are assigned to different speakers in hypothesis and reference (after applying an optimal assignment).
The unit of each term is seconds.
Note that DER can theoretically be larger than 1.0.
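To make the last point concrete, here is a minimal sketch of the strict-form formula. The helper `der_from_components` is hypothetical and is not part of this library (which does not report the individual components); the durations are made-up values chosen only to show a DER above 1.0.

```python
def der_from_components(false_alarm, miss, overlap, confusion, reference_length):
    """Strict-form DER; every argument is a duration in seconds.

    Hypothetical helper for illustration, not part of simpleder.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return (false_alarm + miss + overlap + confusion) / reference_length


# Made-up scenario: the 1.0 s of reference speech is missed entirely,
# and the hypothesis reports 0.5 s of speech where the reference has none.
der = der_from_components(false_alarm=0.5, miss=1.0, overlap=0.0,
                          confusion=0.0, reference_length=1.0)
print("DER={:.3f}".format(der))  # DER=1.500
```

Because False Alarm is not bounded by the reference length, the numerator can exceed the denominator.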
Implementation
This library allows efficient computation of DER including support for overlapped speech.
The algorithm works as follows:
- Collar Pre-processing (if `collar > 0`): We remove regions around reference boundaries from both the reference and the hypothesis. For every start/end time $t$ in the reference, the interval $[t - \text{collar}, t + \text{collar}]$ is excluded from scoring.
- Optimal Mapping: We first align the speakers in the hypothesis to the reference by maximizing the total overlap duration between them. This is a linear sum assignment problem (also known as the weighted bipartite matching problem), which we solve using the Hungarian algorithm (via `scipy.optimize.linear_sum_assignment`). Let `Match` be the total overlap duration of this optimal mapping.
- Load Calculation: We calculate a value called "Load", representing the total duration of speech that requires being matched. This accounts for overlapped speech. Mathematically: `Load` = $\int \max(N_{\text{ref}}(t), N_{\text{hyp}}(t)) \, dt$, where $N_{\text{ref}}(t)$ and $N_{\text{hyp}}(t)$ are the number of active speakers at time $t$ in reference and hypothesis, respectively.
- DER Calculation: `DER = (Load - Match) / Reference Length`. This formulation is mathematically equivalent to the standard definition `(Miss + False Alarm + Confusion) / Reference Length`.
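The steps above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the library's actual code: it skips the collar step, assumes segments of the same speaker do not overlap each other, and swaps the Hungarian algorithm for an exhaustive search over speaker mappings, which is only practical for a handful of speakers. The function names `best_match`, `load`, and `der_sketch` are hypothetical.

```python
from itertools import permutations


def best_match(ref, hyp):
    """Maximum total overlap over one-to-one speaker mappings (exhaustive search)."""
    ref_spk = sorted({spk for spk, _, _ in ref})
    hyp_spk = sorted({spk for spk, _, _ in hyp})
    # Accumulate the overlap duration for every (ref speaker, hyp speaker) pair.
    table = {(r, h): 0.0 for r in ref_spk for h in hyp_spk}
    for r, r_start, r_end in ref:
        for h, h_start, h_end in hyp:
            table[(r, h)] += max(0.0, min(r_end, h_end) - max(r_start, h_start))
    # Permute the larger speaker set and pair it with the smaller one.
    if len(ref_spk) <= len(hyp_spk):
        candidates = (zip(ref_spk, p) for p in permutations(hyp_spk, len(ref_spk)))
    else:
        candidates = (zip(p, hyp_spk) for p in permutations(ref_spk, len(hyp_spk)))
    return max(sum(table[pair] for pair in mapping) for mapping in candidates)


def load(ref, hyp):
    """Integral of max(N_ref(t), N_hyp(t)), evaluated piecewise between boundaries."""
    times = sorted({t for _, start, end in ref + hyp for t in (start, end)})
    total = 0.0
    for left, right in zip(times, times[1:]):
        mid = (left + right) / 2  # speaker counts are constant inside the interval
        n_ref = len({spk for spk, s, e in ref if s <= mid < e})
        n_hyp = len({spk for spk, s, e in hyp if s <= mid < e})
        total += max(n_ref, n_hyp) * (right - left)
    return total


def der_sketch(ref, hyp):
    """DER = (Load - Match) / Reference Length, without collar handling."""
    reference_length = sum(end - start for _, start, end in ref)
    return (load(ref, hyp) - best_match(ref, hyp)) / reference_length


# Made-up example with overlapped speech in the reference on [1.0, 2.0].
ref = [("A", 0.0, 2.0), ("B", 1.0, 3.0)]
hyp = [("x", 0.0, 1.5), ("y", 1.5, 3.0)]
print("DER={:.3f}".format(der_sketch(ref, hyp)))  # DER=0.250
```

Here Load is 4.0, Match is 3.0 (A to x, B to y), and Reference Length is 4.0. The only error is the 1.0 s in the overlapped region where the hypothesis reports one speaker instead of two.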
Tutorial
Install
Install the package by:
```
pip3 install simpleder
```
or
```
python3 -m pip install simpleder
```
API
Here is a minimal example:
```python
import simpleder

# reference (ground truth)
ref = [("A", 0.0, 1.0),
       ("B", 1.0, 1.5),
       ("A", 1.6, 2.1)]

# hypothesis (diarization result from your algorithm)
hyp = [("1", 0.0, 0.8),
       ("2", 0.8, 1.4),
       ("3", 1.5, 1.8),
       ("1", 1.8, 2.0)]

error = simpleder.DER(ref, hyp)

print("DER={:.3f}".format(error))
```
This should output:
DER=0.350
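As a sanity check, this value can be reproduced by hand from the `(Load - Match) / Reference Length` formulation. The derivation below is a worked sketch that assumes no collar is applied:

```latex
\begin{align*}
\text{Reference Length} &= (1.0 - 0.0) + (1.5 - 1.0) + (2.1 - 1.6) = 2.0 \\
\text{Match} &= \underbrace{(0.8 + 0.2)}_{\text{A} \leftrightarrow \text{1}}
              + \underbrace{0.4}_{\text{B} \leftrightarrow \text{2}} = 1.4 \\
\text{Load} &= \int_{0}^{2.1} \max\bigl(N_{\text{ref}}(t), N_{\text{hyp}}(t)\bigr)\, dt = 2.1 \\
\text{DER} &= \frac{2.1 - 1.4}{2.0} = 0.35
\end{align*}
```

The mapping A to 1 and B to 2 gives the largest total overlap, and hypothesis speaker 3 stays unmatched. Load equals 2.1 because at every instant in $[0, 2.1]$ at least one of the reference or the hypothesis has exactly one active speaker, and neither ever has more than one.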
Citation
We developed this package as part of the following work:
@inproceedings{wang2018speaker,
title={{Speaker Diarization with LSTM}},
author={Wang, Quan and Downey, Carlton and Wan, Li and Mansfield, Philip Andrew and Moreno, Ignacio Lopez},
booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={5239--5243},
year={2018},
organization={IEEE}
}
@inproceedings{xia2022turn,
title={{Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection}},
author={Wei Xia and Han Lu and Quan Wang and Anshuman Tripathi and Yiling Huang and Ignacio Lopez Moreno and Hasim Sak},
booktitle={2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={8077--8081},
year={2022},
organization={IEEE}
}
@inproceedings{huang24d_interspeech,
title={{On the Success and Limitations of Auxiliary Network Based Word-Level End-to-End Neural Speaker Diarization}},
author={Yiling Huang and Weiran Wang and Guanlong Zhao and Hank Liao and Wei Xia and Quan Wang},
year={2024},
booktitle={Interspeech 2024},
pages={32--36},
}