
A lightweight library to compute Diarization Error Rate (DER).


SimpleDER


Overview

This is a single-file lightweight library to compute Diarization Error Rate (DER).

Features NOT supported in this library:

  • A full breakdown of the different components of DER (False Alarm, Miss, Overlap, Confusion).

For more sophisticated metrics with this support, please use pyannote-metrics instead.

To learn more about speaker diarization, here is a curated list of resources: awesome-diarization.

Diarization Error Rate

Diarization Error Rate (DER) is the most commonly used metric for speaker diarization.

Its strict form is:

       False Alarm + Miss + Overlap + Confusion
DER = ------------------------------------------
                   Reference Length

The definition of each term:

  • Reference Length: The total length of the reference (ground truth).
  • False Alarm: Length of segments which are considered as speech in hypothesis, but not in reference.
  • Miss: Length of segments which are considered as speech in reference, but not in hypothesis.
  • Overlap: Length of segments which are considered as overlapped speech in hypothesis, but not in reference.
  • Confusion: Length of segments which are assigned to different speakers in hypothesis and reference (after applying an optimal assignment).

The unit of each term is seconds.

Note that DER can theoretically be larger than 1.0.
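To illustrate how DER can exceed 1.0, here is a minimal sketch that evaluates the strict form directly from its components. The helper `der_from_components` is hypothetical and is not part of simpleder; the durations are made-up values chosen only for illustration.

```python
def der_from_components(false_alarm, miss, overlap, confusion, reference_length):
    """Strict-form DER; every argument is a duration in seconds."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return (false_alarm + miss + overlap + confusion) / reference_length


# 1 second of reference speech, all of it detected correctly, plus
# 2 seconds of hypothesized speech over silence (pure false alarm).
example_der = der_from_components(
    false_alarm=2.0, miss=0.0, overlap=0.0, confusion=0.0, reference_length=1.0
)
print(example_der)  # 2.0
```

Because the denominator only counts reference speech, any hypothesis that adds enough false alarm on top of it pushes the ratio above 1.0.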


Implementation

This library allows efficient computation of DER including support for overlapped speech.

The algorithm works as follows:

  1. Collar Pre-processing (if collar > 0): We remove regions around reference boundaries from both the reference and the hypothesis. For every start/end time $t$ in the reference, the interval $[t - \text{collar}, t + \text{collar}]$ is excluded from scoring.

  2. Optimal Mapping: We first align the speakers in the hypothesis to the reference by maximizing the total overlap duration between them. This is a linear sum assignment problem (also known as the weighted bipartite matching problem), which we solve using the Hungarian algorithm (via scipy.optimize.linear_sum_assignment). Let Match be the total overlap duration of this optimal mapping.

  3. Load Calculation: We calculate a value called "Load", the total duration of speech that needs to be matched. This accounts for overlapped speech.

    Mathematically: Load = $\int \max(N_{\text{ref}}(t), N_{\text{hyp}}(t)) dt$ where $N_{\text{ref}}(t)$ and $N_{\text{hyp}}(t)$ are the number of active speakers at time $t$ in reference and hypothesis, respectively.

  4. DER Calculation: DER = (Load - Match) / Reference Length

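    This formulation is mathematically equivalent to the standard definition (Miss + False Alarm + Confusion) / Reference Length when each term is counted per speaker: at time $t$, Miss is $\max(0, N_{\text{ref}}(t) - N_{\text{hyp}}(t))$, False Alarm is $\max(0, N_{\text{hyp}}(t) - N_{\text{ref}}(t))$, and Confusion is $\min(N_{\text{ref}}(t), N_{\text{hyp}}(t))$ minus the number of correctly mapped speakers, so the three sum to $\max(N_{\text{ref}}(t), N_{\text{hyp}}(t))$ minus the number of correctly mapped speakers.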
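Steps 2 to 4 above can be sketched in plain Python as follows. This is an illustrative sketch, not the library's implementation: it skips collar pre-processing, it replaces the Hungarian algorithm with a brute-force search over speaker mappings (only practical for a handful of speakers), and it assumes that segments of the same speaker do not overlap each other. The names `match_duration`, `load_duration`, and `der_sketch` are hypothetical.

```python
from itertools import permutations


def interval_overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def match_duration(ref, hyp):
    """Step 2: best total overlap over all one-to-one speaker mappings.

    Brute force stands in for the Hungarian algorithm here.
    """
    ref_spk = sorted({spk for spk, _, _ in ref})
    hyp_spk = sorted({spk for spk, _, _ in hyp})
    # cost[r][h] = total time during which ref speaker r and hyp speaker h
    # are both active.
    cost = {r: {h: 0.0 for h in hyp_spk} for r in ref_spk}
    for r, r_start, r_end in ref:
        for h, h_start, h_end in hyp:
            cost[r][h] += interval_overlap(r_start, r_end, h_start, h_end)
    # Map the smaller speaker set injectively into the larger one.
    if len(ref_spk) <= len(hyp_spk):
        candidates = (zip(ref_spk, p) for p in permutations(hyp_spk, len(ref_spk)))
    else:
        candidates = (zip(p, hyp_spk) for p in permutations(ref_spk, len(hyp_spk)))
    return max(sum(cost[r][h] for r, h in pairs) for pairs in candidates)


def load_duration(ref, hyp):
    """Step 3: integral of max(N_ref(t), N_hyp(t)) over time."""
    points = sorted({t for _, start, end in ref + hyp for t in (start, end)})
    total = 0.0
    # Speaker counts are constant between consecutive boundaries, so it is
    # enough to sample each piece at its midpoint.
    for left, right in zip(points, points[1:]):
        mid = (left + right) / 2.0
        n_ref = sum(1 for _, start, end in ref if start <= mid < end)
        n_hyp = sum(1 for _, start, end in hyp if start <= mid < end)
        total += max(n_ref, n_hyp) * (right - left)
    return total


def der_sketch(ref, hyp):
    """Step 4: DER = (Load - Match) / Reference Length."""
    ref_length = sum(end - start for _, start, end in ref)
    return (load_duration(ref, hyp) - match_duration(ref, hyp)) / ref_length


ref = [("A", 0.0, 1.0), ("B", 1.0, 1.5), ("A", 1.6, 2.1)]
hyp = [("1", 0.0, 0.8), ("2", 0.8, 1.4), ("3", 1.5, 1.8), ("1", 1.8, 2.0)]
print("DER={:.3f}".format(der_sketch(ref, hyp)))  # DER=0.350
```

The brute-force search keeps the sketch dependency-free; for more than a few speakers, an assignment solver such as scipy.optimize.linear_sum_assignment is the appropriate tool.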

Tutorial

Install

Install the package with:

pip3 install simpleder

or

python3 -m pip install simpleder

API

Here is a minimal example:

import simpleder

# reference (ground truth)
ref = [("A", 0.0, 1.0),
       ("B", 1.0, 1.5),
       ("A", 1.6, 2.1)]

# hypothesis (diarization result from your algorithm)
hyp = [("1", 0.0, 0.8),
       ("2", 0.8, 1.4),
       ("3", 1.5, 1.8),
       ("1", 1.8, 2.0)]

error = simpleder.DER(ref, hyp)

print("DER={:.3f}".format(error))

This should output:

DER=0.350
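As a sanity check, this value can be reproduced by hand from DER = (Load - Match) / Reference Length with the default of no collar. The optimal mapping pairs A with "1" and B with "2", leaving hypothesis speaker "3" unmatched; neither side contains overlapped speech, so Load is simply the duration during which at least one of reference or hypothesis is active, here the whole span from 0.0 to 2.1.

$$
\begin{aligned}
\text{Reference Length} &= (1.0 - 0.0) + (1.5 - 1.0) + (2.1 - 1.6) = 2.0 \\
\text{Match} &= \underbrace{(0.8 + 0.2)}_{\text{A} \leftrightarrow \text{1}} + \underbrace{0.4}_{\text{B} \leftrightarrow \text{2}} = 1.4 \\
\text{Load} &= 2.1 - 0.0 = 2.1 \\
\text{DER} &= \frac{2.1 - 1.4}{2.0} = 0.35
\end{aligned}
$$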

Citation

We developed this package as part of the following work:

@inproceedings{wang2018speaker,
  title={{Speaker Diarization with LSTM}},
  author={Wang, Quan and Downey, Carlton and Wan, Li and Mansfield, Philip Andrew and Moreno, Ignacio Lopez},
  booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={5239--5243},
  year={2018},
  organization={IEEE}
}

@inproceedings{xia2022turn,
  title={{Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection}},
  author={Wei Xia and Han Lu and Quan Wang and Anshuman Tripathi and Yiling Huang and Ignacio Lopez Moreno and Hasim Sak},
  booktitle={2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={8077--8081},
  year={2022},
  organization={IEEE}
}

@inproceedings{huang24d_interspeech,
  title={{On the Success and Limitations of Auxiliary Network Based Word-Level End-to-End Neural Speaker Diarization}},
  author={Yiling Huang and Weiran Wang and Guanlong Zhao and Hank Liao and Wei Xia and Quan Wang},
  year={2024},
  booktitle={Interspeech 2024},
  pages={32--36},
}
