Bias Tests for Voice Technologies

Bias Tests for Voice Technologies (bt4vt)

About this package

bt4vt is a Python library to diagnose performance discrepancies (i.e. bias) in speaker verification models. The library provides evaluation measures and visualisations to interrogate model performance and can be integrated into development pipelines to test for bias. We plan to extend the library to other speech processing tasks in the future. Get in touch if you are interested in helping.

Read the docs

The development of this open source library is part of the Fair EVA project and has been supported by the Mozilla Technology Fund.

Setup instructions

You need Python 3 to use this library. The easiest way to use the library is to install it with pip.

    $ pip install bt4vt

To use the library in development mode, install it as follows:

  1. Clone this repository from GitHub and navigate to the project's root directory (bt4vt/).

    $ git clone https://github.com/wiebket/bt4vt.git
    
  2. Install the project.

    $ pip install -e .
    

Usage

Below is an example of using bt4vt. All necessary files can be copied with copy_example(). The example evaluates the fairness of models released with the Clova AI VoxCeleb Trainer.

Run Bias Tests for Speaker Verification

1. Copy example resources

All files that are necessary to reproduce the example can be copied to a folder of your choice. Here, we copy the resources to ~/bias_tests_4_voice_tech/example/.

    import bt4vt

    bt4vt.dataio.copy_example("~/bias_tests_4_voice_tech/example/")

2. Create config file

A template for the config.yaml file is now provided in the ~/bias_tests_4_voice_tech/example/ folder. If you copied the files to a different folder, adjust the paths for speaker_metadata_file and results_dir.

    speaker_metadata_file: "~/bias_tests_4_voice_tech/example/vox1_meta.csv"
    results_dir: "~/bias_tests_4_voice_tech/results/"

    # for metadata
    id_column: "VoxCeleb1 ID"
    select_columns: ["Gender", "Nationality"]
    speaker_groups: [["Gender"], ["Nationality"], ["Gender", "Nationality"]]

    # for scores
    reference_filepath_column: "ref_file"
    test_filepath_column: "com_file"
    label_column: "lab"
    scores_column: "sc"

    # for dataset evaluation
    dataset_evaluation: True

    # for run_tests
    dcf_costs: [[0.05, 1, 1]]

3. Run the bias tests

Import bt4vt and specify the paths to your score file and config file. Pass both paths to the SpeakerBiasTest class and call its run_tests() method.

    import bt4vt

    # paths to the score file and the config file copied in step 1
    score_file = "~/bias_tests_4_voice_tech/example/resnetse34v2_H-eval_scores.csv"
    config_file = "~/bias_tests_4_voice_tech/example/config.yaml"

    test = bt4vt.core.SpeakerBiasTest(score_file, config_file)

    test.run_tests()

Test results will be stored in ~/bias_tests_4_voice_tech/results. The results file contains metric ratios for the metrics and speaker groups specified in the config file.
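The dcf_costs entry in the config file parameterises a detection cost function. The sketch below is not taken from the bt4vt source: it uses the standard detection cost formula and assumes that each dcf_costs entry is ordered as [p_target, c_miss, c_fa]. The function name and the error rates are hypothetical and serve only as an illustration.

```python
def detection_cost(fnr, fpr, p_target, c_miss, c_fa):
    """Weighted cost of misses and false alarms at one decision threshold."""
    return c_miss * fnr * p_target + c_fa * fpr * (1.0 - p_target)


# Assumed ordering of one dcf_costs entry: [p_target, c_miss, c_fa].
p_target, c_miss, c_fa = [0.05, 1, 1]

# Hypothetical false negative and false positive rates, for illustration only.
cost = detection_cost(fnr=0.10, fpr=0.02, p_target=p_target, c_miss=c_miss, c_fa=c_fa)
print(round(cost, 4))
```

With a low target prior such as 0.05, false alarms dominate the cost, so the cost-minimising threshold tends to be stricter than the one at the equal error rate.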

The metric ratio is calculated as speaker group metric / average metric. A ratio of 1 means that the speaker group performs at the average level.
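As a minimal sketch of this calculation, the snippet below divides a per-group metric by the average metric. The helper metric_ratio, the group labels and the false negative rates are hypothetical and are not part of the bt4vt API or its results.

```python
def metric_ratio(group_metric, average_metric):
    """Ratio of a speaker group's metric to the average metric."""
    if average_metric == 0:
        raise ValueError("average metric must be non-zero")
    return group_metric / average_metric


# Hypothetical false negative rates, for illustration only.
average_fnr = 0.04
group_fnr = {"female": 0.05, "male": 0.03}

ratios = {group: metric_ratio(fnr, average_fnr) for group, fnr in group_fnr.items()}

for group, ratio in ratios.items():
    # For an error metric, a ratio above 1 means the group fares worse than average.
    print(f"{group}: {ratio:.2f}")
```

Because the ratio is relative to the average, it can be compared across metrics that live on different scales.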

Under Development

The project is under continuous development and we appreciate contributions! Planned enhancements include:

  • advanced plotting of test results
  • implementation of further metrics and fairness measures
  • inclusive evaluation dataset generators

If you'd like to get involved, have a look at: https://www.faireva.org/get-involved

Resources

An early version of this library was developed as part of the following research:

Wiebke Toussaint Hutiri and Aaron Yi Ding. 2022. Bias in Automated Speaker Recognition. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT '22). Association for Computing Machinery, New York, NY, USA, 230–247. https://doi.org/10.1145/3531146.3533089

License

This code is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This software is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for details.

You should have received a copy of the GNU General Public License along with this source code. If not, see http://www.gnu.org/licenses/.
