
Speaker Embedding

Project description

WeSpeaker



WeSpeaker mainly focuses on speaker embedding learning, with application to the speaker verification task. It supports both online feature extraction and loading pre-extracted features in Kaldi format.

Installation

Install python package

pip install git+https://github.com/wenet-e2e/wespeaker_nuaazs.git

Command-line usage (use -h for parameters):

$ wespeaker --task embedding --audio_file audio.wav --output_file embedding.txt
$ wespeaker --task embedding_kaldi --wav_scp wav.scp --output_file /path/to/embedding
$ wespeaker --task similarity --audio_file audio.wav --audio_file2 audio2.wav
$ wespeaker --task diarization --audio_file audio.wav
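The `embedding_kaldi` task (and `extract_embedding_list` in Python) reads a `wav.scp` list, which in the Kaldi convention holds one `<utterance-id> <wav-path>` pair per line. The sketch below builds such a file from a directory of `.wav` files. It is a minimal example under assumptions: the helper names are hypothetical, the utterance ID is taken to be the file name without its extension, and sorting by key follows the general Kaldi convention rather than a documented WeSpeaker requirement.

```python
from __future__ import annotations

from pathlib import Path


def write_wav_scp(audio_dir: str, scp_path: str) -> int:
    """Write a Kaldi-style wav.scp with one '<utt_id> <wav_path>' line per file.

    Assumption: the utterance ID is the file stem, so stems must be unique
    across the directory tree. Keep paths free of spaces, since Kaldi treats
    everything after the first space as the path (or a command).
    """
    entries: dict[str, str] = {}
    for wav in Path(audio_dir).rglob("*.wav"):
        utt_id = wav.stem
        if utt_id in entries:
            raise ValueError(f"duplicate utterance id: {utt_id}")
        entries[utt_id] = str(wav.resolve())
    # Kaldi convention: scp files are sorted by key.
    with open(scp_path, "w", encoding="utf-8") as f:
        for utt_id in sorted(entries):
            f.write(f"{utt_id} {entries[utt_id]}\n")
    return len(entries)


def read_wav_scp(scp_path: str) -> dict[str, str]:
    """Parse a wav.scp back into {utt_id: path}, skipping blank lines."""
    entries: dict[str, str] = {}
    with open(scp_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                utt_id, path = line.split(maxsplit=1)
                entries[utt_id] = path.strip()
    return entries
```

After `write_wav_scp("/path/to/audio", "wav.scp")`, the resulting file can be passed as `--wav_scp wav.scp` to the `embedding_kaldi` task shown above.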

Python programming usage:

import wespeaker

model = wespeaker.load_model('chinese')
embedding = model.extract_embedding('audio.wav')
utt_names, embeddings = model.extract_embedding_list('wav.scp')
similarity = model.compute_similarity('audio1.wav', 'audio2.wav')
diar_result = model.diarize('audio.wav')
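`extract_embedding` returns one fixed-dimensional vector per utterance, and `compute_similarity` compares two of them. A common back-end for this kind of comparison is cosine scoring; the sketch below assumes that, using plain Python lists in place of the tensors the model returns (a PyTorch tensor can be converted with `.tolist()`). Whether the library also rescales the score from [-1, 1] to [0, 1] is an assumption to check in its source, so the rescaling helper is labeled hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(e1: Sequence[float], e2: Sequence[float]) -> float:
    """Cosine of the angle between two speaker embeddings, in [-1, 1]."""
    if len(e1) != len(e2):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(e1, e2))
    norm1 = math.sqrt(sum(a * a for a in e1))
    norm2 = math.sqrt(sum(b * b for b in e2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("cannot score a zero-norm embedding")
    return dot / (norm1 * norm2)


def to_unit_interval(score: float) -> float:
    """Hypothetical helper: map a cosine score from [-1, 1] to [0, 1]."""
    return (score + 1.0) / 2.0
```

Because the score depends only on direction, embeddings that differ only in scale (for example, `[1, 2]` and `[2, 4]`) score as identical.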

Please refer to the python usage documentation for more command-line and Python programming examples.

Install for development & deployment

  • Clone this repo
git clone https://github.com/wenet-e2e/wespeaker_nuaazs.git
  • Create a conda env (PyTorch >= 1.12.1 is recommended):
conda create -n wespeaker python=3.9
conda activate wespeaker
conda install pytorch=1.12.1 torchaudio=0.12.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
pre-commit install  # for clean and tidy code

🔥 News

Recipes

  • VoxCeleb: Speaker Verification recipe on the VoxCeleb dataset
    • 🔥 UPDATE 2024.05.15: We support score calibration for VoxCeleb and achieve better performance!
    • 🔥 UPDATE 2023.07.10: We support a self-supervised learning recipe on VoxCeleb, achieving 2.627% EER (ECAPA_TDNN_GLOB_c1024) on the vox1-O-clean test set without any labels.
    • 🔥 UPDATE 2022.10.31: We support deep r-vector up to the 293-layer version, achieving 0.447%/0.043 EER/minDCF on the vox1-O-clean test set.
    • 🔥 UPDATE 2022.07.19: We apply the same setups as the CNCeleb recipe and obtain SOTA performance among open-source systems.
      • EER/minDCF on the vox1-O-clean test set are 0.723%/0.069 (ResNet34) and 0.728%/0.099 (ECAPA_TDNN_GLOB_c1024) after large-margin (LM) fine-tuning and AS-Norm.
  • CNCeleb: Speaker Verification recipe on the CNCeleb dataset
    • 🔥 UPDATE 2024.05.16: We support score calibration for CNCeleb and achieve a better EER.
    • 🔥 UPDATE 2022.10.31: The 221-layer ResNet achieves 5.655%/0.330 EER/minDCF.
    • 🔥 UPDATE 2022.07.12: We migrate the winning system of CNSRC 2022 (see the report and slides).
      • EER/minDCF are reduced from 8.426%/0.487 to 6.492%/0.354 after large-margin fine-tuning and AS-Norm.
  • NIST SRE16: Speaker Verification recipe for the 2016 NIST Speaker Recognition Evaluation Plan. A similar recipe can be found in Kaldi.
    • 🔥 UPDATE 2023.07.14: We support the NIST SRE16 recipe. After PLDA adaptation, we achieve 6.608%, 10.01%, and 2.974% EER on the Pooled, Tagalog, and Cantonese trials, respectively.
  • VoxConverse: Diarization recipe on the VoxConverse dataset
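The verification results above are reported as EER (equal error rate) and minDCF (minimum normalized detection cost). A minimal sketch of how these two metrics are typically computed from target and non-target trial scores follows. The function names are hypothetical and this is not WeSpeaker's own scoring script; the cost parameters (p_target=0.01, c_miss=c_fa=1) are a commonly used setting assumed here, not necessarily the one each recipe uses.

```python
from typing import List, Sequence, Tuple


def det_points(target: Sequence[float],
               nontarget: Sequence[float]) -> List[Tuple[float, float]]:
    """(P_miss, P_fa) pairs as the threshold sweeps upward past each score.

    A trial is accepted when its score is above the threshold. Tied scores
    are not grouped, which is fine for a sketch but slightly optimistic.
    """
    if not target or not nontarget:
        raise ValueError("need both target and non-target scores")
    trials = sorted([(s, True) for s in target] + [(s, False) for s in nontarget])
    misses, false_alarms = 0, len(nontarget)
    points = [(0.0, 1.0)]  # threshold below every score: accept all trials
    for _, is_target in trials:
        if is_target:
            misses += 1        # this target trial is now rejected
        else:
            false_alarms -= 1  # this non-target trial is no longer accepted
        points.append((misses / len(target), false_alarms / len(nontarget)))
    return points


def compute_eer(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Equal error rate: where P_miss and P_fa cross, linearly interpolated."""
    points = det_points(target, nontarget)
    for (pm0, pf0), (pm1, pf1) in zip(points, points[1:]):
        d0, d1 = pm0 - pf0, pm1 - pf1
        if d1 >= 0:  # first point with P_miss >= P_fa; here d0 < 0
            t = -d0 / (d1 - d0)
            return pm0 + t * (pm1 - pm0)
    raise AssertionError("unreachable: the last point has P_miss=1, P_fa=0")


def compute_min_dcf(target: Sequence[float], nontarget: Sequence[float],
                    p_target: float = 0.01, c_miss: float = 1.0,
                    c_fa: float = 1.0) -> float:
    """Minimum detection cost over all thresholds, normalized by the best trivial system."""
    norm = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return min(
        (c_miss * pm * p_target + c_fa * pf * (1.0 - p_target)) / norm
        for pm, pf in det_points(target, nontarget)
    )
```

Multiplying `compute_eer(...)` by 100 gives the percentage form used in the results above; minDCF is reported as a plain number.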

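Several results above are reported "after AS-Norm" (adaptive symmetric score normalization). The general technique normalizes a raw trial score with statistics of the top-scoring cohort (impostor) scores on both the enrollment side and the test side. The sketch below is a minimal illustration under stated assumptions: cohort scores are precomputed, the standard deviation is the population one, and the `top_k=300` default is illustrative rather than taken from the WeSpeaker recipes.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def top_k_stats(cohort_scores: Sequence[float], top_k: int) -> Tuple[float, float]:
    """Mean and population std of the top_k highest cohort scores."""
    top = sorted(cohort_scores, reverse=True)[:top_k]
    if not top:
        raise ValueError("empty cohort")
    mu, sigma = mean(top), pstdev(top)
    if sigma == 0.0:
        raise ValueError("cohort scores have zero spread; cannot normalize")
    return mu, sigma


def as_norm(raw_score: float,
            enroll_cohort_scores: Sequence[float],
            test_cohort_scores: Sequence[float],
            top_k: int = 300) -> float:
    """Adaptive symmetric normalization (AS-Norm) of one trial score.

    enroll_cohort_scores: scores of the enrollment embedding against each cohort embedding.
    test_cohort_scores:   scores of the test embedding against the same cohort.
    top_k=300 is an illustrative default (assumption).
    """
    mu_e, sd_e = top_k_stats(enroll_cohort_scores, top_k)
    mu_t, sd_t = top_k_stats(test_cohort_scores, top_k)
    # Average the enrollment-side (Z-norm-like) and test-side (T-norm-like) normalized scores.
    return 0.5 * ((raw_score - mu_e) / sd_e + (raw_score - mu_t) / sd_t)
```

Restricting the statistics to the top-scoring cohort members is what makes the normalization "adaptive": each trial is normalized against the impostors most similar to its own enrollment and test utterances.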
Discussion

Chinese users can scan the QR code on the left to follow the official account of the WeNet Community. We have also created a WeChat group for discussion and quicker responses; please scan the QR code on the right to join it.

Citations

If you find WeSpeaker useful, please cite it as follows:

@inproceedings{wang2023wespeaker,
  title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
  author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}

Looking for contributors

If you are interested in contributing, feel free to contact @wsstriving or @robin1001.


Download files


Source Distribution

wespeaker_nuaazs-0.0.5.tar.gz (50.3 kB)

Uploaded Source

Built Distribution

wespeaker_nuaazs-0.0.5-py3-none-any.whl (70.9 kB)

Uploaded Python 3

File details

Details for the file wespeaker_nuaazs-0.0.5.tar.gz.

File metadata

  • Download URL: wespeaker_nuaazs-0.0.5.tar.gz
  • Upload date:
  • Size: 50.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.14

File hashes

Hashes for wespeaker_nuaazs-0.0.5.tar.gz
Algorithm Hash digest
SHA256 f19a136400a4257e2f1ab9c8e9a645fe42078af086c1b0627714456c43f32249
MD5 4f3f98763990733610fe99346c7addc0
BLAKE2b-256 37914420f46e2c12e38a4ab151f67d0c35d467a3a5841bfd2d29e645a5d3537a
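The published digests let you check that a downloaded file is intact. A minimal sketch using only Python's standard `hashlib` follows; the helper names are illustrative, and the expected value is the SHA256 digest listed above for the source distribution.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks so it is never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> None:
    """Raise ValueError if the file's SHA256 does not match the published digest."""
    actual = sha256_of(path)
    if actual != expected_hex.strip().lower():
        raise ValueError(f"SHA256 mismatch for {path}: got {actual}, expected {expected_hex}")


# SHA256 published above for wespeaker_nuaazs-0.0.5.tar.gz.
SDIST_SHA256 = "f19a136400a4257e2f1ab9c8e9a645fe42078af086c1b0627714456c43f32249"
```

For example, `verify_sha256("wespeaker_nuaazs-0.0.5.tar.gz", SDIST_SHA256)` returns silently on a match. pip can also enforce this check itself in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).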


File details

Details for the file wespeaker_nuaazs-0.0.5-py3-none-any.whl.

File hashes

Hashes for wespeaker_nuaazs-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 8bab46156562d0dc6ab5048f72defdeaf80bfb23041af13e3020c71a527ccfa8
MD5 da21fa1feeadb93ef7810342e05ce58c
BLAKE2b-256 e9cd55d0bd04ae5431c707fbbab26ea2d6f78c47f9a726cfea45ac154af926e3

