
Project description

Introduction

This package is derived from https://github.com/modelscope/3D-Speaker. The core of that project lives in the speakerlab folder, which is not a standard installable package, so this project packages and distributes speakerlab separately.




3D-Speaker is an open-source toolkit for single- and multi-modal speaker verification, speaker recognition, and speaker diarization. All pretrained models are accessible on ModelScope. Furthermore, we present a large-scale speech corpus, also called 3D-Speaker-Dataset, to facilitate research into speech representation disentanglement.

Benchmark

The EER results on VoxCeleb, CNCeleb and 3D-Speaker datasets for fully-supervised speaker verification.

| Model | Params | VoxCeleb1-O | CNCeleb | 3D-Speaker |
| --- | --- | --- | --- | --- |
| Res2Net | 4.03 M | 1.56% | 7.96% | 8.03% |
| ResNet34 | 6.34 M | 1.05% | 6.92% | 7.29% |
| ECAPA-TDNN | 20.8 M | 0.86% | 8.01% | 8.87% |
| ERes2Net-base | 6.61 M | 0.84% | 6.69% | 7.21% |
| CAM++ | 7.2 M | 0.65% | 6.78% | 7.75% |
| ERes2NetV2 | 17.8 M | 0.61% | 6.14% | 6.52% |
| ERes2Net-large | 22.46 M | 0.52% | 6.17% | 6.34% |
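
For context, EER (equal error rate) is the operating point at which the false-acceptance rate equals the false-rejection rate over a set of verification trials. A minimal, self-contained sketch of the metric (illustrative only; this is not the toolkit's scoring code):

```python
def eer(scores, labels):
    """Equal error rate: the point where false-acceptance rate (FAR)
    equals false-rejection rate (FRR), approximated here as the minimum
    over thresholds of max(FAR, FRR).
    scores: trial similarity scores; labels: 1 = same speaker, 0 = different."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fa, fr = 0, n_pos          # accept nothing: all positives rejected
    best = 1.0
    # Sweep the threshold downward, accepting one more trial per step.
    for _, lab in sorted(zip(scores, labels), reverse=True):
        if lab == 1:
            fr -= 1            # a genuine trial is now accepted
        else:
            fa += 1            # an impostor trial is now accepted
        best = min(best, max(fa / n_neg, fr / n_pos))
    return best
```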

The DER results on public and internal multi-speaker datasets for speaker diarization.

| Test set | 3D-Speaker | pyannote.audio | DiariZen_WavLM |
| --- | --- | --- | --- |
| Aishell-4 | 10.30% | 12.2% | 11.7% |
| Alimeeting | 19.73% | 24.4% | 17.6% |
| AMI_SDM | 21.76% | 22.4% | 15.4% |
| VoxConverse | 11.75% | 11.3% | 28.39% |
| Meeting-CN_ZH-1 | 18.91% | 22.37% | 32.66% |
| Meeting-CN_ZH-2 | 12.78% | 17.86% | 18% |
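
DER (diarization error rate) is the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by total reference speech time. A toy frame-level sketch (illustrative only; it assumes one speaker per frame and hypothesis labels already mapped onto reference labels, whereas real scoring uses an optimal speaker mapping and a forgiveness collar):

```python
def frame_der(ref, hyp, non_speech=None):
    """Frame-level diarization error rate:
    (missed speech + false alarm + speaker confusion) / reference speech."""
    miss = fa = conf = speech = 0
    for r, h in zip(ref, hyp):
        if r != non_speech:
            speech += 1
            if h == non_speech:
                miss += 1          # speech frame labeled as silence
            elif h != r:
                conf += 1          # speech attributed to the wrong speaker
        elif h != non_speech:
            fa += 1                # silence labeled as speech
    return (miss + fa + conf) / speech
```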

Quickstart

Install 3D-Speaker

```sh
git clone https://github.com/modelscope/3D-Speaker.git && cd 3D-Speaker
conda create -n 3D-Speaker python=3.8
conda activate 3D-Speaker
pip install -r requirements.txt
```

Running experiments

```sh
# Speaker verification: ERes2NetV2 on the 3D-Speaker dataset
cd egs/3dspeaker/sv-eres2netv2/
bash run.sh
# Speaker verification: CAM++ on the 3D-Speaker dataset
cd egs/3dspeaker/sv-cam++/
bash run.sh
# Speaker verification: ECAPA-TDNN on the 3D-Speaker dataset
cd egs/3dspeaker/sv-ecapa/
bash run.sh
# Self-supervised speaker verification: SDPN on the VoxCeleb dataset
cd egs/voxceleb/sv-sdpn/
bash run.sh
# Audio-only and multimodal speaker diarization
cd egs/3dspeaker/speaker-diarization/
bash run_audio.sh
bash run_video.sh
# Language identification
cd egs/3dspeaker/language-idenitfication
bash run.sh
```

Inference using pretrained models from ModelScope

All pretrained models are released on ModelScope.

```sh
# Install modelscope
pip install modelscope
# ERes2Net trained on 200k labeled speakers
model_id=iic/speech_eres2net_sv_zh-cn_16k-common
# ERes2NetV2 trained on 200k labeled speakers
model_id=iic/speech_eres2netv2_sv_zh-cn_16k-common
# CAM++ trained on 200k labeled speakers
model_id=iic/speech_campplus_sv_zh-cn_16k-common
# Run CAM++ or ERes2Net inference
python speakerlab/bin/infer_sv.py --model_id $model_id
# Run batch inference
python speakerlab/bin/infer_sv_batch.py --model_id $model_id --wavs $wav_list
```
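
Verification scoring of this kind typically reduces to extracting one fixed-size embedding per utterance and comparing the pair by cosine similarity. A stdlib-only sketch with made-up toy embeddings (illustrative; the toolkit's actual embedding dimensions and scoring may differ):

```python
import math

def cosine_score(emb1, emb2):
    """Cosine similarity between two speaker embeddings:
    a higher score means the utterances more likely share a speaker."""
    dot = sum(a * b for a, b in zip(emb1, emb2))
    norm1 = math.sqrt(sum(a * a for a in emb1))
    norm2 = math.sqrt(sum(b * b for b in emb2))
    return dot / (norm1 * norm2)

# Toy 3-d embeddings (real ones are much higher-dimensional vectors
# produced by the model); a same/different decision thresholds this score.
same = cosine_score([0.2, 0.9, 0.1], [0.25, 0.85, 0.12])
diff = cosine_score([0.2, 0.9, 0.1], [0.9, -0.1, 0.4])
```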

```sh
# SDPN trained on VoxCeleb
model_id=iic/speech_sdpn_ecapa_tdnn_sv_en_voxceleb_16k
# Run SDPN inference
python speakerlab/bin/infer_sv_ssl.py --model_id $model_id
```

```sh
# Run diarization inference
python speakerlab/bin/infer_diarization.py --wav [wav_list OR wav_path] --out_dir $out_dir
# Enable overlap detection
python speakerlab/bin/infer_diarization.py --wav [wav_list OR wav_path] --out_dir $out_dir --include_overlap --hf_access_token $hf_access_token
```
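
A diarization result is a set of speaker-attributed time segments. As an illustration of the final post-processing step, here is a sketch (not toolkit code; the frame shift and label format are assumptions) that collapses per-frame speaker labels into (start, end, speaker) segments, the shape of a typical RTTM line:

```python
def frames_to_segments(labels, frame_shift=0.01, non_speech=None):
    """Collapse per-frame speaker labels into (start_sec, end_sec, speaker)
    segments; frame_shift is the frame duration in seconds."""
    segments = []
    start, cur = 0, non_speech
    for i, lab in enumerate(labels):
        if lab != cur:
            if cur != non_speech:   # close the running segment
                segments.append((round(start * frame_shift, 3),
                                 round(i * frame_shift, 3), cur))
            start, cur = i, lab
    if cur != non_speech:           # flush the final segment
        segments.append((round(start * frame_shift, 3),
                         round(len(labels) * frame_shift, 3), cur))
    return segments
```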

Overview of Content

What's new :fire:

Contact

If you have any comments or questions about 3D-Speaker, please contact us by:

  • email: {yfchen97, wanghuii}@mail.ustc.edu.cn, {dengchong.d, zsq174630, shuli.cly}@alibaba-inc.com

License

3D-Speaker is released under the Apache License 2.0.

Acknowledgements

3D-Speaker contains third-party components and code modified from several open-source repositories, including:
Speechbrain, Wespeaker, D-TDNN, DINO, Vicreg, TalkNet-ASD, Ultra-Light-Fast-Generic-Face-Detector-1MB, and pyannote.audio.

Citations

If you find this repository useful, please consider giving a star :star: and citation :t-rex::

@inproceedings{chen20243d,
  title={3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization},
  author={Chen, Yafeng and Zheng, Siqi and Wang, Hui and Cheng, Luyao and others},
  booktitle={ICASSP},
  year={2025}
}

Built Distribution

speakerlab-0.0.3-py3-none-any.whl (123.5 kB, Python 3; uploaded via twine/6.1.0 on CPython/3.12.11; Trusted Publishing: no). No source distribution is available for this release.

Hashes for speakerlab-0.0.3-py3-none-any.whl:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 19968d8d695bbcbb7eb7d8c3b3877992756d543369bd4c3525e6ba4ba695eaa9 |
| MD5 | ccc39f7937c7f8304ce70bdb66236bee |
| BLAKE2b-256 | 2c31999cd9919a0cb372a6e39d0c9a6acfdb944b8dc2331b29c36d9e83ff6226 |