WeSpeaker Python Binding
This is a Python binding of WeSpeaker.
WeSpeaker is a production-first and production-ready end-to-end speaker recognition toolkit.
- Two ONNX models are available: the voxceleb model and the cnceleb_model.
- Extract embeddings from a wav file or from features (Fbank/MFCC).
- Support using kaldiio to save embeddings.
Install
Python 3.6+ is required.
pip3 install wespeakerruntime
Usage
Extract embedding from wav file
import sys
import wespeakerruntime as wespeaker
wav_file = sys.argv[1]
speaker = wespeaker.Speaker(lang='chs')
ans = speaker.extract_embedding(wav_file)
print(ans)
You can also specify the following parameters in wespeaker.Speaker:
- onnx_path (str, optional): the path of the ONNX model. Default: the ONNX model will be downloaded from the server.
- lang (str): chs for cnceleb_model, en for voxceleb model.
- inter_op_num_threads and intra_op_num_threads (int): the number of threads used while the model is running. For details, please see: https://onnxruntime.ai/docs/
The parameters of extract_embedding:
- wav_path (str): the path of the wav file
- resample_rate (int): resampling rate. Default: 16000
- num_mel_bins (int): dimension of fbank. Default: 80
- frame_length (int): frame length. Default: 25
- frame_shift (int): frame shift. Default: 10
- cmn (bool): if true, cepstral mean normalization is applied. Default: True
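To get a feel for how frame_length and frame_shift determine the number of feature frames, here is a small sketch. It assumes both values are in milliseconds and that only complete frames are kept (the usual Kaldi-style convention); neither is stated above. num_frames is a hypothetical helper, not part of wespeakerruntime.

```python
def num_frames(num_samples, sample_rate=16000, frame_length=25, frame_shift=10):
    """Estimate how many frames a waveform yields.

    Assumption: frame_length and frame_shift are in milliseconds and
    only full frames are kept (no padding at the edges).
    """
    window = sample_rate * frame_length // 1000  # samples per frame
    shift = sample_rate * frame_shift // 1000    # samples between frame starts
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift


# One second of 16 kHz audio with the default settings
frames = num_frames(16000)
print(frames)
```

Under these assumptions, a one-second clip with num_mel_bins set to 80 would give a feature matrix of 98 frames by 80 dimensions.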
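Compute cosine score
import sys
import wespeakerruntime as wespeaker
# Paths of the two wav files to compare, taken from the command line
wav1_path, wav2_path = sys.argv[1], sys.argv[2]
speaker = wespeaker.Speaker(lang='chs')
emb1 = speaker.extract_embedding_wav(wav1_path)
emb2 = speaker.extract_embedding_wav(wav2_path)
score = speaker.compute_cosine_score(emb1, emb2)
print(score)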
The parameters of compute_cosine_score:
- emb1 (numpy.ndarray): embedding of speaker-1
- emb2 (numpy.ndarray): embedding of speaker-2
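For intuition about what a cosine score measures, the sketch below computes plain cosine similarity on Python lists. It assumes compute_cosine_score returns the standard cosine similarity; the library's own implementation is not shown here and may scale or normalize differently. cosine_score is a hypothetical stand-in, and the vectors are toy values rather than real embeddings.

```python
import math


def cosine_score(emb1, emb2):
    """Cosine similarity between two equal-length vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(emb1, emb2))
    norm1 = math.sqrt(sum(a * a for a in emb1))
    norm2 = math.sqrt(sum(b * b for b in emb2))
    return dot / (norm1 * norm2)


# Toy vectors standing in for embeddings (assumption: not real model output)
same = cosine_score([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # parallel vectors
orthogonal = cosine_score([1.0, 0.0], [0.0, 1.0])      # perpendicular vectors
print(same, orthogonal)
```

A score close to 1 means the two vectors point in nearly the same direction, while a score near 0 means they are unrelated.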
[Optional] Extract embedding from features (Fbank/MFCC)
import wespeakerruntime as wespeaker
# your_fbank is a placeholder for your own features: a numpy.ndarray of shape [B, T, D]
feat = your_fbank
speaker = wespeaker.Speaker(lang='chs')
ans = speaker.extract_embedding_feat(feat)
print(ans)
The parameters of extract_embedding_feat:
- feats (numpy.ndarray): the shape is [B, T, D].
- cmn (bool): if true, cepstral mean normalization is applied. Default: True
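As a rough illustration of what the cmn option refers to, the sketch below subtracts the per-dimension mean over time from a single [T, D] feature matrix. This is the textbook definition of cepstral mean normalization; whether the library computes it exactly this way is an assumption. apply_cmn is a hypothetical helper, and plain lists are used instead of numpy.ndarray to keep the example dependency-free.

```python
def apply_cmn(feat):
    """Subtract the per-dimension mean over time from a [T, D] feature matrix."""
    num_frames = len(feat)
    dim = len(feat[0])
    # Mean of each feature dimension across all frames
    mean = [sum(frame[d] for frame in feat) / num_frames for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in feat]


feat = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]  # toy features: T=3, D=2
normalized = apply_cmn(feat)
print(normalized)
```

After normalization each feature dimension has zero mean over the utterance, which reduces the effect of constant channel offsets.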
[Optional] Extract embedding from wav.scp
import sys
import wespeakerruntime as wespeaker
wav_scp = sys.argv[1]
speaker = wespeaker.Speaker(lang='chs')
speaker.extract_embedding_kaldiio(wav_scp, 'embed.ark')
The parameters of extract_embedding_kaldiio:
- wav_path (str): the path of wav
- embed_ark (str): the path of the output .ark file
- resample_rate (int): resampling rate. Default: 16000
- num_mel_bins (int): dimension of fbank. Default: 80
- frame_length (int): frame length. Default: 25
- frame_shift (int): frame shift. Default: 10
- cmn (bool): if true, cepstral mean normalization is applied. Default: True
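The layout of wav.scp is not described above. The sketch below assumes the usual Kaldi convention of one utterance per line, with an utterance ID followed by whitespace and the wav path. write_wav_scp and read_wav_scp are hypothetical helpers, and the IDs and paths are placeholders.

```python
import os
import tempfile


def write_wav_scp(entries, scp_path):
    """Write (utt_id, wav_path) pairs as 'utt_id wav_path' lines."""
    with open(scp_path, "w", encoding="utf-8") as f:
        for utt_id, wav_path in entries:
            f.write(f"{utt_id} {wav_path}\n")


def read_wav_scp(scp_path):
    """Read a wav.scp file back into a dict of utt_id -> wav_path."""
    table = {}
    with open(scp_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            utt_id, wav_path = line.split(maxsplit=1)
            table[utt_id] = wav_path
    return table


# Placeholder utterance IDs and paths, for illustration only
entries = [("utt1", "/data/audio/utt1.wav"), ("utt2", "/data/audio/utt2.wav")]
scp_path = os.path.join(tempfile.mkdtemp(), "wav.scp")
write_wav_scp(entries, scp_path)
table = read_wav_scp(scp_path)
print(table)
```

A file written this way could then be passed as the first argument in the wav.scp example above, provided the listed paths point to real audio files.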
Build on Your Local Machine
git clone git@github.com:wenet-e2e/wespeaker.git
cd wespeaker/runtime/binding/python
python setup.py install