Pre-trained model for extracting the x-vector (speaker representation vector)
x-vector extractor for Japanese speech
This repository provides a pre-trained model for extracting x-vectors (speaker representation vectors). The model was trained on the JTubeSpeech corpus, a Japanese speech corpus collected from YouTube.
このリポジトリは,x-vector (話者表現ベクトル) を抽出するための学習済みモデルを提供します.このモデルは,JTubeSpeechコーパスと呼ばれる,YouTubeから収集した日本語音声から学習されています.
Training configuration / 学習時の設定
- The number of speakers: 1,233
- Sampling frequency: 16,000 Hz
- Speaker recognition accuracy: 91% (test data)
- Feature: 24-dimensional MFCC
- Dimensionality of x-vector: 512
- Other configurations: follow the ASV (automatic speaker verification) recipe for VoxCeleb in Kaldi.
- In the released model, the parameters of the recognition layers that follow the x-vector layer were randomized to protect data privacy.
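Because the model was trained on 16,000 Hz audio, it can help to check an input file's format before extracting features. The following is a minimal sketch using only Python's standard `wave` module (which reads PCM WAV files only). The helper name `check_wav_format` and the constants are hypothetical and not part of this package.

```python
import os
import tempfile
import wave

EXPECTED_RATE = 16000  # Hz, from the training configuration above
EXPECTED_CHANNELS = 1  # mono (assumption taken from the usage comments)


def check_wav_format(path):
    """Return (sample_rate, channels); raise ValueError if not 16 kHz mono."""
    with wave.open(path, "rb") as f:
        rate, channels = f.getframerate(), f.getnchannels()
    if rate != EXPECTED_RATE or channels != EXPECTED_CHANNELS:
        raise ValueError(
            f"expected {EXPECTED_RATE} Hz mono, "
            f"got {rate} Hz with {channels} channel(s)"
        )
    return rate, channels


# Demo on a synthetic file: 0.1 s of 16-bit silence at 16 kHz mono.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    with wave.open(demo_path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(16000)
        f.writeframes(b"\x00\x00" * 1600)
    result = check_wav_format(demo_path)
```

Files with a different sampling rate or more than one channel would need to be resampled or downmixed before use; how to do that is left to the audio library of your choice.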
Installation
pip install xvector-jtubespeech
Usage / 使い方
import numpy as np
from scipy.io import wavfile
import torch
from torchaudio.compliance import kaldi

from xvector_jtubespeech import XVector


def extract_xvector(
    model,  # x-vector model
    wav,    # 16 kHz mono waveform (1-D NumPy array)
):
    # extract MFCC features
    wav = torch.from_numpy(wav.astype(np.float32)).unsqueeze(0)  # [1, num_samples]
    mfcc = kaldi.mfcc(wav, num_ceps=24, num_mel_bins=24)  # [T, 24]
    mfcc = mfcc.unsqueeze(0)  # [1, T, 24]

    # extract the x-vector
    xvector = model.vectorize(mfcc)  # (1, 512)
    xvector = xvector.to("cpu").detach().numpy().copy()[0]
    return xvector


_, wav = wavfile.read("sample.wav")  # 16 kHz mono
model = XVector("xvector.pth")
xvector = extract_xvector(model, wav)  # (512,)
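Once extracted, x-vectors are often compared with cosine similarity to judge how close two speakers are. This README does not prescribe a scoring method, so the following is only an illustrative sketch. The function `cosine_similarity` is a hypothetical helper written in plain Python, so it accepts lists or 1-D NumPy arrays such as the `(512,)` output above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length 1-D vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors stand in for real 512-dimensional x-vectors.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Values near 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions. Any decision threshold would have to be tuned on your own data.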
Contributors / 貢献者
- Takaki Hamada / 濱田 誉輝 (The University of Tokyo / 東京大学)
- Shinnosuke Takamichi / 高道 慎之介 (The University of Tokyo / 東京大学)
License / ライセンス
MIT
Others / その他
- The audio sample sample.wav was copied from the PJS corpus.