Lingvo utils for Google SVL team
Project description
sidlingvo: Lingvo-based libraries for speaker and language recognition
Overview
Here we open source some of the Lingvo-based libraries used in our publications.
Disclaimer
This is NOT an official Google product.
Feature frontend and TFLite inference
For the feature frontend and TFLite inference, see the API in sidlingvo/fe_utils.py.
For pretrained speaker encoder models, the inference API is in sidlingvo/wav_to_dvector.py.
For pretrained language identification models, the inference API is in sidlingvo/wav_to_lang.py.
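Embeddings produced by a speaker encoder (d-vectors) are commonly compared with cosine similarity. The following is a minimal sketch of that comparison only; it does not call any sidlingvo API, and the vectors are made-up values standing in for real encoder outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional embeddings, for illustration only.
enroll = [0.1, 0.3, -0.2, 0.9]
test_same = [0.2, 0.6, -0.4, 1.8]    # same direction as `enroll`
test_other = [0.9, -0.1, 0.3, -0.2]  # a different direction

score_same = cosine_similarity(enroll, test_same)
score_other = cosine_similarity(enroll, test_other)
```

A verification decision would then compare the score against a threshold chosen on held-out data.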
GE2E and GE2E-XS losses
GE2E and GE2E-XS losses are implemented in sidlingvo/loss_layers.py.
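GE2E was proposed in this paper:
- Generalized End-to-End Loss for Speaker Verification
GE2E-XS was proposed in this paper:
- Dr-Vectors: Decision Residual Networks and an Improved Loss for Speaker Recognition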
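To make the idea behind the loss concrete, here is a rough pure-Python sketch of the softmax variant of GE2E as described in the GE2E paper. It is not the code in sidlingvo/loss_layers.py, and it does not cover GE2E-XS. The function name, the nested-list batch layout, and the fixed values of `w` and `b` (which are learnable in the paper) are assumptions made for illustration.

```python
import math


def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def _mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def ge2e_softmax_loss(batch, w=10.0, b=-5.0):
    """batch[j][i] is the embedding of utterance i from speaker j.

    w and b stand in for the learnable scale and bias of the paper.
    """
    if any(len(utts) < 2 for utts in batch):
        raise ValueError("each speaker needs at least two utterances")
    centroids = [_mean(utts) for utts in batch]
    total = 0.0
    for j, utts in enumerate(batch):
        for i, e in enumerate(utts):
            sims = []
            for k, c in enumerate(centroids):
                if k == j:
                    # Leave the utterance out of its own speaker's centroid.
                    c = _mean(utts[:i] + utts[i + 1:])
                sims.append(w * _cos(e, c) + b)
            # Per-utterance loss: -S[j] + log(sum_k exp(S[k])), computed stably.
            m = max(sims)
            log_sum_exp = m + math.log(sum(math.exp(s - m) for s in sims))
            total += log_sum_exp - sims[j]
    return total


# Toy 2-speaker, 2-utterance batches with made-up 2-D embeddings.
separated = [[[1.0, 0.0], [1.0, 0.1]], [[0.0, 1.0], [0.1, 1.0]]]
mixed = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]]]
loss_separated = ge2e_softmax_loss(separated)
loss_mixed = ge2e_softmax_loss(mixed)
```

In this toy setup the batch whose speakers form tight, distinct clusters yields the smaller loss, which is the behavior the loss is meant to reward.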
Attentive temporal pooling
Attentive temporal pooling is implemented in sidlingvo/cumulative_statistics_layer.py.
It is used by these papers:
- Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech
- Parameter-Free Attentive Scoring for Speaker Verification
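As a loose illustration of what cumulative, attention-weighted statistics can look like in a streaming setting, the sketch below keeps running weighted sums and emits a weighted mean and standard deviation after every frame. This is a generic construction, not the layer in sidlingvo/cumulative_statistics_layer.py; the function name, the scalar per-frame features, and the externally supplied attention logits are all assumptions.

```python
import math


def cumulative_attentive_stats(frames, scores):
    """Return [(weighted_mean, weighted_std), ...], one pair per time step.

    frames: scalar features, one per time step.
    scores: unnormalized attention logits, one per time step.
    Note: exp() of very large logits can overflow; a real implementation
    would need a numerically safer formulation.
    """
    sum_w = 0.0    # running sum of weights
    sum_wx = 0.0   # running sum of weight * x
    sum_wx2 = 0.0  # running sum of weight * x^2
    out = []
    for x, s in zip(frames, scores):
        w = math.exp(s)
        sum_w += w
        sum_wx += w * x
        sum_wx2 += w * x * x
        mean = sum_wx / sum_w
        # Clamp at zero to guard against tiny negative values from rounding.
        var = max(sum_wx2 / sum_w - mean * mean, 0.0)
        out.append((mean, math.sqrt(var)))
    return out


# With equal logits, the result reduces to the plain running mean and std.
stats = cumulative_attentive_stats([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

Because only three running sums are kept, each new frame updates the pooled statistics in constant time, which is what makes this style of pooling suitable for long-form streaming input.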
Attentive scoring
Attentive scoring is implemented in sidlingvo/attentive_scoring_layer.py.
It was proposed in this paper:
- Parameter-Free Attentive Scoring for Speaker Verification
Citations
Our papers are cited as:
@inproceedings{wan2018generalized,
title={Generalized end-to-end loss for speaker verification},
author={Wan, Li and Wang, Quan and Papir, Alan and Moreno, Ignacio Lopez},
booktitle={International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={4879--4883},
year={2018},
organization={IEEE}
}
@inproceedings{pelecanos2021drvectors,
title={{Dr-Vectors: Decision Residual Networks and an Improved Loss for Speaker Recognition}},
author={Jason Pelecanos and Quan Wang and Ignacio Lopez Moreno},
year={2021},
booktitle={Proc. Interspeech},
pages={4603--4607},
doi={10.21437/Interspeech.2021-641}
}
@inproceedings{pelecanos2022parameter,
title={Parameter-Free Attentive Scoring for Speaker Verification},
author={Jason Pelecanos and Quan Wang and Yiling Huang and Ignacio Lopez Moreno},
booktitle={Odyssey: The Speaker and Language Recognition Workshop},
year={2022}
}
@inproceedings{wang2022attentive,
title={Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech},
author={Quan Wang and Yang Yu and Jason Pelecanos and Yiling Huang and Ignacio Lopez Moreno},
booktitle={Odyssey: The Speaker and Language Recognition Workshop},
year={2022}
}
@inproceedings{chojnacka2021speakerstew,
title={{SpeakerStew: Scaling to many languages with a triaged multilingual text-dependent and text-independent speaker verification system}},
author={Chojnacka, Roza and Pelecanos, Jason and Wang, Quan and Moreno, Ignacio Lopez},
booktitle={Proc. Interspeech},
pages={1064--1068},
year={2021},
doi={10.21437/Interspeech.2021-646},
issn={2958-1796},
}
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file sidlingvo-0.0.13.tar.gz.
File metadata
- Download URL: sidlingvo-0.0.13.tar.gz
- Upload date:
- Size: 30.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 431a384458f1b740ed53405d31f0611329da205f27acb9b9cfef8f58d47325a5
MD5 | cc80a09ba69496532641016b6e73c8b5
BLAKE2b-256 | ef668542382543332282448dc6b442e1bbda4d92f61437fc36c9ccc771331d1c
File details
Details for the file sidlingvo-0.0.13-py3-none-any.whl.
File metadata
- Download URL: sidlingvo-0.0.13-py3-none-any.whl
- Upload date:
- Size: 34.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | c2e0ee5393f25bc9021ede4fd485a50e802717546df1f1c332d482465fe02959
MD5 | ffd9f73b1216cdc107b519b9da4fcbea
BLAKE2b-256 | 888232a0d69478a28f90dfb7d5b02166351df98d4596a52ccc6c059c5c0dc073
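The SHA256 digests listed for the two files can be checked against a downloaded copy with Python's standard `hashlib` module. The sketch below is one way to do that; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the local path is whatever location the file was saved to.

```python
import hashlib

# SHA256 digests as published for release 0.0.13.
EXPECTED_SHA256 = {
    "sidlingvo-0.0.13.tar.gz":
        "431a384458f1b740ed53405d31f0611329da205f27acb9b9cfef8f58d47325a5",
    "sidlingvo-0.0.13-py3-none-any.whl":
        "c2e0ee5393f25bc9021ede4fd485a50e802717546df1f1c332d482465fe02959",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, filename):
    """True if the file at `path` has the digest published for `filename`."""
    return sha256_of_file(path) == EXPECTED_SHA256[filename]
```

For example, `matches_published_digest("downloads/sidlingvo-0.0.13.tar.gz", "sidlingvo-0.0.13.tar.gz")` should return `True` for an intact download saved at that (assumed) path.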