Neural building blocks for speaker diarization
Neural speaker diarization with pyannote-audio
pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines.
pyannote.audio also comes with pretrained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding.
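To make the idea of a "building block" more concrete, here is a minimal sketch of one generic step that often follows a detection model: turning frame-level scores into time segments by thresholding. This is an illustration only, not pyannote.audio's API; the function name `scores_to_segments`, the `frame_duration` parameter, and the default threshold of 0.5 are all assumptions made for this example.

```python
def scores_to_segments(scores, frame_duration, threshold=0.5):
    """Group consecutive frames scoring above `threshold` into (start, end) segments.

    `scores` is a sequence of per-frame detection scores (hypothetical input),
    `frame_duration` is the duration of one frame in seconds.
    """
    segments = []
    start = None  # index of the first frame of the segment currently open
    for index, score in enumerate(scores):
        is_active = score > threshold
        if is_active and start is None:
            # A new active region begins at this frame.
            start = index
        elif not is_active and start is not None:
            # The active region ended just before this frame.
            segments.append((start * frame_duration, index * frame_duration))
            start = None
    if start is not None:
        # The last active region runs until the end of the input.
        segments.append((start * frame_duration, len(scores) * frame_duration))
    return segments


if __name__ == "__main__":
    # Five one-second frames with made-up scores.
    print(scores_to_segments([0.1, 0.9, 0.8, 0.2, 0.7], frame_duration=1.0))
```

The same helper could in principle be applied to the output of any frame-level detector (voice activity, overlapped speech), which is one reason such steps are convenient to treat as separate, composable pieces.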
Installation
pyannote.audio only supports Python 3.7 (or later) on Linux and macOS. It might work on Windows, but there is no guarantee that it does, nor is there any plan to add official Windows support.
The instructions below assume that pytorch has been installed using the instructions from https://pytorch.org.
$ pip install pyannote.audio==1.1
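Before installing, it can help to check that the current interpreter matches the stated requirements (Python 3.7 or later, Linux or macOS). The following is a small sketch using only the standard library; the helper name `check_environment` and its messages are made up for this example and are not part of pyannote.audio.

```python
import sys

# Requirements as stated in the installation notes.
MINIMUM_PYTHON = (3, 7)
SUPPORTED_PLATFORM_PREFIXES = ("linux", "darwin")  # "darwin" is macOS


def check_environment(version_info=None, platform=None):
    """Return a list of problems; an empty list means the stated requirements are met.

    Both arguments default to the running interpreter's values and exist
    mainly so the check can be exercised with other inputs.
    """
    version_info = sys.version_info if version_info is None else version_info
    platform = sys.platform if platform is None else platform

    problems = []
    if tuple(version_info[:2]) < MINIMUM_PYTHON:
        problems.append("Python 3.7 or later is required")
    if not platform.startswith(SUPPORTED_PLATFORM_PREFIXES):
        problems.append("only Linux and macOS are officially supported")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

An empty result only means the documented requirements are satisfied; it does not verify that pytorch itself is installed.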
Documentation and tutorials
- Use pretrained models and pipelines
- Prepare your own data
- Train models on your own data
- Tune pipelines on your own data
Until proper documentation is released, note that part of the API is described in this tutorial.
Citation
If you use pyannote.audio, please use the following citation:
@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}
Download files
Download the file for your platform.
Filename | Size | File type | Python version |
---|---|---|---|
pyannote.audio-1.1.1-py3-none-any.whl | 230.9 kB | Wheel | py3 |
pyannote.audio-1.1.1.tar.gz | 137.8 kB | Source | None |
Hashes for pyannote.audio-1.1.1-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | d2d0132f6722f13bb7b96624cfbe04952362b90ec2eaf6a11fa3bc8c5eb5e690 |
MD5 | 612857831440cb46741e2f4528d47901 |
BLAKE2-256 | b89e3539c9d74a477eba4051aadb2a686b99249358f9d8f513780acab2d2a54e |
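The SHA256 digest listed above can be used to check a downloaded wheel. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected` and the local file path in the usage note are assumptions for illustration.

```python
import hashlib

# SHA256 digest listed above for pyannote.audio-1.1.1-py3-none-any.whl.
EXPECTED_SHA256 = "d2d0132f6722f13bb7b96624cfbe04952362b90ec2eaf6a11fa3bc8c5eb5e690"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of_file(path) == expected.lower()
```

For example, assuming the wheel was saved in the current directory, `matches_expected("pyannote.audio-1.1.1-py3-none-any.whl")` should return `True` for an intact download.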