Skip to main content

Neural building blocks for speaker diarization

Project description

Using pyannote.audio open-source toolkit in production?
Make the most of it thanks to our consulting services.

pyannote.audio speaker diarization toolkit

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on PyTorch machine learning framework, it comes with state-of-the-art pretrained models and pipelines, that can be further finetuned to your own data for even better performance.

TL;DR

  1. Install pyannote.audio 3.0 with pip install pyannote.audio
  2. Accept pyannote/segmentation-3.0 user conditions
  3. Accept pyannote/speaker-diarization-3.0 user conditions
  4. Create access token at hf.co/settings/tokens.
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# send pipeline to GPU (when available)
import torch
pipeline.to(torch.device("cuda"))

# apply pretrained pipeline
diarization = pipeline("audio.wav")

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...

Highlights

Documentation

Benchmark

Out of the box, pyannote.audio speaker diarization pipeline v3.0 is expected to be much better (and faster) than v2.x.
Those numbers are diarization error rates (in %):

Dataset \ Version v1.1 v2.0 v2.1 v3.0 Premium
AISHELL-4 - 14.6 14.1 12.3 12.3
AliMeeting (channel 1) - - 27.4 24.3 19.4
AMI (IHM) 29.7 18.2 18.9 19.0 16.7
AMI (SDM) - 29.0 27.1 22.2 20.1
AVA-AVD - - - 49.1 42.7
DIHARD 3 (full) 29.2 21.0 26.9 21.7 17.0
MSDWild - - - 24.6 20.4
REPERE (phase2) - 12.6 8.2 7.8 7.8
VoxConverse (v0.3) 21.5 12.6 11.2 11.3 9.5

Citations

If you use pyannote.audio please use the following citations:

@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}

Development

The commands below will setup pre-commit hooks and packages needed for developing the pyannote.audio library.

pip install -e .[dev,testing]
pre-commit install

Test

pytest

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyannote.audio-3.0.0.tar.gz (15.6 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyannote.audio-3.0.0-py2.py3-none-any.whl (200.1 kB view details)

Uploaded Python 2Python 3

File details

Details for the file pyannote.audio-3.0.0.tar.gz.

File metadata

  • Download URL: pyannote.audio-3.0.0.tar.gz
  • Upload date:
  • Size: 15.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for pyannote.audio-3.0.0.tar.gz
Algorithm Hash digest
SHA256 fc50ac5f357789f4817175aed0b136264fe2b1791182e3737dba1f2825eb4f8c
MD5 60d17b14383074898a3e4a786fdf1dc6
BLAKE2b-256 813a06a92d2d180a153cd50db5366c99ed28f28895e71ef91a16280812569ab0

See more details on using hashes here.

File details

Details for the file pyannote.audio-3.0.0-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for pyannote.audio-3.0.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 d70559dda211d90c20a93ed524de92770dc8ef959d7479cad964833b0ec76232
MD5 703816df5970eca0c71da60f265062fe
BLAKE2b-256 b08e2b74f4fdfc0666f0e3acbe526c5ab0323b098f6edbbebcc9fb21bf806c60

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page