Neural building blocks for speaker diarization

Using the pyannote.audio open-source toolkit in production?
Consider switching to pyannoteAI for better and faster options.

pyannote.audio speaker diarization toolkit

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Built on the PyTorch machine learning framework, it comes with state-of-the-art pretrained models and pipelines that can be fine-tuned on your own data for even better performance.

TL;DR

  1. Install pyannote.audio with pip install pyannote.audio
  2. Accept the pyannote/segmentation-3.0 user conditions
  3. Accept the pyannote/speaker-diarization-3.1 user conditions
  4. Create an access token at hf.co/settings/tokens.

from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# send pipeline to GPU (only when one is available)
import torch
if torch.cuda.is_available():
    pipeline.to(torch.device("cuda"))

# apply pretrained pipeline
diarization = pipeline("audio.wav")

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
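
Once the turns are available, a common next step is to summarize them. The snippet below is a minimal sketch, not part of pyannote.audio: it assumes the turns have already been collected into plain (start, end, speaker) tuples, and the helper name speaking_time is hypothetical. The sample values mirror the example output above.

```python
from collections import defaultdict

# (start, end, speaker) tuples mirroring the sample output above.
# With a real pipeline result they could be collected with:
#   [(turn.start, turn.end, speaker)
#    for turn, _, speaker in diarization.itertracks(yield_label=True)]
turns = [
    (0.2, 1.5, "speaker_0"),
    (1.8, 3.9, "speaker_1"),
    (4.2, 5.7, "speaker_0"),
]


def speaking_time(turns):
    """Sum the duration (in seconds) of all turns for each speaker.

    Hypothetical helper for illustration; overlapping turns are
    counted once per speaker.
    """
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)


totals = speaking_time(turns)
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.1f}s")
```

Working on plain tuples keeps the summary independent of the pipeline objects, so the same helper can be reused on turns loaded from disk.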

Benchmark

Out of the box, the pyannote.audio speaker diarization pipeline v3.1 is expected to be much better (and faster) than v2.x. The table below reports diarization error rates (in %, lower is better):

Benchmark                 v2.1   v3.1   pyannoteAI
AISHELL-4                 14.1   12.2   11.9
AliMeeting (channel 1)    27.4   24.4   22.5
AMI (IHM)                 18.9   18.8   16.6
AMI (SDM)                 27.1   22.4   20.9
AVA-AVD                   66.3   50.0   39.8
CALLHOME (part 2)         31.6   28.4   22.2
DIHARD 3 (full)           26.9   21.7   17.2
Earnings21                17.0    9.4    9.0
Ego4D (dev.)              61.5   51.2   43.8
MSDWild                   32.8   25.3   19.8
RAMC                      22.5   22.2   18.4
REPERE (phase2)            8.2    7.8    7.6
VoxConverse (v0.3)        11.2   11.3    9.4

Diarization error rate (in %)
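
For readers new to the metric, here is a minimal sketch of how a diarization error rate is usually computed, assuming the standard definition: false alarm, missed detection, and speaker confusion durations divided by the total duration of reference speech. The function names and the input durations are hypothetical and chosen for illustration; only the Earnings21 figures come from the table above.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Return the diarization error rate in %.

    All arguments are durations in seconds; `total` is the total
    duration of speech in the reference annotation.
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total


def relative_reduction(old, new):
    """Return the relative error reduction (in %) when going from old to new."""
    return 100.0 * (old - new) / old


# Hypothetical error durations (seconds), for illustration only.
der = diarization_error_rate(
    false_alarm=12.0, missed_detection=18.0, confusion=20.0, total=400.0
)
print(f"DER = {der:.1f}%")

# Earnings21 row of the table above: 17.0 (v2.1) vs 9.4 (v3.1).
gain = relative_reduction(17.0, 9.4)
print(f"Relative reduction on Earnings21: {gain:.1f}%")
```

Because the error durations are summed before dividing, the rate can exceed 100% when a system produces a lot of false alarm speech.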

Citations

If you use pyannote.audio, please cite the following:

@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}

Development

The commands below set up the pre-commit hooks and packages needed for developing the pyannote.audio library.

pip install -e ".[dev,testing]"
pre-commit install

Test

pytest
