Neural building blocks for speaker diarization

Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Natural Language
- English
Programming Language
Topic
- Scientific/Engineering

Project description

Using pyannote.audio open-source toolkit in production? Make the most of it thanks to our consulting services.

`pyannote.audio` speaker diarization toolkit

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Built on the PyTorch machine learning framework, it comes with state-of-the-art pretrained models and pipelines that can be fine-tuned on your own data for even better performance.

TL;DR

1. Install `pyannote.audio` with `pip install pyannote.audio`.
2. Accept the `pyannote/segmentation-3.0` user conditions.
3. Accept the `pyannote/speaker-diarization-3.1` user conditions.
4. Create an access token at hf.co/settings/tokens.

import torch
from pyannote.audio import Pipeline

# load the pretrained pipeline (requires a Hugging Face access token)
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# send pipeline to GPU (when available), otherwise stay on CPU
if torch.cuda.is_available():
    pipeline.to(torch.device("cuda"))

# apply pretrained pipeline
diarization = pipeline("audio.wav")

# print the result: one line per speaker turn
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
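
Once the turns are available, a common next step is to summarize them. The sketch below is not part of pyannote.audio: `speaking_time` is a hypothetical helper, and the `turns` list simply copies the three example lines printed above as `(start, end, speaker)` tuples. With a real result, the same tuples could be collected from `diarization.itertracks(yield_label=True)`.

```python
from collections import defaultdict


def speaking_time(turns):
    """Sum the duration (in seconds) of all turns for each speaker.

    `turns` is an iterable of (start, end, speaker) tuples.
    This helper is illustrative and not part of pyannote.audio.
    """
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)


# Example turns copied from the sample output above (assumed values).
turns = [(0.2, 1.5, "0"), (1.8, 3.9, "1"), (4.2, 5.7, "0")]

totals = speaking_time(turns)
for speaker, seconds in sorted(totals.items()):
    print(f"speaker_{speaker}: {seconds:.1f}s")
```

Keeping the summary as a plain function over tuples means it can be tested without downloading a model or running on a GPU.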

Highlights

🤗 pretrained pipelines (and models) on the 🤗 model hub
🤯 state-of-the-art performance (see Benchmark)
🐍 Python-first API
⚡ multi-GPU training with pytorch-lightning

Documentation

Changelog
Frequently asked questions
Models
- Available tasks explained
- Applying a pretrained model
- Training, fine-tuning, and transfer learning
Pipelines
- Available pipelines explained
- Applying a pretrained pipeline
- Adapting a pretrained pipeline to your own data
- Training a pipeline
Contributing
- Adding a new model
- Adding a new task
- Adding a new pipeline
- Sharing pretrained models and pipelines
Blog
- 2022-12-02 > "How I reached 1st place at Ego4D 2022, 1st place at Albayzin 2022, and 6th place at VoxSRC 2022 speaker diarization challenges"
- 2022-10-23 > "One speaker segmentation model to rule them all"
- 2021-08-05 > "Streaming voice activity detection with pyannote.audio"
Videos
- Introduction to speaker diarization / JSALT 2023 summer school / 90 min
- Speaker segmentation model / Interspeech 2021 / 3 min
- First release of pyannote.audio / ICASSP 2020 / 8 min

Benchmark

Out of the box, the pyannote.audio speaker diarization pipeline v3.1 is expected to be much better (and faster) than v2.x. The table below reports diarization error rates (in %, lower is better):

Benchmark	v2.1	v3.1	Premium
AISHELL-4	14.1	12.3	11.9
AliMeeting (channel 1)	27.4	24.5	22.5
AMI (IHM)	18.9	18.8	16.6
AMI (SDM)	27.1	22.6	20.9
AVA-AVD	66.3	50.0	39.8
CALLHOME (part 2)	31.6	28.4	22.2
DIHARD 3 (full)	26.9	21.4	17.2
Ego4D (dev.)	61.5	51.2	43.8
MSDWild	32.8	25.4	19.8
REPERE (phase2)	8.2	7.8	7.6
VoxConverse (v0.3)	11.2	11.2	9.4

Diarization error rate (in %)
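
For readers unfamiliar with the metric, diarization error rate is commonly defined as the sum of false alarm, missed detection, and speaker confusion durations divided by the total duration of reference speech. The sketch below illustrates that standard definition only. The function names and the durations passed to `diarization_error_rate` are hypothetical, and the exact evaluation setup behind the table (for example, whether a forgiveness collar is used) is not stated here.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Return the diarization error rate in percent.

    All arguments are durations in seconds; `total` is the total
    duration of speech in the reference annotation.
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total


def relative_reduction(old_der, new_der):
    """Return the relative error reduction (in percent) from old to new."""
    return 100.0 * (old_der - new_der) / old_der


# Hypothetical durations, not taken from any benchmark.
der = diarization_error_rate(
    false_alarm=12.0, missed_detection=30.0, confusion=18.0, total=600.0
)
print(f"DER = {der:.1f}%")

# Relative improvement on AMI (SDM), using the v2.1 and v3.1 table values.
gain = relative_reduction(27.1, 22.6)
print(f"relative reduction = {gain:.1f}%")
```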

Citations

If you use pyannote.audio, please cite the following:

@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}

@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}

Development

The commands below set up the pre-commit hooks and packages needed to develop the pyannote.audio library.

pip install -e ".[dev,testing]"
pre-commit install

Test

pytest

Release history

- 4.0.4: Feb 7, 2026
- 4.0.3: Dec 7, 2025
- 4.0.2: Nov 19, 2025
- 4.0.1: Oct 10, 2025
- 4.0.0: Sep 29, 2025
- 3.4.0: Sep 9, 2025
- 3.3.2: Sep 11, 2024
- 3.3.1: Jun 19, 2024
- 3.3.0: Jun 14, 2024
- 3.2.0: May 8, 2024
- 3.1.1 (this version): Dec 1, 2023
- 3.1.0: Nov 16, 2023
- 3.0.1: Sep 28, 2023
- 3.0.0: Sep 26, 2023
- 2.1.1: Oct 27, 2022
- 2.0.1: Jul 20, 2022
- 1.1.2: Jan 28, 2021
- 1.1.1: Nov 25, 2020
- 1.1: Nov 8, 2020
- 0.0.1: Jul 20, 2022

Download files

Source Distribution

pyannote.audio-3.1.1.tar.gz (14.8 MB)

Uploaded Dec 1, 2023 Source

Built Distribution

pyannote.audio-3.1.1-py2.py3-none-any.whl (208.7 kB)

Uploaded Dec 1, 2023 Python 2, Python 3

File details

Details for the file pyannote.audio-3.1.1.tar.gz.

File metadata

Download URL: pyannote.audio-3.1.1.tar.gz
Upload date: Dec 1, 2023
Size: 14.8 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.12.0

File hashes

Hashes for pyannote.audio-3.1.1.tar.gz
Algorithm	Hash digest
SHA256	`b6562d46b5d5d616c1887cafd5aab3f7d19ecf6b3312fc6c4f72f37332fcd408`
MD5	`04f40b24dc246c9f85eae76f3c2cbe99`
BLAKE2b-256	`cf8dc6920c1c1fe439bee72c591297c69d32fe27218a96fd708e4d2274d5462c`


File details

Details for the file pyannote.audio-3.1.1-py2.py3-none-any.whl.

File metadata

Download URL: pyannote.audio-3.1.1-py2.py3-none-any.whl
Upload date: Dec 1, 2023
Size: 208.7 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.12.0

File hashes

Hashes for pyannote.audio-3.1.1-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`52f86cba990c0afaffecd7ecb84aff79709a51923462ae8a44d6ac4d2010836a`
MD5	`0cc7f996be23afc687ee580fc0b5bdc4`
BLAKE2b-256	`f511611c32f7b7894ba588ade502525d0130f3e731d15f925e9f2a1ae66c8680`
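
The published digests can be used to check that a downloaded file was not corrupted or altered. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_expected` are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`.

    The file is read in chunks so large archives do not need to
    fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA256 digest equals `expected_hex`."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for pyannote.audio-3.1.1.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "b6562d46b5d5d616c1887cafd5aab3f7d19ecf6b3312fc6c4f72f37332fcd408"
)
```

With the source archive saved locally, `matches_expected("pyannote.audio-3.1.1.tar.gz", EXPECTED_SDIST_SHA256)` should return `True` for an intact download.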

