Drop-in, OpenVINO-accelerated speaker diarization for pyannote.audio.
pyannote-openvino
OpenVINO acceleration for the pyannote.audio speaker diarization 3.1 pipeline. This project keeps the familiar pyannote API while running the heavy segmentation and embedding models via Intel-compatible OpenVINO IR, so the pipeline runs on CPU and Intel GPUs without relying on PyTorch FFT patches.
Installation
- Create or activate the provided virtual environment (.venv).
- Install the runtime dependencies:
python -m pip install -e .[stt]
The [stt] extra pulls the openai-whisper model that the docs/transcribe_v4.py helper uses to turn diarization segments into per-speaker text. If you only need the OV pipeline, install the base requirements listed in requirements.txt.
- Ensure you have an FFmpeg binary on PATH (the repo contains shared libraries under ffmpeg/bin for convenience).
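To confirm the FFmpeg requirement before running anything heavier, you can look for the binary from Python. This is a minimal sketch using only the standard library; the find_ffmpeg helper and its arguments are illustrative names, not part of this package.

```python
import os
import shutil


def find_ffmpeg(extra_dirs=(), search_path=None):
    """Return the full path of an ffmpeg executable, or None if not found.

    extra_dirs are searched before search_path (which defaults to $PATH).
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    parts = [str(d) for d in extra_dirs]
    if search_path:
        parts.append(search_path)
    return shutil.which("ffmpeg", path=os.pathsep.join(parts))


if __name__ == "__main__":
    # "ffmpeg/bin" is the in-repo location mentioned above.
    location = find_ffmpeg(extra_dirs=["ffmpeg/bin"])
    print(location or "ffmpeg not found; add it to PATH")
```

A None result usually means PATH needs updating before torchaudio can decode audio.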
Exporting the reference models to ONNX
Export scripts live under scripts/phase2/:
- export_segmentation.py exports the SincNet+transformer segmentation model with dynamic frame lengths.
- export_embedding.py wraps the ResNet embedding head so it consumes pre-computed mel filter banks instead of running FFT/RFFT inside the ONNX graph.
Run both scripts before converting to IR:
python scripts/phase2/export_segmentation.py --duration 2.0 --output models/onnx/segmentation.onnx
python scripts/phase2/export_embedding.py --duration 2.0 --frames 128 --output models/onnx/embedding.onnx
You can also use the optimum-cli shortcuts shown in this repo:
optimum-cli export openvino --model models/onnx/segmentation.onnx models/ov/segmentation
optimum-cli export openvino --model models/onnx/embedding.onnx models/ov/embedding
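For readers curious what "dynamic frame lengths" means at export time, the sketch below shows the general torch.onnx.export pattern. It is not the code in export_segmentation.py: the function names, the tensor names ("waveform", "scores"), the 16 kHz sample rate, and the (batch, channel, samples) input layout are all assumptions made for illustration.

```python
def num_samples(duration_s, sample_rate=16000):
    """Number of audio samples in a clip of the given duration."""
    return int(round(duration_s * sample_rate))


def export_with_dynamic_frames(model, output_path, duration_s=2.0, sample_rate=16000):
    """Export a torch module to ONNX with a variable-length time axis.

    Assumes the model takes a (batch, channel, samples) waveform and returns
    (batch, frames, classes); adjust the axis indices if yours differs.
    """
    import torch  # imported lazily so num_samples works without torch

    dummy = torch.zeros(1, 1, num_samples(duration_s, sample_rate))
    model.eval()
    torch.onnx.export(
        model,
        (dummy,),
        output_path,
        input_names=["waveform"],
        output_names=["scores"],
        # Axes listed here stay symbolic in the ONNX graph instead of
        # being frozen to the dummy input's size.
        dynamic_axes={
            "waveform": {0: "batch", 2: "samples"},
            "scores": {0: "batch", 1: "frames"},
        },
    )
```

The --duration flag in the commands above only sets the size of the tracing input in a scheme like this; the exported graph can still accept other lengths along the axes marked dynamic.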
Converting ONNX to OpenVINO IR
convert_to_ov.py wraps the OpenVINO Model Optimizer (MO) to turn ONNX files
into .xml/.bin IR blobs stored under models/ov/. By default it keeps FP32
weights but accepts --weight-format fp16 for iGPU workloads.
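If you want to do the same conversion from Python, the sketch below uses the openvino.convert_model and openvino.save_model functions available in OpenVINO 2023.1 and later, rather than the Model Optimizer wrapper described above. The helper names and the output layout (one .xml per ONNX file under models/ov/) are assumptions, not the behavior of convert_to_ov.py.

```python
from pathlib import Path


def ir_xml_path(onnx_path, out_dir="models/ov"):
    """Map models/onnx/name.onnx to <out_dir>/name.xml."""
    return Path(out_dir) / (Path(onnx_path).stem + ".xml")


def convert_onnx_to_ir(onnx_path, out_dir="models/ov", fp16=False):
    """Convert one ONNX file to an .xml/.bin pair and return the .xml path."""
    import openvino as ov  # lazy import: needs the openvino package (2023.1+)

    xml_path = ir_xml_path(onnx_path, out_dir)
    xml_path.parent.mkdir(parents=True, exist_ok=True)
    model = ov.convert_model(str(onnx_path))
    # compress_to_fp16=True stores weights as FP16; False keeps FP32.
    ov.save_model(model, str(xml_path), compress_to_fp16=fp16)
    return xml_path
```

Calling convert_onnx_to_ir("models/onnx/segmentation.onnx", fp16=True) would correspond roughly to the --weight-format fp16 option.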
Validation is available via scripts/phase3/validate_ov.py, which loads the IR
models with openvino.runtime.Core, runs dummy inputs, and prints the output
shapes.
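The validation step can be approximated as follows. This is a sketch of the general approach, not the contents of validate_ov.py: it assumes float32 inputs, uses openvino.Core (the top-level alias for openvino.runtime.Core in recent releases), and asks the caller for a fallback shape per input so that dynamic axes get a concrete size.

```python
def concrete_shape(dims, fallback):
    """Fill dynamic axes (None) in dims from the same position in fallback."""
    if len(dims) != len(fallback):
        raise ValueError("rank mismatch between model input and fallback shape")
    return [int(f) if d is None else int(d) for d, f in zip(dims, fallback)]


def run_dummy_inference(xml_path, fallback_shapes, device="CPU"):
    """Run zeros through an IR model and return the output shapes."""
    import numpy as np
    import openvino as ov

    core = ov.Core()
    compiled = core.compile_model(core.read_model(str(xml_path)), device)

    inputs = []
    for port, fallback in zip(compiled.inputs, fallback_shapes):
        pshape = port.get_partial_shape()
        dims = [
            None if pshape[i].is_dynamic else pshape[i].get_length()
            for i in range(len(pshape))
        ]
        # float32 is an assumption; check port.get_element_type() otherwise.
        inputs.append(np.zeros(concrete_shape(dims, fallback), dtype=np.float32))

    results = compiled(inputs)
    return [tuple(results[out].shape) for out in compiled.outputs]
```

For example, run_dummy_inference("models/ov/segmentation.xml", [(1, 1, 32000)]) would feed two seconds of silence at an assumed 16 kHz; the file name is hypothetical and depends on how the IR files were saved.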
Running the OpenVINO diarization pipeline
Use pyannote_openvino.OVSpeakerDiarization as a drop-in replacement for
pyannote.audio.Pipeline.from_pretrained("pyannote/speaker-diarization-3.1").
The helper accepts segmentation_xml, embedding_xml, and a device string such as
CPU, GPU, or GPU.0:
from pyannote_openvino import OVSpeakerDiarization
pipeline = OVSpeakerDiarization.from_pretrained("models/ov", device="GPU")
diart = pipeline("samples/Stirling Lennon Clips_mixdown.wav")
print(diart)
By default the segmentation/embedding classes mirror the pyannote interface
(num_frames, receptive_field_size, etc.), so the existing clustering code and
pipeline utilities continue to work.
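Beyond print(diart), you will usually want the individual speaker turns. The sketch below assumes the pipeline returns a pyannote.core.Annotation, as the original pyannote pipeline does, and uses its itertracks method; format_turns and annotation_to_turns are illustrative helper names.

```python
def format_turns(turns):
    """Render (start_s, end_s, speaker) tuples as one line per turn."""
    return [f"{start:8.2f} {end:8.2f} {speaker}" for start, end, speaker in turns]


def annotation_to_turns(annotation):
    """Flatten a pyannote.core.Annotation into (start, end, speaker) tuples."""
    return [
        (segment.start, segment.end, label)
        for segment, _, label in annotation.itertracks(yield_label=True)
    ]


# Usage with the result from the example above:
#   print("\n".join(format_turns(annotation_to_turns(diart))))
```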
Speaker-aware transcription helper (transcribe_v4)
The repo ships a single CLI, docs/transcribe_v4.py, that runs both diarization and transcription on an Intel iGPU. It works in three steps:
- Run the OpenVINO diarization pipeline.
- Load the same WAV file into memory and crop each speaker turn.
- Feed each crop to openai-whisper (default tiny) to produce text for each speaker segment.
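The cropping step comes down to converting turn boundaries in seconds into sample indices. The sketch below shows one way to do that; the function names are illustrative and this is not the code in transcribe_v4.py.

```python
def turn_to_sample_range(start_s, end_s, sample_rate, total_samples):
    """Convert a turn in seconds to a clamped [begin, end) sample range."""
    begin = max(0, int(round(start_s * sample_rate)))
    end = min(total_samples, int(round(end_s * sample_rate)))
    return begin, max(begin, end)


def crop_turns(samples, sample_rate, turns):
    """Yield (speaker, chunk) for each (start_s, end_s, speaker) turn.

    samples is any 1-D sliceable sequence (list, array.array, NumPy array).
    Turns that fall entirely outside the audio are skipped.
    """
    for start_s, end_s, speaker in turns:
        begin, end = turn_to_sample_range(start_s, end_s, sample_rate, len(samples))
        if end > begin:
            yield speaker, samples[begin:end]
```

Each chunk can then be handed to the speech-to-text model. openai-whisper expects 16 kHz mono audio, so resample first if the WAV file uses a different rate.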
Example usage:
python docs/transcribe_v4.py \
--audio samples/Stirling\ Lennon\ Clips_mixdown.wav \
--device GPU \
--whisper-ov whisper-large-v3-ov \
--output-txt artifacts/transcribe_v4.txt
The CLI prints timestamps, speaker labels, and the recognized text, and also
writes a TSV-style summary to the --output-txt path for later reference.
Testing and validation
- python scripts/phase1/audit_models.py records environment versions and shapes.
- python scripts/phase2/validate_onnx.py compares the ONNX exports against the original torch models.
- scripts/phase3/validate_ov.py loads the IR models and runs dummy inference.
- docs/transcribe_v4.py serves as the end-to-end Intel GPU smoke test (diarization + STT) on any WAV file.
Directory layout
- models/onnx/ – ONNX exports produced by Phase 2.
- models/ov/ – OpenVINO IR files generated by Phase 3.
- scripts/phase{1..3}/ – export, conversion, and validation helpers.
- pyannote_openvino/ – the runtime library that wires OVSegmentationModel, OVEmbeddingModel, and OVSpeakerDiarization into pyannote’s APIs.
- docs/transcribe_v4.py – per-speaker transcription CLI.
Troubleshooting
- If torchaudio fails to read your audio, install FFmpeg and point PATH at ffmpeg/bin (a copy lives in this repo for reference).
- Whisper downloads models the first time it runs; choose a small or tiny model for fast iteration and pin --stt-device to cpu if your GPU is busy.
Tests
- Install the test extra (and the transcription tooling) before running the suite:
python -m pip install -e .[stt,test]
- Run the pytest suite to make sure the OV pipeline returns a valid annotation:
python -m pytest
- The same command runs in CI (GitHub Actions, GitLab CI) and is fast enough to execute on every push/PR.
CI & Release Pipelines
- GitHub Actions: ci.yml runs on push/PR, installs the [stt,test] extras, and executes python -m pytest. release.yml runs on refs/tags/v*, reuses the same extras plus the build tool, reruns the tests, builds a wheel/tarball via python -m build, and publishes the artifacts to a GitHub release using softprops/action-gh-release.
- GitLab CI: .gitlab-ci.yml defines test and release stages. Both install the [stt,test,build] extras, the test job runs python -m pytest, and the release job (tag-only) runs python -m build and exposes dist/ as an artifact for later download.
- Commit tags matching v* (for example v0.1.0) will trigger the release stage and produce the distributables.
Download files
File details
Details for the file pyannote_openvino-0.1.1.tar.gz.
File metadata
- Download URL: pyannote_openvino-0.1.1.tar.gz
- Upload date:
- Size: 10.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 54f0bd46c284a75b7abfd90362e8962b5c42d3f9e6134b5342600d200e185bad |
| MD5 | bd932415ca9719aa9d8af06df421cc69 |
| BLAKE2b-256 | bce9705ad4d433f64ff6d40e55d4f97be613670fb0013962806dae0bfd10ac82 |
File details
Details for the file pyannote_openvino-0.1.1-py3-none-any.whl.
File metadata
- Download URL: pyannote_openvino-0.1.1-py3-none-any.whl
- Upload date:
- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a445ed47172d26e7bc7fc6a87a722b251cc5af09c1a466b5699d64a53f9d5646 |
| MD5 | d101c674bed592eb2cbf86074f7a261b |
| BLAKE2b-256 | 269fa44e44be8be36872585badd8382c8b89d6cc047dbc67476006c6facd508c |
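To check a downloaded file against the SHA256 digests listed above, a short standard-library sketch is enough. The helper names are illustrative; pip can also enforce hashes itself with --require-hashes.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, with the source distribution digest from the table above:
#   matches_sha256(
#       "pyannote_openvino-0.1.1.tar.gz",
#       "54f0bd46c284a75b7abfd90362e8962b5c42d3f9e6134b5342600d200e185bad",
#   )
```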