Drop-in, OpenVINO-accelerated speaker diarization for pyannote.audio.

pyannote-openvino

OpenVINO acceleration for the pyannote.audio speaker-diarization-3.1 pipeline. The project keeps the familiar pyannote API but runs the heavy segmentation and embedding models as OpenVINO IR, so the pipeline runs on CPUs and Intel GPUs without relying on PyTorch FFT patches.

Installation

Create or activate the provided virtual environment (.venv).
Install the runtime dependencies:
```
python -m pip install -e ".[stt]"
```
The [stt] extra pulls in the openai-whisper package that the docs/transcribe_v4.py helper uses to turn diarization segments into per-speaker text. If you only need the OpenVINO pipeline, install the base requirements listed in requirements.txt.
Ensure you have an FFmpeg binary on PATH (the repo contains shared libraries under ffmpeg/bin for convenience).

Exporting the reference models to ONNX

Export scripts live under scripts/phase2/:

export_segmentation.py exports the SincNet+transformer segmentation model with dynamic frame lengths.
export_embedding.py wraps the ResNet embedding head so it consumes pre-computed mel filter banks instead of running FFT/RFFT inside the ONNX graph.

Run both scripts before converting to IR:

```
python scripts/phase2/export_segmentation.py --duration 2.0 --output models/onnx/segmentation.onnx
python scripts/phase2/export_embedding.py --duration 2.0 --frames 128 --output models/onnx/embedding.onnx
```
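
If you prefer to drive both exports from Python, the two commands can be chained in a small wrapper. This is only a sketch: the script paths and flags are the ones shown above, while the helper names `export_commands` and `run_exports` are hypothetical and not part of this repo.

```python
import subprocess
import sys
from pathlib import Path


def export_commands(duration=2.0, frames=128, onnx_dir="models/onnx"):
    """Return the two export command lines as argument lists."""
    out = Path(onnx_dir)
    return [
        [sys.executable, "scripts/phase2/export_segmentation.py",
         "--duration", str(duration),
         "--output", (out / "segmentation.onnx").as_posix()],
        [sys.executable, "scripts/phase2/export_embedding.py",
         "--duration", str(duration),
         "--frames", str(frames),
         "--output", (out / "embedding.onnx").as_posix()],
    ]


def run_exports(**kwargs):
    """Run both exports in order and stop at the first failing script."""
    for command in export_commands(**kwargs):
        subprocess.run(command, check=True)
```

Calling `run_exports()` from the repo root should be equivalent to typing the two commands by hand; `check=True` raises `subprocess.CalledProcessError` if either script exits with a non-zero status.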

You can also use the optimum-cli shortcuts shown in this repo:

```
optimum-cli export openvino --model models/onnx/segmentation.onnx models/ov/segmentation
optimum-cli export openvino --model models/onnx/embedding.onnx models/ov/embedding
```

Converting ONNX to OpenVINO IR

convert_to_ov.py wraps the OpenVINO Model Optimizer (MO) to turn the ONNX files into .xml/.bin IR pairs stored under models/ov/. It keeps FP32 weights by default and accepts --weight-format fp16 for iGPU workloads.
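
For orientation, the same conversion can be sketched with the OpenVINO Python API. This is not the repo's convert_to_ov.py: it assumes a recent OpenVINO release that provides `openvino.convert_model` and `openvino.save_model`, and it uses those calls in place of the MO command line. The helper names `ir_path_for` and `convert_onnx_to_ir` are hypothetical.

```python
from pathlib import Path


def ir_path_for(onnx_path, ir_dir="models/ov"):
    """Map models/onnx/<name>.onnx to <ir_dir>/<name>.xml."""
    return Path(ir_dir) / (Path(onnx_path).stem + ".xml")


def convert_onnx_to_ir(onnx_path, ir_dir="models/ov", fp16=False):
    """Convert one ONNX file to an IR .xml/.bin pair and return the .xml path."""
    import openvino as ov  # imported lazily so the path helper works without it

    xml_path = ir_path_for(onnx_path, ir_dir)
    xml_path.parent.mkdir(parents=True, exist_ok=True)
    model = ov.convert_model(str(onnx_path))
    # compress_to_fp16=False keeps FP32 weights; True stores them as FP16.
    ov.save_model(model, str(xml_path), compress_to_fp16=fp16)
    return xml_path
```

`convert_onnx_to_ir("models/onnx/segmentation.onnx", fp16=True)` would write models/ov/segmentation.xml and the matching .bin file with FP16 weights.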

Validation is available via scripts/phase3/validate_ov.py, which loads the IR models with openvino.runtime.Core, runs dummy inputs, and prints the output shapes.
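
The kind of check described above can be sketched as follows. This is an illustration, not the contents of validate_ov.py: it assumes float32 inputs, uses `openvino.Core` (the top-level alias of `openvino.runtime.Core` in recent releases), and fills dynamic dimensions with a caller-chosen size. The names `concrete_shape` and `run_dummy_inference` are hypothetical.

```python
def concrete_shape(dims, fallback=1):
    """Replace dynamic dimensions (given as None) with a fallback size."""
    return [fallback if d is None else int(d) for d in dims]


def run_dummy_inference(xml_path, device="CPU", fallback=1):
    """Compile an IR model, feed zeros, and return the output shapes."""
    import numpy as np
    import openvino as ov

    core = ov.Core()
    compiled = core.compile_model(core.read_model(xml_path), device)
    inputs = []
    for port in compiled.inputs:
        # Dynamic dimensions have no fixed length, so mark them as None.
        dims = [None if d.is_dynamic else d.get_length() for d in port.partial_shape]
        inputs.append(np.zeros(concrete_shape(dims, fallback), dtype=np.float32))
    results = compiled(inputs)
    return [tuple(results[out].shape) for out in compiled.outputs]
```

Pick a `fallback` that the model can actually process: a segmentation model with a dynamic frame axis will likely reject a length of 1, so pass a realistic number of samples or frames instead.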

Running the OpenVINO diarization pipeline

Use pyannote_openvino.OVSpeakerDiarization as a drop-in replacement for pyannote.audio.Pipeline.from_pretrained("pyannote/speaker-diarization-3.1"). The helper accepts segmentation_xml, embedding_xml, and a device string such as CPU, GPU, or GPU.0:

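```
from pyannote_openvino import OVSpeakerDiarization

# Load the IR models from models/ov and target the Intel GPU.
pipeline = OVSpeakerDiarization.from_pretrained("models/ov", device="GPU")
# Run diarization on a WAV file and print the resulting speaker turns.
diarization = pipeline("samples/Stirling Lennon Clips_mixdown.wav")
print(diarization)
```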

The segmentation and embedding classes mirror the pyannote interface (num_frames, receptive_field_size, etc.), so existing clustering code and pipeline utilities continue to work.
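
Because the pipeline is meant as a drop-in replacement, its result can presumably be consumed like a pyannote.core Annotation. The sketch below assumes exactly that, relying on `itertracks(yield_label=True)`; `speaker_turns` and `format_turn` are hypothetical helper names.

```python
def speaker_turns(annotation):
    """Yield (start, end, speaker) tuples from a pyannote-style annotation."""
    for segment, _track, speaker in annotation.itertracks(yield_label=True):
        yield round(segment.start, 3), round(segment.end, 3), speaker


def format_turn(start, end, speaker):
    """Render one turn as fixed-width start and end times plus the label."""
    return f"{start:8.3f} {end:8.3f} {speaker}"
```

Looping over `speaker_turns(pipeline("audio.wav"))` and printing `format_turn(*turn)` gives one line per speaker turn.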

Speaker-aware transcription helper (`transcribe_v4`)

The repo ships a single CLI, docs/transcribe_v4.py, that runs both diarization and transcription on an Intel iGPU. It works in three steps:

Run the OpenVINO diarization pipeline.
Load the same WAV file into memory and crop each speaker turn.
Feed each crop to openai-whisper (default model: tiny) to produce text for each speaker segment.
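
The crop-and-transcribe steps can be sketched in a few lines. This is an illustration rather than the code in transcribe_v4.py: `crop_turn` and `transcribe_turns` are hypothetical names, and `transcribe` is any callable that maps an audio crop to text.

```python
def crop_turn(samples, sample_rate, start, end):
    """Return the samples between start and end (seconds), clamped to the clip."""
    first = max(0, int(round(start * sample_rate)))
    last = min(len(samples), int(round(end * sample_rate)))
    return samples[first:last]


def transcribe_turns(samples, sample_rate, turns, transcribe):
    """Run transcribe(crop) on every (start, end, speaker) turn."""
    rows = []
    for start, end, speaker in turns:
        crop = crop_turn(samples, sample_rate, start, end)
        if len(crop) == 0:
            continue  # skip turns that fall outside the audio
        rows.append((start, end, speaker, transcribe(crop).strip()))
    return rows
```

With openai-whisper, `transcribe` could be something like `lambda crop: model.transcribe(crop)["text"]`, assuming `samples` is a mono float32 NumPy array at 16 kHz, which is the format Whisper expects.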

Example usage:

```
python docs/transcribe_v4.py \
  --audio samples/Stirling\ Lennon\ Clips_mixdown.wav \
  --device GPU \
  --whisper-ov whisper-large-v3-ov \
  --output-txt artifacts/transcribe_v4.txt
```

The CLI prints timestamps, speaker labels, and the recognized text, and also writes a TSV-style summary to the --output-txt path for later reference.
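
A TSV-style summary of this kind could be produced as sketched below. The column order and header names are assumptions made for illustration, not the CLI's documented format, and `write_tsv` is a hypothetical helper.

```python
import csv
import io


def write_tsv(rows, handle):
    """Write (start, end, speaker, text) rows as tab-separated lines."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["start", "end", "speaker", "text"])
    for start, end, speaker, text in rows:
        writer.writerow([f"{start:.2f}", f"{end:.2f}", speaker, text])


# Write to an in-memory buffer here; pass an open file to write to disk.
buffer = io.StringIO()
write_tsv([(0.0, 1.5, "SPEAKER_00", "hello there")], buffer)
print(buffer.getvalue(), end="")
```

Using the csv module rather than joining strings by hand means text containing tabs or quotes is escaped correctly.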

Testing and validation

python scripts/phase1/audit_models.py records environment versions and shapes.
python scripts/phase2/validate_onnx.py compares the ONNX exports against the original torch models.
scripts/phase3/validate_ov.py loads the IR models and runs dummy inference.
docs/transcribe_v4.py serves as the end-to-end Intel GPU smoke test (diarization + STT) on any WAV file.

Directory layout

models/onnx/ – ONNX exports produced by Phase 2.
models/ov/ – OpenVINO IR files generated by Phase 3.
scripts/phase{1..3}/ – audit, export, conversion, and validation helpers.
pyannote_openvino/ – the runtime library that wires OVSegmentationModel, OVEmbeddingModel, and OVSpeakerDiarization into pyannote’s APIs.
docs/transcribe_v4.py – per-speaker transcription CLI.

Troubleshooting

If torchaudio fails to read your audio, install FFmpeg and point PATH at ffmpeg/bin (a copy lives in this repo for reference).
Whisper downloads models the first time it runs; choose a small or tiny model for fast iteration and pin --stt-device to cpu if your GPU is busy.
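
The FFmpeg check can also be done from Python before any audio is loaded. The sketch below uses only the standard library; `ensure_ffmpeg` is a hypothetical helper, and ffmpeg/bin is the repo directory mentioned above.

```python
import os
import shutil


def ensure_ffmpeg(extra_dir="ffmpeg/bin"):
    """Return the FFmpeg path, prepending extra_dir to PATH if needed."""
    found = shutil.which("ffmpeg")
    if found is None and os.path.isdir(extra_dir):
        # Put extra_dir first and drop empty entries from the existing PATH.
        current = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
        os.environ["PATH"] = os.pathsep.join([os.path.abspath(extra_dir)] + current)
        found = shutil.which("ffmpeg")
    return found
```

A return value of `None` means no ffmpeg executable was found even after the extra directory was added, which is a good moment to stop and install FFmpeg.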

Tests

Install the test extra (and the transcription tooling) before running the suite:
```
python -m pip install -e ".[stt,test]"
```
Run the pytest suite to make sure the OV pipeline returns a valid annotation:
```
python -m pytest
```
The same command runs in CI (GitHub Actions, GitLab CI) and is fast enough to execute on every push/PR.
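
What "a valid annotation" means is not spelled out here; one plausible reading is sketched below. It assumes the pipeline returns a pyannote-style annotation exposing `itertracks(yield_label=True)`. `check_annotation` is a hypothetical helper, not a function from the test suite.

```python
def check_annotation(annotation, duration=None, tolerance=1e-3):
    """Raise ValueError on a malformed turn; return the number of turns."""
    count = 0
    for segment, _track, label in annotation.itertracks(yield_label=True):
        if not 0.0 <= segment.start < segment.end:
            raise ValueError(f"bad segment: {segment.start}..{segment.end}")
        if not label:
            raise ValueError("empty speaker label")
        if duration is not None and segment.end > duration + tolerance:
            raise ValueError("segment ends after the audio does")
        count += 1
    return count
```

A pytest test could call `check_annotation(pipeline(wav_path))` and assert that the returned count is greater than zero.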

CI & Release Pipelines

GitHub Actions:
- ci.yml runs on push/PR, installs the [stt,test] extras, and executes python -m pytest.
- release.yml runs on refs/tags/v*, reuses the same extras plus the build tool, reruns the tests, builds a wheel/tarball via python -m build, and publishes the artifacts to a GitHub release using softprops/action-gh-release.
GitLab CI:
- .gitlab-ci.yml defines test and release stages. Both install the [stt,test,build] extras, the test job runs python -m pytest, and the release job (tag-only) runs python -m build and exposes dist/ as an artifact for later download.
- Tags matching v* (for example v0.1.0) trigger the release stage and produce the distributables.

This version: 0.1.1 (Mar 2, 2026)
