
Drop-in, OpenVINO-accelerated speaker diarization for pyannote.audio.


pyannote-openvino

OpenVINO acceleration for the pyannote.audio speaker diarization 3.1 pipeline. This project keeps the familiar pyannote API while running the heavy segmentation and embedding models via Intel-compatible OpenVINO IR, so the pipeline runs on CPU and Intel GPUs without relying on PyTorch FFT patches.

Installation

  1. Create or activate the provided virtual environment (.venv).
  2. Install the runtime dependencies:
    python -m pip install -e ".[stt]"
    
    The [stt] extra pulls in the openai-whisper package, which the docs/transcribe_v4.py helper uses to turn diarization segments into per-speaker text. If you only need the OV pipeline, install the base requirements listed in requirements.txt.
  3. Ensure you have an FFmpeg binary on PATH (the repo contains shared libraries under ffmpeg/bin for convenience).
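
A quick way to confirm step 3 before running anything heavier is to look for the binary from Python. The sketch below is illustrative only: find_ffmpeg is a hypothetical helper, not part of this package, and it assumes the repo-local ffmpeg/bin directory mentioned above is a reasonable fallback location.

```python
import os
import shutil


def find_ffmpeg(extra_dirs=()):
    """Return the path of the first ffmpeg executable found, or None."""
    # Look on the regular PATH first.
    found = shutil.which("ffmpeg")
    if found:
        return found
    # Then try each extra directory (for example the repo-local ffmpeg/bin).
    for directory in extra_dirs:
        found = shutil.which("ffmpeg", path=os.fspath(directory))
        if found:
            return found
    return None


if __name__ == "__main__":
    location = find_ffmpeg(extra_dirs=["ffmpeg/bin"])
    print(location or "ffmpeg not found; add it to PATH")
```

If the helper only finds the binary through extra_dirs, that directory still needs to be added to PATH for other tools to see it.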

Exporting the reference models to ONNX

Export scripts live under scripts/phase2/:

  • export_segmentation.py exports the SincNet + LSTM (PyanNet) segmentation model with dynamic frame lengths.
  • export_embedding.py wraps the ResNet embedding head so it consumes pre-computed mel filter banks instead of running FFT/RFFT inside the ONNX graph.

Run both scripts before converting to IR:

python scripts/phase2/export_segmentation.py --duration 2.0 --output models/onnx/segmentation.onnx
python scripts/phase2/export_embedding.py --duration 2.0 --frames 128 --output models/onnx/embedding.onnx

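Once the ONNX files exist, the optimum-cli shortcuts shown in this repo can convert them to OpenVINO IR in place of the conversion script described in the next section: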

optimum-cli export openvino --model models/onnx/segmentation.onnx models/ov/segmentation
optimum-cli export openvino --model models/onnx/embedding.onnx models/ov/embedding

Converting ONNX to OpenVINO IR

convert_to_ov.py wraps the OpenVINO Model Optimizer (MO) to turn ONNX files into .xml/.bin IR blobs stored under models/ov/. By default it keeps FP32 weights but accepts --weight-format fp16 for iGPU workloads.
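
For orientation, here is a minimal sketch of a comparable conversion written against the OpenVINO Python API (openvino.convert_model and openvino.save_model) rather than the Model Optimizer. It is not the code of convert_to_ov.py. The function names and the models/ov/<name>.xml output naming are assumptions made for illustration.

```python
from pathlib import Path


def ir_path(onnx_path, out_dir="models/ov"):
    """Map e.g. models/onnx/segmentation.onnx to models/ov/segmentation.xml.

    The output naming is an assumption, not taken from convert_to_ov.py.
    """
    return Path(out_dir) / (Path(onnx_path).stem + ".xml")


def convert(onnx_path, out_dir="models/ov", fp16=False):
    """Convert one ONNX file to OpenVINO IR and return the .xml path."""
    # Imported lazily so ir_path() works even without OpenVINO installed.
    import openvino as ov

    model = ov.convert_model(str(onnx_path))
    target = ir_path(onnx_path, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    # save_model writes the .xml file and a matching .bin next to it;
    # compress_to_fp16 stores the weights in half precision.
    ov.save_model(model, str(target), compress_to_fp16=fp16)
    return target
```

Calling convert("models/onnx/segmentation.onnx", fp16=True) would then play roughly the role that --weight-format fp16 plays for the script.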

Validation is available via scripts/phase3/validate_ov.py, which loads the IR models with openvino.runtime.Core, runs dummy inputs, and prints the output shapes.
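
The check described above can be sketched roughly as follows. This is an illustration, not the contents of validate_ov.py. The helper names are hypothetical, inputs are assumed to be float32, and any input with dynamic dimensions needs an explicit shape from the caller, because the right dummy size depends on the model.

```python
import numpy as np


def resolve_shape(dims, override=None):
    """Return a concrete shape; dynamic dimensions are given as None."""
    if override is not None:
        return tuple(int(d) for d in override)
    if any(d is None for d in dims):
        raise ValueError("dynamic input shape needs an explicit override")
    return tuple(int(d) for d in dims)


def validate(xml_path, device="CPU", shapes=None):
    """Load an IR model, run zero-filled inputs, print output shapes."""
    import openvino as ov  # lazy import: resolve_shape() has no OpenVINO dependency

    shapes = shapes or {}
    core = ov.Core()
    compiled = core.compile_model(core.read_model(xml_path), device)

    feeds = {}
    for port in compiled.inputs:
        name = port.get_any_name()
        pshape = port.get_partial_shape()
        dims = []
        for i in range(len(pshape)):
            dim = pshape[i]
            dims.append(None if dim.is_dynamic else dim.get_length())
        # Assumption: every input is float32.
        feeds[name] = np.zeros(resolve_shape(dims, shapes.get(name)), dtype=np.float32)

    results = compiled(feeds)
    for port in compiled.outputs:
        print(port.get_any_name(), results[port].shape)
```

The shapes argument maps an input name to the dummy shape to use, so the caller decides how many samples or frames the dynamic axes should carry.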

Running the OpenVINO diarization pipeline

Use pyannote_openvino.OVSpeakerDiarization as a drop-in replacement for pyannote.audio.Pipeline.from_pretrained("pyannote/speaker-diarization-3.1"). The helper accepts segmentation_xml, embedding_xml, and a device string such as CPU, GPU, or GPU.0:

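from pyannote_openvino import OVSpeakerDiarization

# Load the IR models from models/ov and run them on the Intel GPU.
pipeline = OVSpeakerDiarization.from_pretrained("models/ov", device="GPU")
# Diarize a WAV file and print the resulting speaker turns.
diarization = pipeline("samples/Stirling Lennon Clips_mixdown.wav")
print(diarization)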

By default the segmentation/embedding classes mirror the pyannote interface (num_frames, receptive_field_size, etc.), so the existing clustering code and pipeline utilities continue to work.
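
If the pipeline returns a pyannote.core.Annotation, as the stock pyannote pipeline does, the result can be flattened into plain tuples for downstream tooling. The formatter below is a hypothetical helper for illustration. The comment shows how the tuples could be built with Annotation.itertracks, on the assumption that the OV pipeline's return type matches.

```python
def format_turns(turns):
    """Render (start_seconds, end_seconds, speaker) tuples as text lines."""
    return [f"{start:7.2f} {end:7.2f} {speaker}" for start, end, speaker in turns]


# With a pyannote.core.Annotation named `diarization`, the tuples could be
# built like this (assumes the OV pipeline returns the same type):
#   turns = [(segment.start, segment.end, label)
#            for segment, _, label in diarization.itertracks(yield_label=True)]

if __name__ == "__main__":
    for line in format_turns([(0.0, 1.5, "SPEAKER_00"), (1.5, 4.25, "SPEAKER_01")]):
        print(line)
```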

Speaker-aware transcription helper (transcribe_v4)

The repo ships a single CLI under docs/transcribe_v4.py that accelerates both diarization and transcription on Intel iGPU:

  1. Run the OpenVINO diarization pipeline.
  2. Load the same WAV file into memory and crop each speaker turn.
  3. Feed each crop to openai-whisper (default model: tiny) to produce the text for each speaker segment.
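
Step 2 above comes down to converting turn boundaries from seconds to sample indices and slicing the in-memory waveform. A minimal sketch, assuming a mono waveform held in any sliceable sequence (a list or a NumPy array) and a hypothetical crop_turns helper that is not part of the CLI:

```python
def crop_turns(samples, sample_rate, turns):
    """Slice a mono waveform into one crop per (start, end, speaker) turn.

    Times are in seconds. Boundaries are clamped to the waveform length and
    empty turns are skipped.
    """
    crops = []
    total = len(samples)
    for start, end, speaker in turns:
        lo = max(0, int(round(start * sample_rate)))
        hi = min(total, int(round(end * sample_rate)))
        if hi > lo:
            crops.append((speaker, samples[lo:hi]))
    return crops


if __name__ == "__main__":
    # Toy waveform: 3 seconds at 10 samples per second.
    audio = list(range(30))
    for speaker, crop in crop_turns(audio, 10, [(0.0, 1.0, "A"), (1.5, 5.0, "B")]):
        print(speaker, len(crop))
```

Each crop can then be handed to the speech-to-text model; openai-whisper expects 16 kHz float32 audio, so real input would need to be resampled and converted accordingly.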

Example usage:

python docs/transcribe_v4.py \
  --audio samples/Stirling\ Lennon\ Clips_mixdown.wav \
  --device GPU \
  --whisper-ov whisper-large-v3-ov \
  --output-txt artifacts/transcribe_v4.txt

The CLI prints timestamps, speaker labels, and the recognized text, and also writes a TSV-style summary to the --output-txt path for later reference.
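
To make the TSV-style summary concrete, the sketch below writes one tab-separated row per speaker turn. The column names and their order are assumptions chosen for illustration, and write_tsv is a hypothetical helper. The actual layout produced by transcribe_v4.py may differ.

```python
import csv
import io


def write_tsv(rows, handle):
    """Write (start, end, speaker, text) rows as tab-separated values."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Header row; the column set is an assumption, not the CLI's format.
    writer.writerow(["start", "end", "speaker", "text"])
    for start, end, speaker, text in rows:
        writer.writerow([f"{start:.2f}", f"{end:.2f}", speaker, text])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_tsv([(0.0, 1.5, "SPEAKER_00", "hello there")], buffer)
    print(buffer.getvalue(), end="")
```

Using the csv module rather than manual string joins keeps rows valid even when the recognized text itself contains tabs or quotes.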

Testing and validation

  • python scripts/phase1/audit_models.py records environment versions and shapes.
  • python scripts/phase2/validate_onnx.py compares the ONNX exports against the original torch models.
  • scripts/phase3/validate_ov.py loads the IR models and runs dummy inference.
  • docs/transcribe_v4.py serves as the end-to-end Intel GPU smoke test (diarization + STT) on any WAV file.

Directory layout

  • models/onnx/ – ONNX exports produced by Phase 2.
  • models/ov/ – OpenVINO IR files generated by Phase 3.
  • scripts/phase{1..3}/ – export, conversion, and validation helpers.
  • pyannote_openvino/ – the runtime library that wires OVSegmentationModel, OVEmbeddingModel, and OVSpeakerDiarization into pyannote’s APIs.
  • docs/transcribe_v4.py – per-speaker transcription CLI.

Troubleshooting

  • If torchaudio fails to read your audio, install FFmpeg and point PATH at ffmpeg/bin (a copy lives in this repo for reference).
  • Whisper downloads models the first time it runs; choose a small or tiny model for fast iteration and pin --stt-device to cpu if your GPU is busy.

Tests

  • Install the test extra (and the transcription tooling) before running the suite:
    python -m pip install -e ".[stt,test]"
    
  • Run the pytest suite to make sure the OV pipeline returns a valid annotation:
    python -m pytest
    
  • The same command runs in CI (GitHub Actions, GitLab CI) and is fast enough to execute on every push/PR.

CI & Release Pipelines

  • GitHub Actions:
    • ci.yml runs on push/PR, installs the [stt,test] extras, and executes python -m pytest.
    • release.yml runs on refs/tags/v*, reuses the same extras plus the build tool, reruns the tests, builds a wheel/tarball via python -m build, and publishes the artifacts to a GitHub release using softprops/action-gh-release.
  • GitLab CI:
    • .gitlab-ci.yml defines test and release stages. Both install the [stt,test,build] extras, the test job runs python -m pytest, and the release job (tag-only) runs python -m build and exposes dist/ as an artifact for later download.
    • Pushing a tag that matches v* (for example v0.1.0) triggers the release stage and produces the distributables.
