
Speaker diarization — who spoke when. Rust + ONNX, no Python runtime overhead.


polyvoice


Speaker diarization for Rust — who spoke when, without Python.

Production-ready speaker diarization that runs on CPU, fits in about 30 MB, and uses K-means clustering with automatic speaker-count detection, which outperforms AHC clustering on VoxConverse (see Benchmarks).

Speaker_0: 0.0s - 12.3s
Speaker_1: 14.1s - 28.7s
Speaker_0: 31.2s - 45.0s

At a glance

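|                 | polyvoice               | pyannote 3.1    | whisperX                 |
|-----------------|-------------------------|-----------------|--------------------------|
| VoxConverse DER | 14.12%                  | ~12%            | ~15%                     |
| Model size      | ~30 MB                  | ~100 MB         | ~1 GB                    |
| Runtime         | CPU only                | GPU recommended | GPU required             |
| Dependencies    | Zero (ONNX)             | PyTorch + ONNX  | PyTorch + faster-whisper |
| Languages       | Rust / Python / C / CLI | Python only     | Python only              |
| Streaming       | Yes                     | No              | No                       |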

About 2 points higher DER than pyannote 3.1 on VoxConverse (14.12% vs. ~12%), at 10× less RAM and with no GPU.


Install

# Rust
cargo add polyvoice --features onnx

# Python
pip install polyvoice

# CLI
cargo install polyvoice --features cli

Quick start — Rust

use polyvoice::models::ModelRegistry;
use polyvoice::pipeline_v2::hybrid::HybridPipeline;
use polyvoice::segmentation::PowersetSegmenter;
use polyvoice::embedder::ResNet34Adapter;
use polyvoice::clusterer::KMeansClusterer;
use polyvoice::types::SampleRate;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Models auto-download on first run
    let registry = ModelRegistry::default()?;
    let models = registry.ensure_for_profile(polyvoice::types::Profile::Balanced)?;

    let segmenter = PowersetSegmenter::new(&models.segmenter_path)?;
    let embedder = ResNet34Adapter::new(&models.embedder_path, 4)?;
    let clusterer = KMeansClusterer::new(20); // auto-k via silhouette

    let pipeline = HybridPipeline::new(
        Box::new(segmenter),
        Box::new(embedder),
        Box::new(clusterer),
    );

    let (samples, _sr) = polyvoice::wav::read_wav("meeting.wav")?;
    let result = pipeline.run(&samples, SampleRate::new(16000).unwrap())?;

    for turn in &result.turns {
        println!("{}: {:.1}s - {:.1}s", turn.speaker, turn.time.start, turn.time.end);
    }
    Ok(())
}

Quick start — Python

import polyvoice

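# `samples` must already hold the decoded audio (f32 PCM at 16 kHz);
# loading it is not shown here.
pipeline = polyvoice.Pipeline.balanced("models/")
result = pipeline.run(samples, sample_rate=16000)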

for turn in result["turns"]:
    print(f"{turn['speaker']}: {turn['start']:.1f}s - {turn['end']:.1f}s")
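
The Python snippet above leaves `samples` undefined. As a minimal sketch, the helper below decodes a 16-bit mono PCM WAV file into floats using only the standard library. The function name `read_wav_mono_f32` is hypothetical, and it is an assumption that `pipeline.run` accepts a plain sequence of floats; check the package documentation for the exact input type it expects.

```python
import array
import sys
import wave


def read_wav_mono_f32(path):
    """Read a 16-bit PCM mono WAV file and return (samples, sample_rate).

    Samples are Python floats scaled to the range [-1.0, 1.0).
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    return [s / 32768.0 for s in pcm], sample_rate
```

If the file is not already 16 kHz mono, it would need to be resampled or downmixed first; that step is outside this sketch.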

Quick start — CLI

# Download models once
polyvoice download-models --profile balanced

# Diarize
polyvoice diarize meeting.wav --output meeting.rttm
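
The `--output meeting.rttm` flag suggests the CLI writes RTTM, the usual interchange format for diarization output. The README does not show the file contents, so the sketch below assumes standard `SPEAKER` records (file id, channel, onset, duration, and speaker name in the eighth field). `parse_rttm` is a hypothetical helper, not part of polyvoice.

```python
def parse_rttm(text):
    """Parse RTTM text into a list of {"speaker", "start", "end"} dicts.

    Assumes standard SPEAKER records:
    SPEAKER <file> <chan> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
    """
    turns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank lines and other record types
        onset = float(fields[3])
        duration = float(fields[4])
        turns.append(
            {"speaker": fields[7], "start": onset, "end": onset + duration}
        )
    return turns


example = (
    "SPEAKER meeting 1 0.000 12.300 <NA> <NA> Speaker_0 <NA> <NA>\n"
    "SPEAKER meeting 1 14.100 14.600 <NA> <NA> Speaker_1 <NA> <NA>\n"
)
for turn in parse_rttm(example):
    print(f"{turn['speaker']}: {turn['start']:.1f}s - {turn['end']:.1f}s")
```

RTTM stores onset and duration rather than start and end, so the end time is computed as their sum.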

Benchmarks

| Pipeline              | Dataset          | Files | DER    | Notes                       |
|-----------------------|------------------|-------|--------|-----------------------------|
| Hybrid + K-means      | VoxConverse-test | 232   | 14.12% | Auto-k, no threshold tuning |
| Hybrid + AHC          | VoxConverse-test | 232   | 18.77% | Manual threshold 0.40       |
| Legacy (Silero + AHC) | VoxConverse-test | 232   | ~14%   | Baseline pipeline           |
| Hybrid + K-means      | VoxConverse-test | 10    | 13.48% | Subset                      |
| Hybrid + AHC          | VoxConverse-test | 10    | 15.03% | Subset                      |
| Hybrid + K-means      | e2e smoke        | 1     | 4.43%  | 26 s clip                   |
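
For readers new to the metric: diarization error rate (DER) is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The sketch below shows that arithmetic with made-up durations; the README does not state the scoring settings (collar, overlap handling) behind the figures in the table, so the numbers here are purely illustrative.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a fraction, given durations in seconds."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical durations, not taken from the benchmark runs.
der = diarization_error_rate(
    missed=3.0, false_alarm=2.0, confusion=5.0, total_speech=100.0
)
print(f"DER = {der:.2%}")

# Gap between the two full-benchmark rows, in percentage points.
gap_points = 18.77 - 14.12
print(f"K-means vs. AHC: {gap_points:.2f} points")
```

Because false alarms count in the numerator but not the denominator, DER can exceed 100% on badly over-segmented output.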

K-means auto-k uses silhouette-based k selection with single-speaker detection (no more 20-speaker predictions on 1-speaker files). It beats AHC by 4.65 percentage points of DER (14.12% vs. 18.77%) on the full VoxConverse test set, without any manual threshold tuning.
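
To make the idea concrete, here is a small pure-Python sketch of silhouette-based k selection with a single-speaker check. It is not polyvoice's implementation: the function names, the farthest-point initialisation, the use of the maximum pairwise cosine distance as the guardrail, and the `single_speaker_threshold` value of 0.1 are all assumptions chosen for illustration. Embeddings are assumed to be non-zero vectors.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def kmeans(points, k, iters=20):
    """Toy k-means with cosine distance and deterministic farthest-point init."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(cosine_distance(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda j: cosine_distance(p, centers[j]))
            for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette_score(points, labels):
    """Mean silhouette; points in singleton clusters contribute 0."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            d = [
                cosine_distance(p, q)
                for j, q in enumerate(points)
                if labels[j] == c and j != i
            ]
            if d:
                mean_dist[c] = sum(d) / len(d)
        a = mean_dist.get(labels[i])
        others = [v for c, v in mean_dist.items() if c != labels[i]]
        if a is None or not others:
            continue
        b = min(others)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, max_k, single_speaker_threshold=0.1):
    """Return (k, labels). Silhouette is undefined for k = 1, so that case
    is handled by a separate guardrail on the pairwise distances."""
    n = len(points)
    dists = [
        cosine_distance(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    if not dists or max(dists) < single_speaker_threshold:
        return 1, [0] * n
    best_k, best_labels, best_score = 1, [0] * n, float("-inf")
    for k in range(2, min(max_k, n - 1) + 1):
        labels = kmeans(points, k)
        score = silhouette_score(points, labels)
        if score > best_score:
            best_k, best_labels, best_score = k, labels, score
    return best_k, best_labels
```

The guardrail exists because the silhouette score needs at least two clusters; without it, a search over k = 2..max_k would always report two or more speakers, even on a monologue.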


What makes it different

  • Automatic speaker count — K-means auto-k detects how many speakers are in the recording. No more guessing thresholds.
  • Single-speaker guardrail — embeddings too similar? Returns 1 speaker instead of hallucinating clusters.
  • Overlap-aware — PowersetSegmenter detects overlapping speech regions; embeddings are masked to exclude overlaps before clustering.
  • Streaming & batch — OnlineDiarizer for real-time, OfflineDiarizer for files.
  • Cross-platform — Linux, macOS, Windows; x86_64 and aarch64.
  • Hardened — Miri (memory safety), Loom (concurrency), cargo-fuzz (4 targets), model signing (Minisign).
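
The overlap-aware bullet above describes masking out overlapped speech before clustering. One way to picture it, as a sketch under assumptions: given per-frame activity flags for each local speaker, keep only frames where exactly one speaker is active, then pool the surviving frame features. The function names and the mean-pooling step are hypothetical; the README does not say how polyvoice applies the mask internally.

```python
def single_speaker_mask(activity):
    """activity: one list of 0/1 flags per frame, one flag per local speaker.

    Returns True for frames where exactly one speaker is active.
    """
    return [sum(frame) == 1 for frame in activity]


def masked_mean(features, mask):
    """Average the feature vectors of the frames selected by the mask."""
    kept = [f for f, keep in zip(features, mask) if keep]
    if not kept:
        return None  # nothing but silence or overlap in this window
    return [sum(col) / len(kept) for col in zip(*kept)]


# Four frames, two local speakers: frame 1 is overlap, frame 2 is silence.
activity = [[1, 0], [1, 1], [0, 0], [1, 0]]
features = [[1.0, 0.0], [5.0, 5.0], [9.0, 9.0], [3.0, 2.0]]

mask = single_speaker_mask(activity)
pooled = masked_mean(features, mask)
```

Dropping overlapped frames keeps a mixture of two voices from pulling an embedding toward a point between the two speakers' clusters.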

Architecture

┌─────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ Audio Bytes │ --> │ Embedding       │ --> │ Speaker Cluster │ --> Turns
│ (f32 PCM)   │     │ Extractor       │     │ (AHC or K-means)│
└─────────────┘     └─────────────────┘     └─────────────────┘
       │                    │                       │
       v                    v                       v
  Powerset VAD      WeSpeaker ResNet34      Silhouette auto-k
  (10s windows,     (2s windows, 256-dim)   (pairwise cosine
   1s hop)                                  distance cache)
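
The diagram mentions 10 s windows with a 1 s hop for segmentation. As a rough sketch of what that windowing implies, the helper below computes window boundaries in samples. The function name and the handling of short inputs and of the trailing remainder are assumptions; the README does not specify how polyvoice treats the tail of a file.

```python
def sliding_windows(num_samples, sample_rate, window_s, hop_s):
    """Return (start, end) sample indices for fixed-size overlapping windows."""
    window = round(window_s * sample_rate)
    hop = round(hop_s * sample_rate)
    if num_samples <= window:
        return [(0, num_samples)]  # short clip: a single, shorter window
    windows = [
        (start, start + window)
        for start in range(0, num_samples - window + 1, hop)
    ]
    if windows[-1][1] < num_samples:
        # Assumption: add one final window aligned to the end of the audio.
        windows.append((num_samples - window, num_samples))
    return windows


# A 26 s clip at 16 kHz with the segmenter settings from the diagram.
clip = sliding_windows(26 * 16000, 16000, window_s=10.0, hop_s=1.0)
print(len(clip), clip[0], clip[-1])
```

With a 1 s hop, most frames are covered by up to ten windows, so a real pipeline needs some way to reconcile the overlapping predictions; that aggregation step is not shown here.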

License

MIT



