Speaker diarization — who spoke when. Rust + ONNX, no Python runtime overhead.
polyvoice
Speaker diarization for Rust — who spoke when, without Python.
Production-ready speaker diarization that runs on CPU, fits in 30 MB, and outperforms AHC clustering with automatic K-means speaker count detection.
```
Speaker_0: 0.0s - 12.3s
Speaker_1: 14.1s - 28.7s
Speaker_0: 31.2s - 45.0s
```
At a glance
| | polyvoice | pyannote 3.1 | whisperX |
|---|---|---|---|
| VoxConverse DER | 14.12% | ~12% | ~15% |
| Model size | ~30 MB | ~100 MB | ~1 GB |
| Runtime | CPU only | GPU recommended | GPU required |
| Dependencies | Zero (ONNX) | PyTorch + ONNX | PyTorch + faster-whisper |
| Languages | Rust / Python / C / CLI | Python only | Python only |
| Streaming | Yes | No | No |
~80% of pyannote's accuracy at 10× less RAM and no GPU.
Install
```sh
# Rust
cargo add polyvoice --features onnx

# Python
pip install polyvoice

# CLI
cargo install polyvoice --features cli
```
Quick start — Rust
```rust
use polyvoice::clusterer::KMeansClusterer;
use polyvoice::embedder::ResNet34Adapter;
use polyvoice::models::ModelRegistry;
use polyvoice::pipeline_v2::hybrid::HybridPipeline;
use polyvoice::segmentation::PowersetSegmenter;
use polyvoice::types::SampleRate;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Models auto-download on first run
    let registry = ModelRegistry::default()?;
    let models = registry.ensure_for_profile(polyvoice::types::Profile::Balanced)?;

    let segmenter = PowersetSegmenter::new(&models.segmenter_path)?;
    let embedder = ResNet34Adapter::new(&models.embedder_path, 4)?;
    let clusterer = KMeansClusterer::new(20); // auto-k via silhouette

    let pipeline = HybridPipeline::new(
        Box::new(segmenter),
        Box::new(embedder),
        Box::new(clusterer),
    );

    // This example runs the pipeline at 16 kHz and ignores the sample rate
    // returned by read_wav, so meeting.wav should be 16 kHz audio.
    let (samples, _sr) = polyvoice::wav::read_wav("meeting.wav")?;
    let result = pipeline.run(&samples, SampleRate::new(16000).unwrap())?;

    for turn in &result.turns {
        println!("{}: {:.1}s - {:.1}s", turn.speaker, turn.time.start, turn.time.end);
    }
    Ok(())
}
```
Quick start — Python
```python
import polyvoice

# `samples` holds the audio to diarize at 16 kHz; loading it is not shown here.
pipeline = polyvoice.Pipeline.balanced("models/")
result = pipeline.run(samples, sample_rate=16000)

for turn in result["turns"]:
    print(f"{turn['speaker']}: {turn['start']:.1f}s - {turn['end']:.1f}s")
```
Quick start — CLI
```sh
# Download models once
polyvoice download-models --profile balanced

# Diarize
polyvoice diarize meeting.wav --output meeting.rttm
```
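The CLI writes its result to an `.rttm` file. The sketch below shows one way to read such a file back into turns, assuming the output follows the conventional RTTM `SPEAKER` record layout (file ID, channel, onset, duration, and the speaker label in the eighth field). The `Turn` struct and the helper names here are hypothetical and are not part of the polyvoice API.

```rust
/// One speaker turn read from an RTTM line (hypothetical helper type).
#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub speaker: String,
    pub start: f64,
    pub end: f64,
}

/// Parse a single RTTM `SPEAKER` record.
/// Returns `None` for other record types or malformed lines.
pub fn parse_rttm_line(line: &str) -> Option<Turn> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    // Conventional layout:
    // SPEAKER <file> <chan> <onset> <dur> <NA> <NA> <speaker> <NA> <NA>
    if fields.len() < 8 || fields[0] != "SPEAKER" {
        return None;
    }
    let start: f64 = fields[3].parse().ok()?;
    let duration: f64 = fields[4].parse().ok()?;
    Some(Turn {
        speaker: fields[7].to_string(),
        start,
        end: start + duration,
    })
}

/// Parse every valid `SPEAKER` record in an RTTM document, skipping the rest.
pub fn parse_rttm(text: &str) -> Vec<Turn> {
    text.lines().filter_map(parse_rttm_line).collect()
}

/// Render a turn in the "speaker: start - end" style used in this README.
pub fn format_turn(turn: &Turn) -> String {
    format!("{}: {:.1}s - {:.1}s", turn.speaker, turn.start, turn.end)
}
```

Reading `meeting.rttm` with `std::fs::read_to_string`, passing the text to `parse_rttm`, and printing each turn with `format_turn` would then give a listing in the same shape as the sample output at the top of this page.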
Benchmarks
| Pipeline | Dataset | Files | DER | Notes |
|---|---|---|---|---|
| Hybrid + K-means | VoxConverse-test | 232 | 14.12% | Auto-k, no threshold tuning |
| Hybrid + AHC | VoxConverse-test | 232 | 18.77% | Manual threshold 0.40 |
| Legacy (Silero + AHC) | VoxConverse-test | 232 | ~14% | Baseline pipeline |
| Hybrid + K-means | VoxConverse-test | 10 | 13.48% | Subset |
| Hybrid + AHC | VoxConverse-test | 10 | 15.03% | Subset |
| Hybrid + K-means | e2e smoke | 1 | 4.43% | 26 s clip |
K-means auto-k uses silhouette-based k selection with single-speaker detection (no more 20-speaker predictions on 1-speaker files). On the full VoxConverse benchmark it beats AHC by 4.65 percentage points of DER (14.12% vs 18.77%) without any manual threshold tuning.
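For readers unfamiliar with the metric, DER (diarization error rate) is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The sketch below computes it from those durations and also shows the difference between an absolute gap in percentage points and a relative reduction. The struct and function names are illustrative only, and this is not how polyvoice's own benchmark harness is necessarily implemented.

```rust
/// Error durations in seconds, accumulated over a test set (illustrative type).
pub struct DerCounts {
    pub missed: f64,
    pub false_alarm: f64,
    pub confusion: f64,
    pub total_speech: f64,
}

/// DER = (missed + false alarm + confusion) / total reference speech.
/// Returns `None` when there is no reference speech to normalise by.
pub fn der(counts: &DerCounts) -> Option<f64> {
    if counts.total_speech <= 0.0 {
        return None;
    }
    Some((counts.missed + counts.false_alarm + counts.confusion) / counts.total_speech)
}

/// Absolute gap between two DER values, in percentage points.
pub fn absolute_gap_points(baseline_percent: f64, improved_percent: f64) -> f64 {
    baseline_percent - improved_percent
}

/// Relative error reduction as a fraction of the baseline DER.
pub fn relative_reduction(baseline_percent: f64, improved_percent: f64) -> f64 {
    (baseline_percent - improved_percent) / baseline_percent
}
```

Applied to the table above, `absolute_gap_points(18.77, 14.12)` gives the 4.65-point gap between Hybrid + AHC and Hybrid + K-means, while `relative_reduction(18.77, 14.12)` expresses the same gap as a fraction of the AHC error.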
What makes it different
- Automatic speaker count — K-means auto-k detects how many speakers are in the recording. No more guessing thresholds.
- Single-speaker guardrail — embeddings too similar? Returns 1 speaker instead of hallucinating clusters.
- Overlap-aware — PowersetSegmenter detects overlapping speech regions; embeddings are masked to exclude overlaps before clustering.
- Streaming & batch — `OnlineDiarizer` for real-time, `OfflineDiarizer` for files.
- Cross-platform — Linux, macOS, Windows; x86_64 and aarch64.
- Hardened — Miri (memory safety), Loom (concurrency), cargo-fuzz (4 targets), model signing (Minisign).
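The automatic speaker count and the single-speaker guardrail can be illustrated with a small sketch. It assumes candidate labelings for each k have already been produced (for example by running K-means once per k), scores each with the mean silhouette under cosine distance, and falls back to one speaker when every pair of embeddings is closer than a threshold. The function names and the `max_spread` threshold are hypothetical; polyvoice's actual selection logic may differ.

```rust
/// Cosine distance (1 - cosine similarity); zero vectors are treated as maximally distant.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        return 1.0;
    }
    1.0 - dot / (na * nb)
}

/// Mean silhouette score of a labeling; 0.0 when there are fewer than two clusters.
pub fn mean_silhouette(points: &[Vec<f32>], labels: &[usize]) -> f32 {
    let n = points.len();
    let k = labels.iter().copied().max().map_or(0, |m| m + 1);
    if n == 0 || k < 2 {
        return 0.0;
    }
    let mut total = 0.0f32;
    for i in 0..n {
        // Sum of distances from point i to every cluster, excluding itself.
        let mut sums = vec![0.0f32; k];
        let mut counts = vec![0usize; k];
        for j in 0..n {
            if i != j {
                sums[labels[j]] += cosine_distance(&points[i], &points[j]);
                counts[labels[j]] += 1;
            }
        }
        let own = labels[i];
        if counts[own] == 0 {
            continue; // singleton cluster contributes a silhouette of 0
        }
        let a = sums[own] / counts[own] as f32;
        let b = (0..k)
            .filter(|&c| c != own && counts[c] > 0)
            .map(|c| sums[c] / counts[c] as f32)
            .fold(f32::INFINITY, f32::min);
        if b.is_finite() && a.max(b) > 0.0 {
            total += (b - a) / a.max(b);
        }
    }
    total / n as f32
}

/// Guardrail: true when no pair of embeddings is farther apart than `max_spread`.
pub fn looks_like_one_speaker(points: &[Vec<f32>], max_spread: f32) -> bool {
    for i in 0..points.len() {
        for j in (i + 1)..points.len() {
            if cosine_distance(&points[i], &points[j]) > max_spread {
                return false;
            }
        }
    }
    true
}

/// Pick the speaker count whose labeling has the highest mean silhouette,
/// or 1 if the guardrail fires or no candidate has at least two clusters.
pub fn select_k(points: &[Vec<f32>], candidates: &[Vec<usize>], max_spread: f32) -> usize {
    if looks_like_one_speaker(points, max_spread) {
        return 1;
    }
    let mut best_k = 1;
    let mut best_score = f32::NEG_INFINITY;
    for labels in candidates {
        let k = labels.iter().copied().max().map_or(0, |m| m + 1);
        let score = mean_silhouette(points, labels);
        if k >= 2 && score > best_score {
            best_score = score;
            best_k = k;
        }
    }
    best_k
}
```

The guardrail runs before any scoring because silhouette is undefined for a single cluster, so a score-only search would always return at least two speakers.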
Architecture
```
┌─────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ Audio Bytes │ --> │ Embedding       │ --> │ Speaker Cluster │ --> Turns
│ (f32 PCM)   │     │ Extractor       │     │ (AHC or K-means)│
└─────────────┘     └─────────────────┘     └─────────────────┘
       │                     │                       │
       v                     v                       v
  Powerset VAD       WeSpeaker ResNet34      Silhouette auto-k
  (10s windows,      (2s windows, 256-dim)   (pairwise cosine
   1s hop)                                    distance cache)
```
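The diagram mentions 10 s segmentation windows with a 1 s hop. As a rough illustration of what that means in samples, the sketch below computes window start offsets for a clip. The function names are hypothetical, and the choice to add a final window aligned to the end of the clip is an assumption of this sketch; how polyvoice handles the tail is not stated here.

```rust
/// Convert a duration in seconds to a sample count at the given rate.
pub fn seconds_to_samples(seconds: f64, sample_rate: u32) -> usize {
    (seconds * sample_rate as f64).round() as usize
}

/// Start offsets (in samples) of sliding windows over a clip.
///
/// Clips shorter than one window get a single window at offset 0.
/// If the hop does not land exactly on the end, one extra window is
/// aligned to the end so the tail is covered (an assumption of this sketch).
pub fn window_starts(total_samples: usize, window: usize, hop: usize) -> Vec<usize> {
    if total_samples == 0 || window == 0 || hop == 0 {
        return Vec::new();
    }
    if total_samples <= window {
        return vec![0];
    }
    let last = total_samples - window;
    let mut starts: Vec<usize> = (0..=last).step_by(hop).collect();
    if starts.last() != Some(&last) {
        starts.push(last);
    }
    starts
}
```

For a 26 s clip at 16 kHz, `window_starts(seconds_to_samples(26.0, 16000), seconds_to_samples(10.0, 16000), seconds_to_samples(1.0, 16000))` yields one window per second of offset from 0 s through 16 s.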
License
MIT