Speaker diarization — who spoke when. Rust + ONNX, no Python runtime overhead.

polyvoice

Speaker diarization for Rust — who spoke when, without Python.

Silero VAD + WeSpeaker embeddings + AHC clustering in a single Pipeline::run() call.

Input:  14 seconds of two-speaker audio (16 kHz mono WAV)
Output: SPEAKER_00: 0.10s -  7.60s
        SPEAKER_01: 8.10s - 14.10s

Quick start

1. Add the dependency

[dependencies]
polyvoice = { version = "0.5", features = ["onnx"] }

2. Download models

bash scripts/download-models.sh
# Downloads WeSpeaker ResNet34 (25 MB) and Silero VAD v5 (2.2 MB) to models/

3. Run the pipeline

use polyvoice::{
    Pipeline, DiarizationConfig, VadConfig,
    FbankOnnxExtractor, SileroVad,
};
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load models
    let extractor = FbankOnnxExtractor::new(
        Path::new("models/wespeaker_resnet34.onnx"),
        256, // embedding dim
        4,   // ONNX session pool size
    )?;
    let mut vad = SileroVad::new(Path::new("models/silero_vad.onnx"), 512)?;

    // Configure and run
    let pipeline = Pipeline::new(
        DiarizationConfig::default(),
        VadConfig::default(),
    );
    let (samples, _sr) = polyvoice::wav::read_wav(Path::new("meeting.wav"))?;
    let result = pipeline.run(&samples, &extractor, &mut vad)?;

    for turn in &result.turns {
        println!("{}: {:.2}s - {:.2}s", turn.speaker, turn.time.start, turn.time.end);
    }
    Ok(())
}

Python

cd python
maturin develop --release

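Prebuilt wheels for selected platforms are published on PyPI (pip install polyvoice); the commands above build the bindings from a source checkout.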

import polyvoice

pipeline = polyvoice.Pipeline("models/")
turns = pipeline("meeting.wav")

for turn in turns:
    print(f"{turn.speaker}: {turn.start:.1f}s - {turn.end:.1f}s")

CLI

cargo install polyvoice --features cli

polyvoice download-models
polyvoice diarize meeting.wav
polyvoice diarize meeting.wav --format json
polyvoice diarize meeting.wav --format rttm --max-speakers 4
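For readers unfamiliar with the rttm output format, the sketch below formats speaker turns as standard RTTM SPEAKER records (type, file ID, channel, onset, duration, then `<NA>` placeholders around the speaker label). The `Turn` struct and `to_rttm_line` function are hypothetical names for illustration, not part of the polyvoice API, and the exact precision and file ID that the CLI emits may differ.

```rust
/// One diarization turn. Field names are illustrative, not polyvoice's types.
struct Turn {
    speaker: String,
    start: f64, // seconds
    end: f64,   // seconds
}

/// Format a turn as a standard 10-field RTTM "SPEAKER" record:
/// type, file id, channel, onset, duration, <NA>, <NA>, speaker, <NA>, <NA>.
fn to_rttm_line(file_id: &str, turn: &Turn) -> String {
    format!(
        "SPEAKER {} 1 {:.3} {:.3} <NA> <NA> {} <NA> <NA>",
        file_id,
        turn.start,
        turn.end - turn.start, // RTTM stores duration, not end time
        turn.speaker
    )
}

fn main() {
    let turns = vec![
        Turn { speaker: "SPEAKER_00".to_string(), start: 0.10, end: 7.60 },
        Turn { speaker: "SPEAKER_01".to_string(), start: 8.10, end: 14.10 },
    ];
    for turn in &turns {
        println!("{}", to_rttm_line("meeting", turn));
    }
}
```

Note that RTTM records an onset and a duration rather than a start and an end, which is the most common source of off-by-one-field mistakes when parsing it.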

How it works

WAV / PCM audio (16 kHz mono)
       |
       v
+-------------+     +------------------+     +---------+
|  Silero VAD |---->| WeSpeaker        |---->|   AHC   |---> Speaker turns
|  (speech    |     | ResNet34         |     | cluster |
|   regions)  |     | (256-d embed.)   |     |         |
+-------------+     +------------------+     +---------+
                     fbank + CMVN           cosine similarity
                     lock-free pool         threshold merging

VAD detects speech regions, skipping silence. WeSpeaker extracts 256-dimensional speaker embeddings from log-mel filterbank features (80-bin, CMVN-normalized). AHC clusters embeddings by cosine similarity into speaker groups. The Pipeline wires it all together.
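To make the clustering step concrete, here is a minimal sketch of agglomerative clustering with a cosine similarity threshold. It assumes centroid linkage (clusters are compared by the cosine similarity of their mean embeddings); the README does not state which linkage polyvoice uses, so treat `cosine`, `centroid`, and `ahc` as illustrative names rather than the library's implementation.

```rust
/// Cosine similarity between two vectors of equal length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Mean of the embeddings whose indices are listed in `members`.
fn centroid(embeddings: &[Vec<f32>], members: &[usize]) -> Vec<f32> {
    let dim = embeddings[members[0]].len();
    let mut c = vec![0.0f32; dim];
    for &i in members {
        for (acc, v) in c.iter_mut().zip(&embeddings[i]) {
            *acc += *v;
        }
    }
    for v in c.iter_mut() {
        *v /= members.len() as f32;
    }
    c
}

/// Repeatedly merge the two most similar clusters until the best
/// similarity drops below `threshold`. Returns one label per embedding.
fn ahc(embeddings: &[Vec<f32>], threshold: f32) -> Vec<usize> {
    // Start with every embedding in its own cluster.
    let mut clusters: Vec<Vec<usize>> = (0..embeddings.len()).map(|i| vec![i]).collect();
    loop {
        let mut best: Option<(usize, usize, f32)> = None;
        for i in 0..clusters.len() {
            for j in (i + 1)..clusters.len() {
                let s = cosine(
                    &centroid(embeddings, &clusters[i]),
                    &centroid(embeddings, &clusters[j]),
                );
                if best.map_or(true, |(_, _, b)| s > b) {
                    best = Some((i, j, s));
                }
            }
        }
        match best {
            Some((i, j, s)) if s >= threshold => {
                let merged = clusters.remove(j); // j > i, so index i stays valid
                clusters[i].extend(merged);
            }
            _ => break,
        }
    }
    let mut labels = vec![0usize; embeddings.len()];
    for (label, members) in clusters.iter().enumerate() {
        for &i in members {
            labels[i] = label;
        }
    }
    labels
}

fn main() {
    // Toy 2-d "embeddings": two near [1, 0], two near [0, 1].
    let embeddings = vec![
        vec![1.0, 0.0],
        vec![0.9, 0.1],
        vec![0.0, 1.0],
        vec![0.1, 0.9],
    ];
    println!("{:?}", ahc(&embeddings, 0.4)); // prints [0, 0, 1, 1]
}
```

This naive version recomputes centroids on every pass, so it is quadratic per merge; it is meant to show the merge rule, not to be fast. Raising the threshold makes merging stricter and tends to produce more speakers.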

Comparison with pyannote

	polyvoice	pyannote
Language	Rust	Python
Runtime	ONNX Runtime	PyTorch
GIL-free	Yes	No
Binary size	~30 MB (with models)	~2 GB (torch + models)
Deploy	Single binary / C FFI	Python env + pip
Concurrent sessions	Lock-free session pool	Thread-limited
Streaming	`OnlineDiarizer` built-in	Third-party wrappers

pyannote is the gold standard for accuracy. polyvoice trades some accuracy for deployment simplicity: no Python runtime, no GPU required, ~30 MB total.

Minimum Supported Rust Version (MSRV)

1.85 (Rust 2024 edition).

Accuracy (DER benchmarks)

Evaluated with 0.25s collar on standard diarization benchmarks:

VoxConverse (232 files, 43.5 hours — broadcast, meetings, interviews)

System	DER	Miss	FA	Confusion	Speed
polyvoice (AHC, t=0.4)	16.4%	3.9%	3.2%	9.3%	10.6x RT (CPU)
pyannote 3.0	~11%	—	—	—	~1x RT (GPU)

AMI (16 meetings, 9 hours — meeting room recordings)

System	DER	Miss	FA	Confusion	Speed
polyvoice (AHC, t=0.4)	27.5%	17.7%	2.2%	7.6%	7x RT (CPU)
pyannote 3.0	~18%	—	—	—	~1x RT (GPU)
Simple i-vector + AHC	~33%	—	—	—	—

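On these benchmarks polyvoice's DER is about 5 points higher than pyannote 3.0 on VoxConverse (16.4% vs ~11%) and about 10 points higher on AMI (27.5% vs ~18%), while running 7x to 10.6x faster than real time on CPU alone: no GPU, no Python, ~30 MB total. The accuracy gap comes from neural end-to-end training and overlap-aware resegmentation, which polyvoice doesn't do yet.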
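As a quick check on how the table columns relate: diarization error rate is conventionally the sum of missed speech, false alarm, and speaker confusion, each taken as a share of total reference speech time. The small sketch below (the `der` helper is a hypothetical name) reproduces the polyvoice rows from their components.

```rust
/// DER as the sum of its three components, all in percent of
/// total reference speech time.
fn der(miss: f64, false_alarm: f64, confusion: f64) -> f64 {
    miss + false_alarm + confusion
}

fn main() {
    // Component values copied from the benchmark tables above.
    let voxconverse = der(3.9, 3.2, 9.3);
    let ami = der(17.7, 2.2, 7.6);
    println!("VoxConverse DER: {:.1}%", voxconverse); // prints 16.4%
    println!("AMI DER:         {:.1}%", ami); // prints 27.5%
}
```

The breakdown also shows where the errors come from: on AMI most of the error is missed speech (17.7 of 27.5 points), while on VoxConverse speaker confusion dominates (9.3 of 16.4 points).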

# Reproduce benchmarks
bash scripts/download-ami-test.sh
cargo run --release --features cli --bin polyvoice-bench -- data/ami-test

bash scripts/download-voxconverse-test.sh
cargo run --release --features cli --bin polyvoice-bench -- data/voxconverse-test --threshold 0.4

Features

Pipeline API — Pipeline::run() for one-call diarization with VAD + embeddings + clustering.
Online & Offline — OnlineDiarizer for real-time streaming, OfflineDiarizer for batch files.
ONNX-powered — WeSpeaker and ECAPA-TDNN extractors with 80-bin log-mel fbank + CMVN.
Lock-free session pool — crossbeam-queue backed pool for concurrent ONNX inference.
Silero VAD — integrated voice activity detection with stateful LSTM context.
Overlap detection — find regions where multiple speakers talk simultaneously.
Word alignment — assign speaker IDs to transcript words by timestamp.
Python bindings — pip install polyvoice, 3-line API via PyO3/maturin.
CLI — polyvoice diarize meeting.wav with text/json/rttm output.
C FFI — drop-in .so/.dylib/.dll for Go, Node.js, C++ callers.
Safety verified — Miri (memory), Loom (concurrency), cargo-fuzz (inputs), across Linux/macOS/Windows.
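The word alignment feature listed above can be pictured as an interval-overlap problem. The sketch below assigns each transcript word to the speaker turn it overlaps the most, and leaves it unassigned when it falls in a gap. All types and function names here (`Turn`, `Word`, `overlap`, `assign_speakers`) are hypothetical; polyvoice's own alignment API and tie-breaking rules are not shown in this README.

```rust
/// A speaker turn in seconds. Illustrative type, not polyvoice's.
struct Turn {
    speaker: String,
    start: f64,
    end: f64,
}

/// A transcript word with timestamps, e.g. from an ASR system.
struct Word {
    text: String,
    start: f64,
    end: f64,
}

/// Seconds of overlap between two intervals (0 if they are disjoint).
fn overlap(a_start: f64, a_end: f64, b_start: f64, b_end: f64) -> f64 {
    (a_end.min(b_end) - a_start.max(b_start)).max(0.0)
}

/// For each word, pick the turn with the largest overlap; None if no
/// turn overlaps the word at all.
fn assign_speakers<'a>(words: &[Word], turns: &'a [Turn]) -> Vec<Option<&'a str>> {
    words
        .iter()
        .map(|w| {
            let mut best: Option<(&'a str, f64)> = None;
            for t in turns {
                let o = overlap(w.start, w.end, t.start, t.end);
                if o > 0.0 && best.map_or(true, |(_, b)| o > b) {
                    best = Some((t.speaker.as_str(), o));
                }
            }
            best.map(|(speaker, _)| speaker)
        })
        .collect()
}

fn main() {
    let turns = vec![
        Turn { speaker: "SPEAKER_00".into(), start: 0.10, end: 7.60 },
        Turn { speaker: "SPEAKER_01".into(), start: 8.10, end: 14.10 },
    ];
    let words = vec![
        Word { text: "hello".into(), start: 1.0, end: 1.4 },
        Word { text: "um".into(), start: 7.7, end: 8.0 },
        Word { text: "hi".into(), start: 8.5, end: 8.8 },
    ];
    for (w, s) in words.iter().zip(assign_speakers(&words, &turns)) {
        println!("{:>6} -> {}", w.text, s.unwrap_or("(no speaker)"));
    }
}
```

A production version would likely snap gap words to the nearest turn instead of returning None; that choice is left open here.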

Configuration

use polyvoice::{DiarizationConfig, VadConfig, SampleRate};

let config = DiarizationConfig {
    threshold: 0.4,           // cosine similarity threshold
    max_speakers: 64,         // hard speaker limit
    window_secs: 1.5,         // analysis window
    hop_secs: 0.75,           // sliding step
    min_speech_secs: 0.25,    // discard shorter segments
    max_gap_secs: 0.5,        // merge same-speaker gaps under 500 ms
    sample_rate: SampleRate::new(16000).unwrap(),
};

let vad_config = VadConfig {
    frame_size: 512,          // Silero VAD chunk size (32 ms at 16 kHz)
    threshold: 0.5,           // speech probability threshold
    min_silence_ms: 300.0,    // minimum silence to split segments
};
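To see what the window and frame settings mean in samples, the sketch below converts window_secs and hop_secs into sliding window boundaries and a VAD frame size into milliseconds. The helper names `windows` and `frame_ms` are hypothetical, and the handling of a trailing partial window (dropped here) is an assumption; the README does not say how polyvoice treats it.

```rust
/// Start and end sample indices of sliding analysis windows.
/// Trailing audio shorter than one full window is dropped (an assumption).
fn windows(
    num_samples: usize,
    sample_rate: u32,
    window_secs: f64,
    hop_secs: f64,
) -> Vec<(usize, usize)> {
    let win = (window_secs * sample_rate as f64).round() as usize;
    let hop = (hop_secs * sample_rate as f64).round() as usize;
    let mut out = Vec::new();
    if win == 0 || hop == 0 {
        return out;
    }
    let mut start = 0;
    while start + win <= num_samples {
        out.push((start, start + win));
        start += hop;
    }
    out
}

/// Duration of one VAD frame in milliseconds.
fn frame_ms(frame_size: usize, sample_rate: u32) -> f64 {
    frame_size as f64 * 1000.0 / sample_rate as f64
}

fn main() {
    // 3 seconds of 16 kHz audio with a 1.5 s window and 0.75 s hop.
    let w = windows(48_000, 16_000, 1.5, 0.75);
    println!("{:?}", w); // prints [(0, 24000), (12000, 36000), (24000, 48000)]
    println!("{} ms per VAD frame", frame_ms(512, 16_000)); // prints 32 ms per VAD frame
}
```

With a hop of half the window, each sample away from the edges is covered by two windows, which is what lets speaker changes be located more finely than the 1.5 s window length alone would allow.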

Streaming (real-time)

use polyvoice::{OnlineDiarizer, DiarizationConfig, DummyExtractor};

let config = DiarizationConfig::default();
let mut diarizer = OnlineDiarizer::new(config);
let extractor = DummyExtractor::new(256);

// In your audio callback (here: one 300 ms chunk of silence at 16 kHz):
let chunk = vec![0.0f32; 4800];
let segments = diarizer.feed(&chunk, &extractor).unwrap();
for seg in segments {
    println!("Speaker {:?} at {:.2}s", seg.speaker, seg.time.start);
}
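One common way to assign speakers incrementally, as a streaming diarizer must, is to keep a running centroid per speaker and compare each new embedding against them. The sketch below shows that idea only; it is not taken from the polyvoice source, and `OnlineClusterer`, `Speaker`, and `assign` are hypothetical names. The threshold and max_speakers parameters mirror the DiarizationConfig fields of the same names.

```rust
/// Cosine similarity between two vectors of equal length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Running speaker model: a centroid and the number of embeddings averaged.
struct Speaker {
    centroid: Vec<f32>,
    count: usize,
}

/// Hypothetical incremental clusterer, not polyvoice's OnlineDiarizer.
struct OnlineClusterer {
    speakers: Vec<Speaker>,
    threshold: f32,
    max_speakers: usize,
}

impl OnlineClusterer {
    fn new(threshold: f32, max_speakers: usize) -> Self {
        Self { speakers: Vec::new(), threshold, max_speakers }
    }

    /// Assign one embedding to a speaker index. A new speaker is created
    /// when no centroid is similar enough and the limit is not reached.
    fn assign(&mut self, emb: &[f32]) -> usize {
        let mut best: Option<(usize, f32)> = None;
        for (i, s) in self.speakers.iter().enumerate() {
            let sim = cosine(&s.centroid, emb);
            if best.map_or(true, |(_, b)| sim > b) {
                best = Some((i, sim));
            }
        }
        match best {
            Some((i, sim))
                if sim >= self.threshold || self.speakers.len() >= self.max_speakers =>
            {
                let s = &mut self.speakers[i];
                s.count += 1;
                let n = s.count as f32;
                for (c, v) in s.centroid.iter_mut().zip(emb) {
                    *c += (*v - *c) / n; // incremental mean update
                }
                i
            }
            _ => {
                self.speakers.push(Speaker { centroid: emb.to_vec(), count: 1 });
                self.speakers.len() - 1
            }
        }
    }
}

fn main() {
    let mut clusterer = OnlineClusterer::new(0.4, 64);
    let stream = [[1.0f32, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]];
    let labels: Vec<usize> = stream.iter().map(|e| clusterer.assign(e)).collect();
    println!("{:?}", labels); // prints [0, 1, 0, 1]
}
```

Unlike offline AHC, this greedy scheme never revisits earlier decisions, which is the usual trade-off of streaming diarization: lower latency in exchange for some accuracy.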

Verification

Check	Tool
Unsafe memory safety	Miri (nightly CI)
Concurrency correctness	Loom model-checking
Input fuzzing	cargo-fuzz (4 targets)
API stability	cargo-semver-checks
Cross-platform	Ubuntu, macOS, Windows CI
Dependency audit	cargo-audit

Roadmap

WeSpeaker + ECAPA-TDNN ONNX extractors
Silero VAD integration
Agglomerative hierarchical clustering (AHC)
Pipeline API (VAD + embeddings + AHC)
C FFI bindings
Miri / Loom / fuzz verification
Cross-platform CI
Python bindings (PyO3 / maturin)
CLI tool (polyvoice diarize / download-models)
DER benchmarks on AMI (27.5%) and VoxConverse (16.4%), 0.25s collar
Spectral clustering backend
PLDA scoring backend

Contributing

See CONTRIBUTING.md.

Changelog

See CHANGELOG.md.

License

MIT

Project details

Maintainers

ekhodzitsky

Release history

0.6.0a3 (pre-release), May 10, 2026
0.5.2 (this version), May 6, 2026

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

polyvoice-0.5.2-cp312-cp312-win_amd64.whl (7.2 MB), uploaded May 6, 2026, CPython 3.12, Windows x86-64
polyvoice-0.5.2-cp312-cp312-manylinux_2_38_x86_64.whl (8.7 MB), uploaded May 6, 2026, CPython 3.12, manylinux: glibc 2.38+ x86-64
polyvoice-0.5.2-cp312-cp312-macosx_11_0_arm64.whl (7.9 MB), uploaded May 6, 2026, CPython 3.12, macOS 11.0+ ARM64
polyvoice-0.5.2-cp311-cp311-macosx_11_0_arm64.whl (7.9 MB), uploaded May 6, 2026, CPython 3.11, macOS 11.0+ ARM64

File details

Details for the file polyvoice-0.5.2-cp312-cp312-win_amd64.whl.

File metadata

Download URL: polyvoice-0.5.2-cp312-cp312-win_amd64.whl
Upload date: May 6, 2026
Size: 7.2 MB
Tags: CPython 3.12, Windows x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for polyvoice-0.5.2-cp312-cp312-win_amd64.whl
Algorithm	Hash digest
SHA256	`7b1e69a438546da511d40b67217ae13848834a1d6b19ce2b6a8aa32d30cb0487`
MD5	`1e478c17a8c3147f299dee0855cbd39b`
BLAKE2b-256	`e6b6c57c41894462ae459c14cceaa364ef16b4891c7bab850ab37e694eb427cc`

Provenance

The following attestation bundles were made for polyvoice-0.5.2-cp312-cp312-win_amd64.whl:

Publisher: release.yml on ekhodzitsky/polyvoice

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: polyvoice-0.5.2-cp312-cp312-win_amd64.whl
- Subject digest: 7b1e69a438546da511d40b67217ae13848834a1d6b19ce2b6a8aa32d30cb0487
- Sigstore transparency entry: 1449724654
- Sigstore integration time: May 6, 2026
Source repository:
- Permalink: ekhodzitsky/polyvoice@c77e7c7956663d6831bc74c87a4e54a860485410
- Branch / Tag: refs/tags/v0.5.2
- Owner: https://github.com/ekhodzitsky
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c77e7c7956663d6831bc74c87a4e54a860485410
- Trigger Event: push

File details

Details for the file polyvoice-0.5.2-cp312-cp312-manylinux_2_38_x86_64.whl.

File metadata

Download URL: polyvoice-0.5.2-cp312-cp312-manylinux_2_38_x86_64.whl
Upload date: May 6, 2026
Size: 8.7 MB
Tags: CPython 3.12, manylinux: glibc 2.38+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for polyvoice-0.5.2-cp312-cp312-manylinux_2_38_x86_64.whl
Algorithm	Hash digest
SHA256	`7278124d4c8c4ff7830776cdf6a58b67f04989e26e5563c6fbb689bd68f6a57b`
MD5	`4da803f63838bdf341b1ca25cf2b077e`
BLAKE2b-256	`0f40df466185cdb099934837ef1acbb4f87d3030656461cac793aed3691bfb59`

Provenance

The following attestation bundles were made for polyvoice-0.5.2-cp312-cp312-manylinux_2_38_x86_64.whl:

Publisher: release.yml on ekhodzitsky/polyvoice

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: polyvoice-0.5.2-cp312-cp312-manylinux_2_38_x86_64.whl
- Subject digest: 7278124d4c8c4ff7830776cdf6a58b67f04989e26e5563c6fbb689bd68f6a57b
- Sigstore transparency entry: 1449724671
- Sigstore integration time: May 6, 2026
Source repository:
- Permalink: ekhodzitsky/polyvoice@c77e7c7956663d6831bc74c87a4e54a860485410
- Branch / Tag: refs/tags/v0.5.2
- Owner: https://github.com/ekhodzitsky
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c77e7c7956663d6831bc74c87a4e54a860485410
- Trigger Event: push

File details

Details for the file polyvoice-0.5.2-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

Download URL: polyvoice-0.5.2-cp312-cp312-macosx_11_0_arm64.whl
Upload date: May 6, 2026
Size: 7.9 MB
Tags: CPython 3.12, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for polyvoice-0.5.2-cp312-cp312-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`3421253760986dd17820ae9b0d980e3afdf38070ac70fb2e9d5298b1f359ac84`
MD5	`7602680caf7644667f914a9e48f6fef9`
BLAKE2b-256	`3a30c61ad44c33c81b17cc048ea4726b10539912e50b5e54bece2a3d627786cc`

Provenance

The following attestation bundles were made for polyvoice-0.5.2-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on ekhodzitsky/polyvoice

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: polyvoice-0.5.2-cp312-cp312-macosx_11_0_arm64.whl
- Subject digest: 3421253760986dd17820ae9b0d980e3afdf38070ac70fb2e9d5298b1f359ac84
- Sigstore transparency entry: 1449724701
- Sigstore integration time: May 6, 2026
Source repository:
- Permalink: ekhodzitsky/polyvoice@c77e7c7956663d6831bc74c87a4e54a860485410
- Branch / Tag: refs/tags/v0.5.2
- Owner: https://github.com/ekhodzitsky
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c77e7c7956663d6831bc74c87a4e54a860485410
- Trigger Event: push

File details

Details for the file polyvoice-0.5.2-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

Download URL: polyvoice-0.5.2-cp311-cp311-macosx_11_0_arm64.whl
Upload date: May 6, 2026
Size: 7.9 MB
Tags: CPython 3.11, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.13.1

File hashes

Hashes for polyvoice-0.5.2-cp311-cp311-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`57b5aa3f056f8a285ba314870929bdf0e788af55027606041c0d93ab27c6572a`
MD5	`178324342ba6fcc375e872c221568e9b`
BLAKE2b-256	`58fa4914ebc3ef4dfb29d2626c4553f329eff8fb3288077f8be8c307e05757d6`
