Speaker diarization — who spoke when. Rust + ONNX, no Python runtime overhead.

Maintainers

ekhodzitsky

Project description

polyvoice

Speaker diarization for Rust — who spoke when, without Python.

Production-ready speaker diarization that runs on CPU and fits in about 30 MB. Its K-means clustering with automatic speaker-count detection outperforms AHC clustering on VoxConverse (14.12% vs 18.77% DER).

Speaker_0: 0.0s - 12.3s
Speaker_1: 14.1s - 28.7s
Speaker_0: 31.2s - 45.0s

At a glance

	polyvoice	pyannote 3.1	whisperX
VoxConverse DER	14.12%	~12%	~15%
Model size	~30 MB	~100 MB	~1 GB
Runtime	CPU only	GPU recommended	GPU required
Dependencies	Zero (ONNX)	PyTorch + ONNX	PyTorch + faster-whisper
Languages	Rust / Python / C / CLI	Python only	Python only
Streaming	Yes	No	No

Within about 2 percentage points of pyannote 3.1's DER (14.12% vs ~12%), with a model roughly 3× smaller (~30 MB vs ~100 MB) and no GPU.

Install

# Rust
cargo add polyvoice --features onnx

# Python
pip install polyvoice

# CLI
cargo install polyvoice --features cli

Quick start — Rust

use polyvoice::models::ModelRegistry;
use polyvoice::pipeline_v2::hybrid::HybridPipeline;
use polyvoice::segmentation::PowersetSegmenter;
use polyvoice::embedder::ResNet34Adapter;
use polyvoice::clusterer::KMeansClusterer;
use polyvoice::types::SampleRate;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Models auto-download on first run
    let registry = ModelRegistry::default()?;
    let models = registry.ensure_for_profile(polyvoice::types::Profile::Balanced)?;

    let segmenter = PowersetSegmenter::new(&models.segmenter_path)?;
    let embedder = ResNet34Adapter::new(&models.embedder_path, 4)?;
    let clusterer = KMeansClusterer::new(20); // auto-k via silhouette

    let pipeline = HybridPipeline::new(
        Box::new(segmenter),
        Box::new(embedder),
        Box::new(clusterer),
    );

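    // The rate returned by read_wav is ignored and 16 kHz is passed below,
    // so meeting.wav is expected to be a 16 kHz recording.
    let (samples, _sr) = polyvoice::wav::read_wav("meeting.wav")?;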
    let result = pipeline.run(&samples, SampleRate::new(16000).unwrap())?;

    for turn in &result.turns {
        println!("{}: {:.1}s - {:.1}s", turn.speaker, turn.time.start, turn.time.end);
    }
    Ok(())
}

Quick start — Python

import polyvoice

# `samples` holds the f32 PCM audio sampled at 16 kHz; loading it is not shown here.
pipeline = polyvoice.Pipeline.balanced("models/")
result = pipeline.run(samples, sample_rate=16000)

for turn in result["turns"]:
    print(f"{turn['speaker']}: {turn['start']:.1f}s - {turn['end']:.1f}s")
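
Once the turns are available, simple post-processing is plain Python. The sketch below sums speaking time per speaker; it assumes only the turn shape used in the loop above (`speaker`, `start`, `end` keys) and runs on a hand-written stand-in for `result["turns"]` built from the sample output at the top of this page, so it does not call polyvoice at all. The helper name `talk_time` is made up for this example.

```python
from collections import defaultdict


def talk_time(turns):
    """Sum the duration of every turn, grouped by speaker label."""
    totals = defaultdict(float)
    for turn in turns:
        totals[turn["speaker"]] += turn["end"] - turn["start"]
    return dict(totals)


# Hand-written stand-in for result["turns"]; not produced by polyvoice here.
turns = [
    {"speaker": "Speaker_0", "start": 0.0, "end": 12.3},
    {"speaker": "Speaker_1", "start": 14.1, "end": 28.7},
    {"speaker": "Speaker_0", "start": 31.2, "end": 45.0},
]

for speaker, seconds in sorted(talk_time(turns).items()):
    print(f"{speaker}: {seconds:.1f}s")
```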

Quick start — CLI

# Download models once
polyvoice download-models --profile balanced

# Diarize
polyvoice diarize meeting.wav --output meeting.rttm
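
The `.rttm` output can be read back with a few lines of Python. This is a minimal sketch that assumes the file follows the standard RTTM layout, where each `SPEAKER` record carries the onset in the fourth field, the duration in the fifth and the speaker label in the eighth. The sample text is hand-written for illustration and is not real polyvoice output; `parse_rttm` is a hypothetical helper, not part of the package.

```python
def parse_rttm(text):
    """Return (speaker, start, end) tuples from RTTM SPEAKER records."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-speaker records
        onset, duration = float(fields[3]), float(fields[4])
        turns.append((fields[7], onset, onset + duration))
    return turns


# Hand-written sample in standard RTTM layout (illustrative only).
sample = """\
SPEAKER meeting 1 0.000 12.300 <NA> <NA> Speaker_0 <NA> <NA>
SPEAKER meeting 1 14.100 14.600 <NA> <NA> Speaker_1 <NA> <NA>
"""

for speaker, start, end in parse_rttm(sample):
    print(f"{speaker}: {start:.1f}s - {end:.1f}s")
```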

Benchmarks

Pipeline	Dataset	Files	DER	Notes
Hybrid + K-means	VoxConverse-test	232	14.12%	Auto-k, no threshold tuning
Hybrid + AHC	VoxConverse-test	232	18.77%	Manual threshold 0.40
Legacy (Silero + AHC)	VoxConverse-test	232	~14%	Baseline pipeline
Hybrid + K-means	VoxConverse-test	10	13.48%	Subset
Hybrid + AHC	VoxConverse-test	10	15.03%	Subset
Hybrid + K-means	e2e smoke	1	4.43%	26 s clip

K-means auto-k selects k by silhouette score and adds single-speaker detection, so a 1-speaker file no longer yields a 20-speaker prediction. On the full VoxConverse test set it beats AHC by 4.65 percentage points of DER (14.12% vs 18.77%) without any manual threshold tuning.
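
For readers new to the metric, the sketch below shows the usual definition of DER (missed speech plus false alarm plus speaker confusion, divided by total reference speech) and the arithmetic behind the gap between the two full-benchmark rows. The durations passed to `der` are made up for illustration; only the 18.77 and 14.12 figures come from the table. This is not the scoring code used to produce the benchmarks.

```python
def der(missed, false_alarm, confusion, total_speech):
    """Diarization error rate as a fraction of reference speech time."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Made-up durations in seconds, for illustration only.
print(f"example DER: {der(3.0, 2.0, 5.0, 100.0):.2%}")

# Gap between the Hybrid + AHC and Hybrid + K-means rows (232 files).
ahc, kmeans = 18.77, 14.12
print(f"absolute gap: {ahc - kmeans:.2f} percentage points")
print(f"relative gap: {(ahc - kmeans) / ahc:.1%} lower DER")
```

The absolute gap is what the text quotes; the relative gap expresses the same difference as a share of the AHC error rate.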

What makes it different

Automatic speaker count — K-means auto-k detects how many speakers are in the recording. No more guessing thresholds.
Single-speaker guardrail — embeddings too similar? Returns 1 speaker instead of hallucinating clusters.
Overlap-aware — PowersetSegmenter detects overlapping speech regions; embeddings are masked to exclude overlaps before clustering.
Streaming & batch — OnlineDiarizer for real-time, OfflineDiarizer for files.
Cross-platform — Linux, macOS, Windows; x86_64 and aarch64.
Hardened — Miri (memory safety), Loom (concurrency), cargo-fuzz (4 targets), model signing (Minisign).
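
To make the first two points concrete, here is a toy sketch of silhouette-based k selection with a single-speaker guardrail. It is an illustration of the general technique, not polyvoice's implementation: the function names, the farthest-point initialisation, the `single_speaker_threshold` value and the 2-dimensional "embeddings" are all assumptions chosen to keep the example short and deterministic.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def kmeans(points, k, iters=20):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(cosine_distance(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: cosine_distance(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient under cosine distance."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(cosine_distance(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, max_k, single_speaker_threshold=0.05):
    """Return (k, labels): best-silhouette k, or 1 if all points are close."""
    if len(points) < 3:
        return 1, [0] * len(points)
    spread = max(cosine_distance(p, q) for p in points for q in points)
    if spread < single_speaker_threshold:  # hypothetical guardrail value
        return 1, [0] * len(points)
    best = None
    for k in range(2, min(max_k, len(points) - 1) + 1):
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best[1], best[2]


two_speakers = [[1.0, 0.0], [0.98, 0.05], [0.99, 0.02],
                [0.0, 1.0], [0.05, 0.98], [0.03, 0.99]]
one_speaker = [[1.0, 0.0], [0.99, 0.02], [0.995, 0.01]]
print(choose_k(two_speakers, max_k=4)[0])
print(choose_k(one_speaker, max_k=4)[0])
```

The guardrail runs before any clustering because the silhouette score is undefined for a single cluster, so a pure silhouette search can never return k = 1 on its own.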

Architecture

┌─────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ Audio Bytes │ --> │ Embedding       │ --> │ Speaker Cluster │ --> Turns
│ (f32 PCM)   │     │ Extractor       │     │ (AHC or K-means)│
└─────────────┘     └─────────────────┘     └─────────────────┘
       │                    │                       │
       v                    v                       v
  Powerset VAD      WeSpeaker ResNet34      Silhouette auto-k
  (10s windows,     (2s windows, 256-dim)   (pairwise cosine
   1s hop)                                  distance cache)
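
The window sizes in the diagram translate into sample ranges as sketched below. The code assumes 16 kHz audio and a shortened final window; how polyvoice actually handles the tail of a recording (padding, truncation or dropping) is not stated here, and `window_bounds` is a hypothetical helper written for this example.

```python
def window_bounds(num_samples, sample_rate, window_s, hop_s):
    """Return (start, end) sample indices for fixed-length sliding windows."""
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    bounds = []
    start = 0
    while start < num_samples:
        bounds.append((start, min(start + window, num_samples)))
        if start + window >= num_samples:
            break  # this window already reaches the end of the audio
        start += hop
    return bounds


# A 26 s clip at 16 kHz, segmented with 10 s windows and a 1 s hop.
sample_rate = 16000
clip = 26 * sample_rate
segments = window_bounds(clip, sample_rate, window_s=10.0, hop_s=1.0)
print(len(segments), segments[0], segments[-1])
```

With a 1 s hop every instant of audio away from the clip edges is covered by ten overlapping windows, which is what lets a segmenter aggregate several predictions per frame.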

License

MIT

Release history

Version	Released	Notes
0.9.0	Jun 22, 2026
0.8.0	Jun 22, 2026
0.7.0	Jun 14, 2026
0.6.9	Jun 13, 2026
0.6.8	Jun 1, 2026
0.6.7	May 27, 2026
0.6.6	May 25, 2026
0.6.5	May 21, 2026	This version
0.6.3	May 19, 2026
0.6.2	May 18, 2026
0.6.1	May 18, 2026
0.6.0	May 18, 2026
0.6.0a3	May 10, 2026	Pre-release
0.5.2	May 6, 2026

Download files

Source Distribution

polyvoice-0.6.5.tar.gz (1.1 MB)

Uploaded May 21, 2026; Source

Built Distributions

polyvoice-0.6.5-cp314-cp314-macosx_11_0_arm64.whl (8.0 MB)

Uploaded May 21, 2026; CPython 3.14, macOS 11.0+ ARM64

polyvoice-0.6.5-cp312-cp312-win_amd64.whl (8.3 MB)

Uploaded May 21, 2026; CPython 3.12, Windows x86-64

polyvoice-0.6.5-cp312-cp312-manylinux_2_38_x86_64.whl (10.1 MB)

Uploaded May 21, 2026; CPython 3.12, manylinux: glibc 2.38+ x86-64

polyvoice-0.6.5-cp312-cp312-macosx_11_0_arm64.whl (9.1 MB)

Uploaded May 21, 2026; CPython 3.12, macOS 11.0+ ARM64

File details

Details for the file polyvoice-0.6.5.tar.gz.

File metadata

Download URL: polyvoice-0.6.5.tar.gz
Upload date: May 21, 2026
Size: 1.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.13.1

File hashes

Hashes for polyvoice-0.6.5.tar.gz
Algorithm	Hash digest
SHA256	`b02214dc2c930869bee6e7039d42cd38d302bfb869107866690b0f318b67192b`
MD5	`942190792485467853c5182e1c41204d`
BLAKE2b-256	`f486ef3f3102f297694425c81d35789209e6a2ce90457ec7e20a4bd579fe4f78`
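
A downloaded archive can be checked against the SHA256 digest above with the standard library alone. This is a small sketch, assuming the file has been saved locally under its original name; the helper names `sha256_of` and `verify` are made up for the example.

```python
import hashlib

# SHA256 digest listed above for polyvoice-0.6.5.tar.gz.
EXPECTED_SHA256 = "b02214dc2c930869bee6e7039d42cd38d302bfb869107866690b0f318b67192b"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Usage, once the archive is in the current directory:
#     verify("polyvoice-0.6.5.tar.gz")
```

The same two functions work for the wheels; pass the matching digest from the relevant table as `expected`.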

File details

Details for the file polyvoice-0.6.5-cp314-cp314-macosx_11_0_arm64.whl.

File metadata

Download URL: polyvoice-0.6.5-cp314-cp314-macosx_11_0_arm64.whl
Upload date: May 21, 2026
Size: 8.0 MB
Tags: CPython 3.14, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: maturin/1.13.1

File hashes

Hashes for polyvoice-0.6.5-cp314-cp314-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`86a320010efa9b19d0b5a317528aae20c9fa42ab0f9eb767a4b94eabb6c4924c`
MD5	`b4d882a0415b84aa7a9433e9b0582a1a`
BLAKE2b-256	`ab992735c7f1ff47a72457767eff30725c4eb38c7458d3dc324a3fef6807b26d`

File details

Details for the file polyvoice-0.6.5-cp312-cp312-win_amd64.whl.

File metadata

Download URL: polyvoice-0.6.5-cp312-cp312-win_amd64.whl
Upload date: May 21, 2026
Size: 8.3 MB
Tags: CPython 3.12, Windows x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for polyvoice-0.6.5-cp312-cp312-win_amd64.whl
Algorithm	Hash digest
SHA256	`d4ee0fecc3546c84df61694828f9947cf2a611ea9b02ec7dddd5b658c493d8b7`
MD5	`33e0de06877abb49127e744c74068e28`
BLAKE2b-256	`15d74a304c7571804ddb0f5697bcf10882e176f108acd93ac331c03239ccaae7`

Provenance

The following attestation bundles were made for polyvoice-0.6.5-cp312-cp312-win_amd64.whl:

Publisher: release.yml on ekhodzitsky/polyvoice

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: polyvoice-0.6.5-cp312-cp312-win_amd64.whl
- Subject digest: d4ee0fecc3546c84df61694828f9947cf2a611ea9b02ec7dddd5b658c493d8b7
- Sigstore transparency entry: 1591074655
- Sigstore integration time: May 21, 2026
Source repository:
- Permalink: ekhodzitsky/polyvoice@d43992250ae51575a69c0dd34a4d472c57d61990
- Branch / Tag: refs/tags/v0.6.5
- Owner: https://github.com/ekhodzitsky
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d43992250ae51575a69c0dd34a4d472c57d61990
- Trigger Event: push

File details

Details for the file polyvoice-0.6.5-cp312-cp312-manylinux_2_38_x86_64.whl.

File metadata

Download URL: polyvoice-0.6.5-cp312-cp312-manylinux_2_38_x86_64.whl
Upload date: May 21, 2026
Size: 10.1 MB
Tags: CPython 3.12, manylinux: glibc 2.38+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for polyvoice-0.6.5-cp312-cp312-manylinux_2_38_x86_64.whl
Algorithm	Hash digest
SHA256	`e26dccf6bbd10d55ae7dd001386e508b0cef2208a97254aa7dd663166983e3af`
MD5	`ff95053f46c3d409beb791ecba0e6041`
BLAKE2b-256	`03d88749908fb63d7374fdb35f8997bbb495e6b54f7d710d0358427108370d2b`

Provenance

The following attestation bundles were made for polyvoice-0.6.5-cp312-cp312-manylinux_2_38_x86_64.whl:

Publisher: release.yml on ekhodzitsky/polyvoice

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: polyvoice-0.6.5-cp312-cp312-manylinux_2_38_x86_64.whl
- Subject digest: e26dccf6bbd10d55ae7dd001386e508b0cef2208a97254aa7dd663166983e3af
- Sigstore transparency entry: 1591074634
- Sigstore integration time: May 21, 2026
Source repository:
- Permalink: ekhodzitsky/polyvoice@d43992250ae51575a69c0dd34a4d472c57d61990
- Branch / Tag: refs/tags/v0.6.5
- Owner: https://github.com/ekhodzitsky
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d43992250ae51575a69c0dd34a4d472c57d61990
- Trigger Event: push

File details

Details for the file polyvoice-0.6.5-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

Download URL: polyvoice-0.6.5-cp312-cp312-macosx_11_0_arm64.whl
Upload date: May 21, 2026
Size: 9.1 MB
Tags: CPython 3.12, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for polyvoice-0.6.5-cp312-cp312-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`1b5d6d7660dfd87a742ddc720e4f7ae8d364283e3a923974ef4fbd2110a3c7b2`
MD5	`e1ba75b88a2a629da347091f87375c6c`
BLAKE2b-256	`677aa92e6a870adff0aa1638747bda1ce1e6273d4319db5333ba2e21909913f9`

Provenance

The following attestation bundles were made for polyvoice-0.6.5-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on ekhodzitsky/polyvoice

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: polyvoice-0.6.5-cp312-cp312-macosx_11_0_arm64.whl
- Subject digest: 1b5d6d7660dfd87a742ddc720e4f7ae8d364283e3a923974ef4fbd2110a3c7b2
- Sigstore transparency entry: 1591074644
- Sigstore integration time: May 21, 2026
Source repository:
- Permalink: ekhodzitsky/polyvoice@d43992250ae51575a69c0dd34a4d472c57d61990
- Branch / Tag: refs/tags/v0.6.5
- Owner: https://github.com/ekhodzitsky
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d43992250ae51575a69c0dd34a4d472c57d61990
- Trigger Event: push
