speakeronnx

Pure-onnxruntime speaker embedding library — no torch at runtime.

Extract speaker embeddings, compute cosine similarity, and verify speaker identity using ONNX-exported models downloaded automatically from HuggingFace.

Model collection: OpenVoiceOS/speaker-embeddings-onnx

Install

pip install speakeronnx

Optional high-quality resampling:

pip install speakeronnx soxr

Quick start

from speakeronnx import SpeakerEmbedder, cosine, verify

embedder = SpeakerEmbedder(model="wespeaker-resnet34")

alice1 = embedder.embed("alice_clip1.wav")
alice2 = embedder.embed("alice_clip2.wav")
bob    = embedder.embed("bob_clip1.wav")

print(cosine(alice1, alice2))   # e.g. 0.82  — same speaker
print(cosine(alice1, bob))      # e.g. 0.21  — different speaker

ok, score = verify(alice1, alice2, threshold=0.45)
print(ok, score)  # True 0.82
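
For readers who want to see what the score means, here is a minimal numpy sketch of cosine similarity with a threshold check. The names `cosine_similarity` and `verify_sketch` are illustrative stand-ins, not the library's functions, and the `>=` comparison against the threshold is an assumption about how the decision is made.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    if a.shape != b.shape:
        # Embeddings from models with different dimensions cannot be compared.
        raise ValueError("embeddings must have the same dimension")
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # a zero vector carries no direction
    return float(np.dot(a, b) / denom)


def verify_sketch(a, b, threshold=0.45):
    """Return (same_speaker, score); the >= comparison is an assumption."""
    score = cosine_similarity(a, b)
    return score >= threshold, score
```

Because the score depends only on direction, scaling an embedding does not change the result; the threshold itself is model-dependent and worth tuning on your own data.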

More examples in examples/.

CLI

speakeronnx list                              # list available models
speakeronnx embed clip.wav                    # extract embedding
speakeronnx verify a.wav b.wav               # same-speaker check (exit 0/1)
speakeronnx verify a.wav b.wav --threshold 0.5
speakeronnx embed clip.wav --model wespeaker-ecapa512

Full CLI reference in docs/cli.md.

Models

All 9 models are registered in MODEL_REGISTRY and downloaded on first use:

| Alias | Embed dim | Frontend | License |
|---|---|---|---|
| `wespeaker-resnet34` | 256 | fbank80 | cc-by-4.0 |
| `wespeaker-ecapa512` | 192 | fbank80 | cc-by-4.0 |
| `wespeaker-resnet293` | 256 | fbank80 | cc-by-4.0 |
| `campplus` | 512 | fbank80 | cc-by-4.0 |
| `campplus-zh-en` | 192 | fbank80 | apache-2.0 |
| `eres2net` | 192 | fbank80 | apache-2.0 |
| `titanet-small` | 192 | fbank80 | cc-by-4.0 |
| `titanet-large` | 192 | fbank80 | cc-by-4.0 |
| `redimnet-b2` | 192 | raw | apache-2.0 |

Full model comparison and selection guide in docs/models.md.
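
When choosing a model, it can help to filter the table by license, frontend, or embedding size. The sketch below copies the table into a local dictionary and filters it. The `ModelInfo` type, the `MODELS` dictionary, and the `select` helper are hypothetical and do not reflect the actual structure of `MODEL_REGISTRY`.

```python
from typing import NamedTuple


class ModelInfo(NamedTuple):
    embed_dim: int
    frontend: str
    license: str


# Hand-copied from the table above; not read from the library.
MODELS = {
    "wespeaker-resnet34": ModelInfo(256, "fbank80", "cc-by-4.0"),
    "wespeaker-ecapa512": ModelInfo(192, "fbank80", "cc-by-4.0"),
    "wespeaker-resnet293": ModelInfo(256, "fbank80", "cc-by-4.0"),
    "campplus": ModelInfo(512, "fbank80", "cc-by-4.0"),
    "campplus-zh-en": ModelInfo(192, "fbank80", "apache-2.0"),
    "eres2net": ModelInfo(192, "fbank80", "apache-2.0"),
    "titanet-small": ModelInfo(192, "fbank80", "cc-by-4.0"),
    "titanet-large": ModelInfo(192, "fbank80", "cc-by-4.0"),
    "redimnet-b2": ModelInfo(192, "raw", "apache-2.0"),
}


def select(models, *, license=None, embed_dim=None, frontend=None):
    """Return aliases matching every criterion that is not None."""
    return [
        alias
        for alias, info in models.items()
        if (license is None or info.license == license)
        and (embed_dim is None or info.embed_dim == embed_dim)
        and (frontend is None or info.frontend == frontend)
    ]


apache_models = select(MODELS, license="apache-2.0")
```

Note that embeddings from different models are not comparable with each other, even when the dimensions happen to match, so enrollment and verification should use the same alias.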

Documentation

| Document | Description |
|---|---|
| docs/index.md | Full getting-started guide |
| docs/models.md | Model comparison, selection, frontend/layout details |
| docs/api.md | Complete API reference |
| docs/cli.md | CLI usage reference |
| docs/frontend.md | Feature frontend (fbank80 vs raw) technical details |
| docs/advanced.md | Custom models, GPU, threshold tuning |

Examples

| Script | Description |
|---|---|
| examples/basic_embedding.py | Extract embedding from a single WAV |
| examples/verify_speakers.py | Verify two clips, try multiple thresholds |
| examples/compare_models.py | Compare all models on same utterances |
| examples/batch_enrollment.py | Enroll speakers from directories, match unknown |
| examples/custom_model.py | Load a custom ONNX model from disk |
| examples/gpu_inference.py | CUDA / CoreML inference |
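
The enroll-then-match workflow that examples/batch_enrollment.py describes can be sketched with plain numpy, independent of the library. This is not the contents of that script: `enroll`, `identify`, and the averaging of length-normalized embeddings are illustrative assumptions about one common way to build a speaker profile.

```python
import numpy as np


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    v = np.asarray(v, dtype=np.float64)
    norm = np.linalg.norm(v)
    return v if norm == 0.0 else v / norm


def enroll(embeddings):
    """Build one profile per speaker: mean of unit-length embeddings."""
    stacked = np.stack([l2_normalize(e) for e in embeddings])
    return l2_normalize(stacked.mean(axis=0))


def identify(probe, profiles, threshold=0.45):
    """Return (best_name or None, best_score) for an unknown embedding."""
    probe = l2_normalize(probe)
    # Profiles and probe are unit length, so the dot product is the cosine.
    scores = {name: float(np.dot(probe, p)) for name, p in profiles.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores[best]  # nobody enrolled is close enough
    return best, scores[best]


# Toy 3-dimensional "embeddings" standing in for real model output.
profiles = {
    "alice": enroll([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    "bob": enroll([[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]),
}
name, score = identify([1.0, 0.05, 0.0], profiles)
```

In practice each list passed to `enroll` would hold embeddings from several clips of the same speaker, all produced by the same model.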

Tests

# Unit tests (mocked, no downloads, no network)
pytest tests/test_unit.py tests/test_audio.py tests/test_frontend.py \
      tests/test_embedder.py tests/test_cli.py tests/test_model_registry.py -v

# End-to-end tests (downloads models + generates TTS audio)
pytest tests/test_e2e.py -v -s

Model details

| Alias | HF repo | License | Embed dim | Description |
|---|---|---|---|---|
| `wespeaker-resnet34` | [Wespeaker/wespeaker-voxceleb-resnet34-LM](https://huggingface.co/Wespeaker/wespeaker-voxceleb-resnet34-LM) | cc-by-4.0 | 256 | ResNet34 r-vector, VoxCeleb2 Dev (**recommended default**) |
| `wespeaker-ecapa512` | [Wespeaker/wespeaker-ecapa-tdnn512-LM](https://huggingface.co/Wespeaker/wespeaker-ecapa-tdnn512-LM) | cc-by-4.0 | 192 | ECAPA-TDNN-512 x-vector, VoxCeleb2 Dev |
| `wespeaker-resnet293` | [Wespeaker/wespeaker-voxceleb-resnet293-LM](https://huggingface.co/Wespeaker/wespeaker-voxceleb-resnet293-LM) | cc-by-4.0 | 256 | ResNet293 r-vector, highest accuracy, 28M params |
| `campplus` | [csukuangfj/speaker-embedding-models](https://huggingface.co/csukuangfj/speaker-embedding-models) | cc-by-4.0 | 512 | CAM++ (D-TDNN backbone), VoxCeleb2 Dev |
| `campplus-zh-en` | [csukuangfj/speaker-embedding-models](https://huggingface.co/csukuangfj/speaker-embedding-models) | apache-2.0 | 192 | 3D-Speaker CAM++ multilingual (zh+en) |
| `eres2net` | [csukuangfj/speaker-embedding-models](https://huggingface.co/csukuangfj/speaker-embedding-models) | apache-2.0 | 192 | ERes2Net, VoxCeleb |
| `titanet-small` | [csukuangfj/speaker-embedding-models](https://huggingface.co/csukuangfj/speaker-embedding-models) | cc-by-4.0 | 192 | NVIDIA NeMo TitaNet-small (~40 MB) |
| `titanet-large` | [csukuangfj/speaker-embedding-models](https://huggingface.co/csukuangfj/speaker-embedding-models) | cc-by-4.0 | 192 | NVIDIA NeMo TitaNet-large (~101 MB) |
| `redimnet-b2` | [OpenVoiceOS/redimnet-b2-vox2-onnx](https://huggingface.co/OpenVoiceOS/redimnet-b2-vox2-onnx) | apache-2.0 | 192 | ReDimNet b2 (1.8M params), raw audio input |

Use `speakeronnx list` to print descriptions and metadata for all registered models.

Feature frontend

  • fbank80 models: 80-dim log-Mel filterbank with per-utterance CMN, implemented in pure numpy. See docs/frontend.md.
  • raw models (redimnet-b2): raw 16 kHz waveform passed directly to ONNX (internal MelSpectrogram in the model).
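
To make the fbank80 path concrete, here is a simplified numpy sketch of an 80-band log-Mel filterbank followed by per-utterance mean normalization. It is not the library's frontend: the 25 ms window, 10 ms hop, 512-point FFT, Hamming window, and 20 Hz lower edge are assumptions, and Kaldi-style details such as pre-emphasis and dithering are left out.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank(n_mels=80, n_fft=512, sr=16000, fmin=20.0):
    """Triangular filters spaced evenly on the Mel scale, shape (n_mels, n_fft//2+1)."""
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(sr / 2), n_mels + 2))
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, ctr, hi = edges[i : i + 3]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_cmn(wave, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=80):
    """Return (n_frames, n_mels) log-Mel features with the per-band mean removed."""
    wave = np.asarray(wave, dtype=np.float64)
    if wave.size < frame_len:
        raise ValueError("audio is shorter than one analysis frame")
    n_frames = 1 + (wave.size - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wave[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    feats = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # Per-utterance CMN: subtract each band's mean over time.
    return feats - feats.mean(axis=0, keepdims=True)


rng = np.random.default_rng(0)
features = log_mel_cmn(rng.standard_normal(16000))  # one second of noise
```

Subtracting the per-band mean removes constant channel effects such as microphone coloration, which is why it is applied per utterance rather than globally.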

Audio requirements

  • PCM WAV, mono or stereo, any bit depth (8/16/24/32-bit int, 32-bit float)
  • Any sample rate (resampled internally to 16 kHz)
  • Stereo files are downmixed to mono
  • Minimum ~1 second; recommended enrollment 5–30 seconds per speaker
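
The preparation steps listed above (scaling to float, downmixing, resampling to 16 kHz) can be sketched in numpy as follows. The function names are hypothetical, and linear interpolation is used here only as a simple stand-in: it is not the library's resampler and gives lower quality than a proper resampler such as soxr.

```python
import numpy as np


def pcm_to_float(samples):
    """Scale integer PCM to float32 in roughly [-1, 1]; floats pass through."""
    samples = np.asarray(samples)
    if np.issubdtype(samples.dtype, np.floating):
        return samples.astype(np.float32)
    if samples.dtype == np.uint8:
        # 8-bit WAV is unsigned with a midpoint of 128.
        return (samples.astype(np.float32) - 128.0) / 128.0
    full_scale = float(-np.iinfo(samples.dtype).min)  # e.g. 32768 for int16
    return samples.astype(np.float32) / full_scale


def to_mono(x):
    """Average channels of a (n_samples, n_channels) array; 1-D input is returned as-is."""
    x = np.asarray(x)
    return x if x.ndim == 1 else x.mean(axis=1)


def resample_linear(x, sr_in, sr_out=16000):
    """Resample by linear interpolation (a rough stand-in, no anti-alias filter)."""
    x = np.asarray(x, dtype=np.float32)
    if sr_in == sr_out:
        return x
    n_out = int(round(x.size * sr_out / sr_in))
    t_in = np.arange(x.size) / sr_in
    t_out = np.arange(n_out) / sr_out
    return np.interp(t_out, t_in, x).astype(np.float32)


# One second of 48 kHz stereo int16 silence, prepared for a 16 kHz model.
stereo = np.zeros((48000, 2), dtype=np.int16)
prepared = resample_linear(to_mono(pcm_to_float(stereo)), sr_in=48000)
```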

Dependencies

  • onnxruntime
  • numpy
  • huggingface_hub
  • soxr (optional, for high-quality resampling)
