speakeronnx
Pure-onnxruntime speaker embedding library — no torch at runtime.
Extract speaker embeddings, compute cosine similarity, and verify speaker identity using ONNX-exported models downloaded automatically from HuggingFace.
Model collection: OpenVoiceOS/speaker-embeddings-onnx
Install
```bash
pip install speakeronnx
```
Optional high-quality resampling:
```bash
pip install speakeronnx soxr
```
Quick start
```python
from speakeronnx import SpeakerEmbedder, cosine, verify

embedder = SpeakerEmbedder(model="wespeaker-resnet34")

alice1 = embedder.embed("alice_clip1.wav")
alice2 = embedder.embed("alice_clip2.wav")
bob = embedder.embed("bob_clip1.wav")

print(cosine(alice1, alice2))  # e.g. 0.82 (same speaker)
print(cosine(alice1, bob))     # e.g. 0.21 (different speaker)

ok, score = verify(alice1, alice2, threshold=0.45)
print(ok, score)               # e.g. True 0.82
```
More examples in examples/.
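For intuition about what `cosine` and `verify` compute, here is a minimal numpy sketch. The helper names `cosine_sketch` and `verify_sketch` are hypothetical, and treating a score exactly at the threshold as a match (`>=`) is an assumption; the library's own functions may normalize or compare differently.

```python
import numpy as np


def cosine_sketch(a, b) -> float:
    """Hypothetical helper: cosine similarity between two 1-D embeddings."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    if a.shape != b.shape:
        # Embeddings from different models (e.g. 256-dim vs 192-dim) are not comparable.
        raise ValueError(f"dimension mismatch: {a.shape} vs {b.shape}")
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("zero-norm embedding")
    return float(np.dot(a, b) / denom)


def verify_sketch(a, b, threshold: float = 0.45):
    """Hypothetical helper: return (decision, score); assumes a match means score >= threshold.

    0.45 is the value used in the quick-start example, not a documented default.
    """
    score = cosine_sketch(a, b)
    return score >= threshold, score


# Usage (with embeddings from SpeakerEmbedder.embed):
# ok, score = verify_sketch(alice1, alice2, threshold=0.45)
```

Because cosine similarity ignores vector length, only the direction of each embedding matters, which is why a single fixed threshold can be applied across clips of different loudness or duration.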
CLI
```bash
speakeronnx list                                   # list available models
speakeronnx embed clip.wav                         # extract embedding
speakeronnx verify a.wav b.wav                     # same-speaker check (exit 0/1)
speakeronnx verify a.wav b.wav --threshold 0.5
speakeronnx embed clip.wav --model wespeaker-ecapa512
```
Full CLI reference in docs/cli.md.
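When scripting, the documented exit status of `speakeronnx verify` (0/1) can drive a boolean check. The wrapper below is a hypothetical helper, not part of the package, and it only uses the `verify` subcommand and `--threshold` flag shown above. Because the Python interpreter also exits with status 1 on an uncaught exception, the sketch inspects stderr for a traceback before reading status 1 as "different speaker".

```python
import subprocess
from typing import Optional, Sequence


def cli_same_speaker(
    clip_a: str,
    clip_b: str,
    threshold: Optional[float] = None,
    command: Sequence[str] = ("speakeronnx",),
) -> bool:
    """Hypothetical wrapper: run `speakeronnx verify` and map its exit status to a bool."""
    args = [*command, "verify", clip_a, clip_b]
    if threshold is not None:
        args += ["--threshold", str(threshold)]
    result = subprocess.run(args, capture_output=True, text=True)
    # An uncaught Python exception also exits with status 1, so check for a traceback.
    if result.returncode == 1 and "Traceback (most recent call last)" in result.stderr:
        raise RuntimeError(result.stderr.strip())
    if result.returncode == 0:
        return True   # documented: same speaker
    if result.returncode == 1:
        return False  # documented: different speaker
    raise RuntimeError(
        f"speakeronnx exited with status {result.returncode}: {result.stderr.strip()}"
    )


# Usage: cli_same_speaker("a.wav", "b.wav", threshold=0.5)
```

The `command` parameter exists only so the executable can be swapped (for example, a virtualenv path); by default it calls `speakeronnx` from `PATH`.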
Models
All 9 models are registered in MODEL_REGISTRY and downloaded on first use:
| Alias | Embed dim | Frontend | License |
|---|---|---|---|
| wespeaker-resnet34 | 256 | fbank80 | cc-by-4.0 |
| wespeaker-ecapa512 | 192 | fbank80 | cc-by-4.0 |
| wespeaker-resnet293 | 256 | fbank80 | cc-by-4.0 |
| campplus | 512 | fbank80 | cc-by-4.0 |
| campplus-zh-en | 192 | fbank80 | apache-2.0 |
| eres2net | 192 | fbank80 | apache-2.0 |
| titanet-small | 192 | fbank80 | cc-by-4.0 |
| titanet-large | 192 | fbank80 | cc-by-4.0 |
| redimnet-b2 | 192 | raw | apache-2.0 |
Full model comparison and selection guide in docs/models.md.
Documentation
| Document | Description |
|---|---|
| docs/index.md | Full getting-started guide |
| docs/models.md | Model comparison, selection, frontend/layout details |
| docs/api.md | Complete API reference |
| docs/cli.md | CLI usage reference |
| docs/frontend.md | Feature frontend (fbank80 vs raw) technical details |
| docs/advanced.md | Custom models, GPU, threshold tuning |
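Threshold tuning generally means scoring labeled same-speaker and different-speaker trial pairs from your own data and choosing an operating point. One common choice is the equal error rate (EER) point, where the false-accept rate (FAR) and false-reject rate (FRR) meet. The helpers below are hypothetical and illustrate that approach only; they are not necessarily the procedure described in docs/advanced.md.

```python
import numpy as np


def sweep_thresholds(scores, labels):
    """Hypothetical helper: (threshold, FAR, FRR) for each distinct score used as a threshold.

    labels: 1 = same-speaker trial, 0 = different-speaker trial.
    A trial counts as accepted when score >= threshold (assumed to mirror verify()).
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels).astype(bool)
    if labels.all() or not labels.any():
        raise ValueError("need both same-speaker and different-speaker trials")
    rows = []
    for t in np.unique(scores):  # ascending candidate thresholds
        accepted = scores >= t
        far = float(np.mean(accepted[~labels]))  # impostor pairs wrongly accepted
        frr = float(np.mean(~accepted[labels]))  # genuine pairs wrongly rejected
        rows.append((float(t), far, frr))
    return rows


def eer_threshold(scores, labels):
    """Threshold where FAR and FRR are closest, plus their average as an approximate EER."""
    t, far, frr = min(sweep_thresholds(scores, labels), key=lambda r: abs(r[1] - r[2]))
    return t, (far + frr) / 2.0
```

Raising the threshold above the EER point trades more false rejections for fewer false acceptances, which is usually preferable when a wrong "same speaker" decision is the costlier mistake.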
Examples
| Script | Description |
|---|---|
| examples/basic_embedding.py | Extract embedding from a single WAV |
| examples/verify_speakers.py | Verify two clips, try multiple thresholds |
| examples/compare_models.py | Compare all models on same utterances |
| examples/batch_enrollment.py | Enroll speakers from directories, match unknown |
| examples/custom_model.py | Load a custom ONNX model from disk |
| examples/gpu_inference.py | CUDA / CoreML inference |
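The enrollment workflow in examples/batch_enrollment.py is not reproduced here. The sketch below shows one common approach, under the assumption that each speaker is represented by a single centroid: average the L2-normalized embeddings of that speaker's clips, then assign an unknown clip to the closest centroid by cosine similarity, rejecting it below a threshold. `enroll` and `identify` are hypothetical helpers; their input vectors would come from `SpeakerEmbedder.embed`.

```python
import numpy as np


def _unit(v):
    """Scale a vector to unit L2 norm."""
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v)


def enroll(embeddings_by_speaker):
    """Hypothetical helper: map speaker name -> unit-length centroid of their clip embeddings."""
    return {
        name: _unit(np.mean([_unit(e) for e in embs], axis=0))
        for name, embs in embeddings_by_speaker.items()
    }


def identify(unknown, centroids, threshold=0.45):
    """Hypothetical helper: return (best speaker or None, best cosine score)."""
    u = _unit(unknown)
    scores = {name: float(np.dot(u, c)) for name, c in centroids.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores[best]


# Usage:
# centroids = enroll({"alice": [embedder.embed(p) for p in alice_paths],
#                     "bob": [embedder.embed(p) for p in bob_paths]})
# who, score = identify(embedder.embed("unknown.wav"), centroids)
```

Normalizing each clip before averaging keeps one long or loud clip from dominating the centroid; normalizing the centroid afterwards makes the dot product equal to cosine similarity.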
Tests
```bash
# Unit tests (mocked, no downloads, no network)
pytest tests/test_unit.py tests/test_audio.py tests/test_frontend.py \
  tests/test_embedder.py tests/test_cli.py tests/test_model_registry.py -v

# End-to-end tests (downloads models + generates TTS audio)
pytest tests/test_e2e.py -v -s
```
Model details
| Alias | HF repo | License | Embed dim | Description |
|---|---|---|---|---|
| `wespeaker-resnet34` | [Wespeaker/wespeaker-voxceleb-resnet34-LM](https://huggingface.co/Wespeaker/wespeaker-voxceleb-resnet34-LM) | cc-by-4.0 | 256 | ResNet34 r-vector, VoxCeleb2 Dev; **recommended default** |
| `wespeaker-ecapa512` | [Wespeaker/wespeaker-ecapa-tdnn512-LM](https://huggingface.co/Wespeaker/wespeaker-ecapa-tdnn512-LM) | cc-by-4.0 | 192 | ECAPA-TDNN-512 x-vector, VoxCeleb2 Dev |
| `wespeaker-resnet293` | [Wespeaker/wespeaker-voxceleb-resnet293-LM](https://huggingface.co/Wespeaker/wespeaker-voxceleb-resnet293-LM) | cc-by-4.0 | 256 | ResNet293 r-vector; highest accuracy, 28M params |
| `campplus` | [csukuangfj/speaker-embedding-models](https://huggingface.co/csukuangfj/speaker-embedding-models) | cc-by-4.0 | 512 | CAM++ (D-TDNN backbone), VoxCeleb2 Dev |
| `campplus-zh-en` | [csukuangfj/speaker-embedding-models](https://huggingface.co/csukuangfj/speaker-embedding-models) | apache-2.0 | 192 | 3D-Speaker CAM++ multilingual (zh+en) |
| `eres2net` | [csukuangfj/speaker-embedding-models](https://huggingface.co/csukuangfj/speaker-embedding-models) | apache-2.0 | 192 | ERes2Net, VoxCeleb |
| `titanet-small` | [csukuangfj/speaker-embedding-models](https://huggingface.co/csukuangfj/speaker-embedding-models) | cc-by-4.0 | 192 | NVIDIA NeMo TitaNet-small (~40 MB) |
| `titanet-large` | [csukuangfj/speaker-embedding-models](https://huggingface.co/csukuangfj/speaker-embedding-models) | cc-by-4.0 | 192 | NVIDIA NeMo TitaNet-large (~101 MB) |
| `redimnet-b2` | [OpenVoiceOS/redimnet-b2-vox2-onnx](https://huggingface.co/OpenVoiceOS/redimnet-b2-vox2-onnx) | apache-2.0 | 192 | ReDimNet b2 (1.8M params), raw audio input |

Run `speakeronnx list` to print descriptions and metadata for all registered models.
Feature frontend
- fbank80 models: 80-dim log-Mel filterbank features with per-utterance mean normalization (CMN), implemented in pure numpy. See docs/frontend.md.
- raw models (redimnet-b2): the raw 16 kHz waveform is passed directly to the ONNX model, which computes its own MelSpectrogram internally.
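To make the fbank80 description concrete, here is a numpy sketch of 80-dim log-Mel features followed by per-utterance mean normalization. The framing parameters (25 ms window and 10 ms hop at 16 kHz, 512-point FFT, Hamming window, 0.97 pre-emphasis, 20 Hz to 8 kHz mel range) are typical Kaldi-style assumptions, not values taken from speakeronnx; the library's actual settings are described in docs/frontend.md and may differ, so these features are not interchangeable with its own.

```python
import numpy as np


def _hz_to_mel(f):
    """HTK/Kaldi-style mel scale."""
    return 1127.0 * np.log1p(np.asarray(f) / 700.0)


def mel_filterbank(n_mels=80, n_fft=512, sr=16000, fmin=20.0, fmax=None):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2 if fmax is None else fmax
    mel_points = np.linspace(_hz_to_mel(fmin), _hz_to_mel(fmax), n_mels + 2)
    bin_mels = _hz_to_mel(np.linspace(0.0, sr / 2, n_fft // 2 + 1))
    fb = np.zeros((n_mels, bin_mels.size))
    for i in range(n_mels):
        left, center, right = mel_points[i : i + 3]
        up = (bin_mels - left) / (center - left)
        down = (right - bin_mels) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(up, down))
    return fb


def fbank80_cmn(wav, sr=16000, frame_len=400, hop=160, n_fft=512, preemph=0.97):
    """Hypothetical sketch: log-Mel features minus their per-utterance mean, shape (frames, 80)."""
    wav = np.asarray(wav, dtype=np.float64)
    if wav.size < frame_len:
        raise ValueError("audio shorter than one frame")
    n_frames = 1 + (wav.size - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wav[idx]
    frames = frames - frames.mean(axis=1, keepdims=True)  # remove DC offset per frame
    frames = np.concatenate(  # pre-emphasis (first sample kept as-is)
        [frames[:, :1], frames[:, 1:] - preemph * frames[:, :-1]], axis=1
    )
    frames = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2  # (frames, 257)
    mel = power @ mel_filterbank(80, n_fft, sr).T  # (frames, 80)
    logmel = np.log(np.maximum(mel, np.finfo(np.float64).eps))
    return logmel - logmel.mean(axis=0, keepdims=True)  # per-utterance CMN
```

Subtracting the per-utterance mean removes a constant offset in each log-Mel band, which roughly cancels fixed channel effects such as microphone coloration.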
Audio requirements
- PCM WAV in any bit depth (8/16/24/32-bit integer or 32-bit float)
- Any sample rate (resampled internally to 16 kHz)
- Stereo files are downmixed to mono
- Minimum of about 1 second of audio; 5–30 seconds per speaker is recommended for enrollment
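The audio handling described above can be approximated with the standard-library `wave` module. This sketch (`load_wav_16k_mono` is a hypothetical helper, not the library's loader) decodes 8/16/24/32-bit integer PCM, averages channels to mono, and resamples with plain linear interpolation. It does not handle 32-bit float WAV, which `wave` rejects, and linear interpolation applies no anti-aliasing filter; with soxr installed, `soxr.resample(x, in_rate, out_rate)` is the higher-quality alternative.

```python
import wave

import numpy as np

TARGET_SR = 16000


def _resample_linear(x, sr_in, sr_out):
    """Crude linear-interpolation resampler (no anti-aliasing filter)."""
    n_out = int(round(x.size * sr_out / sr_in))
    t_out = np.arange(n_out) * (sr_in / sr_out)  # output positions in input-sample units
    return np.interp(t_out, np.arange(x.size), x)


def load_wav_16k_mono(path, min_seconds=1.0):
    """Hypothetical helper: integer-PCM WAV -> mono float32 at 16 kHz, roughly in [-1, 1)."""
    with wave.open(path, "rb") as wf:
        n_channels, width, sr = wf.getnchannels(), wf.getsampwidth(), wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    if width == 1:  # 8-bit WAV samples are unsigned, centered on 128
        x = (np.frombuffer(raw, np.uint8).astype(np.float64) - 128.0) / 128.0
    elif width == 2:
        x = np.frombuffer(raw, "<i2") / 32768.0
    elif width == 3:  # 24-bit: assemble little-endian 3-byte samples, then sign-extend
        b = np.frombuffer(raw, np.uint8).reshape(-1, 3).astype(np.int32)
        v = b[:, 0] | (b[:, 1] << 8) | (b[:, 2] << 16)
        x = np.where(v >= 1 << 23, v - (1 << 24), v) / float(1 << 23)
    elif width == 4:
        x = np.frombuffer(raw, "<i4") / 2147483648.0
    else:
        raise ValueError(f"unsupported sample width: {width} bytes")
    x = x.reshape(-1, n_channels).mean(axis=1)  # downmix by averaging channels
    if sr != TARGET_SR:
        x = _resample_linear(x, sr, TARGET_SR)
    if x.size < min_seconds * TARGET_SR:
        raise ValueError(f"clip is {x.size / TARGET_SR:.2f} s; need at least {min_seconds} s")
    return x.astype(np.float32)
```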
Dependencies
- onnxruntime
- numpy
- huggingface_hub
- soxr (optional, for high-quality resampling)
Project links
- GitHub: TigreGotico/speakeronnx
- PyPI: `pip install speakeronnx`
Download files
File details
Details for the file speakeronnx-0.0.1a2.tar.gz.
File metadata
- Download URL: speakeronnx-0.0.1a2.tar.gz
- Size: 25.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `5d078b7407af66fea3bffc1c93ee49703dc887c59856f3c40ae20624b64f7cfc` |
| MD5 | `fadb5cae6704d36eef40b1b6c5fc4e14` |
| BLAKE2b-256 | `cd3ee47aa916b97e4f3d505b1add4f8fcf6980b9e673b7e644dd98249b640f34` |
File details
Details for the file speakeronnx-0.0.1a2-py3-none-any.whl.
File metadata
- Download URL: speakeronnx-0.0.1a2-py3-none-any.whl
- Size: 30.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `18c3a524e45d8151d84502953bc83095ff8886a280c56a05d879b67bf460a7ce` |
| MD5 | `8f10b5f8c04527e45f54780b860c147c` |
| BLAKE2b-256 | `d1cc7edb993116016b5f4d8de1ec462d699326095e1e40956fda47c4319516d0` |