Pure-onnxruntime speaker embedding library — no torch at runtime

License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

speakeronnx

Pure-onnxruntime speaker embedding library — no torch at runtime.

Extract speaker embeddings, compute cosine similarity, and verify speaker identity using ONNX-exported models downloaded automatically from HuggingFace.

Model collection: OpenVoiceOS/speaker-embeddings-onnx

Install

pip install speakeronnx

Optional high-quality resampling:

pip install speakeronnx soxr

Quick start

from speakeronnx import SpeakerEmbedder, cosine, verify

embedder = SpeakerEmbedder(model="wespeaker-resnet34")

alice1 = embedder.embed("alice_clip1.wav")
alice2 = embedder.embed("alice_clip2.wav")
bob    = embedder.embed("bob_clip1.wav")

print(cosine(alice1, alice2))   # e.g. 0.82  — same speaker
print(cosine(alice1, bob))      # e.g. 0.21  — different speaker

ok, score = verify(alice1, alice2, threshold=0.45)
print(ok, score)  # True 0.82

More examples in examples/.
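
To make the scores in the quick start concrete, here is a minimal NumPy sketch of cosine similarity followed by a threshold check. It assumes `cosine` is the standard cosine similarity and that `verify` accepts a pair when the score is at least the threshold. The names `cosine_sketch` and `verify_sketch` are hypothetical and this is not the library's own implementation.

```python
import numpy as np


def cosine_sketch(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # assumption: a zero vector counts as "no similarity"
    return float(np.dot(a, b) / denom)


def verify_sketch(a, b, threshold=0.45):
    """Return (accepted, score); accepted when score >= threshold (assumed rule)."""
    score = cosine_sketch(a, b)
    return score >= threshold, score
```

Because cosine similarity ignores vector length, the embeddings do not need to be normalized before comparison.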

CLI

speakeronnx list                              # list available models
speakeronnx embed clip.wav                    # extract embedding
speakeronnx verify a.wav b.wav               # same-speaker check (exit 0/1)
speakeronnx verify a.wav b.wav --threshold 0.5
speakeronnx embed clip.wav --model wespeaker-ecapa512

Full CLI reference in docs/cli.md.
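
Since `speakeronnx verify` reports its result through the exit code, it can be driven from a script. The sketch below wraps the command with `subprocess`. It assumes exit code 0 means "same speaker", following the usual shell convention; the source only states "exit 0/1". The helpers `build_verify_cmd` and `same_speaker` are hypothetical names.

```python
import subprocess


def build_verify_cmd(clip_a, clip_b, threshold=None):
    """Build the argument list for `speakeronnx verify`."""
    cmd = ["speakeronnx", "verify", clip_a, clip_b]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    return cmd


def same_speaker(clip_a, clip_b, threshold=None, run=subprocess.run):
    """Run the CLI and map its exit code to a bool.

    Assumption: exit code 0 means the clips match, non-zero means they do not.
    `run` is injectable so the function can be exercised without the CLI installed.
    """
    result = run(build_verify_cmd(clip_a, clip_b, threshold))
    return result.returncode == 0
```

A call such as `same_speaker("a.wav", "b.wav", threshold=0.5)` mirrors the second `verify` example above.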

Models

All 9 models are registered in MODEL_REGISTRY and downloaded on first use:

Alias	Embed dim	Frontend	License
`wespeaker-resnet34`	256	fbank80	cc-by-4.0
`wespeaker-ecapa512`	192	fbank80	cc-by-4.0
`wespeaker-resnet293`	256	fbank80	cc-by-4.0
`campplus`	512	fbank80	cc-by-4.0
`campplus-zh-en`	192	fbank80	apache-2.0
`eres2net`	192	fbank80	apache-2.0
`titanet-small`	192	fbank80	cc-by-4.0
`titanet-large`	192	fbank80	cc-by-4.0
`redimnet-b2`	192	raw	apache-2.0

Full model comparison and selection guide in docs/models.md.

Documentation

Document	Description
`docs/index.md`	Full getting-started guide
`docs/models.md`	Model comparison, selection, frontend/layout details
`docs/api.md`	Complete API reference
`docs/cli.md`	CLI usage reference
`docs/frontend.md`	Feature frontend (fbank80 vs raw) technical details
`docs/advanced.md`	Custom models, GPU, threshold tuning

Examples

Script	Description
`examples/basic_embedding.py`	Extract embedding from a single WAV
`examples/verify_speakers.py`	Verify two clips, try multiple thresholds
`examples/compare_models.py`	Compare all models on same utterances
`examples/batch_enrollment.py`	Enroll speakers from directories, match unknown
`examples/custom_model.py`	Load a custom ONNX model from disk
`examples/gpu_inference.py`	CUDA / CoreML inference

Tests

# Unit tests (mocked, no downloads, no network)
pytest tests/test_unit.py tests/test_audio.py tests/test_frontend.py \
      tests/test_embedder.py tests/test_cli.py tests/test_model_registry.py -v

# End-to-end tests (downloads models + generates TTS audio)
pytest tests/test_e2e.py -v -s

Model details

| Alias | HF repo | License | Embed dim | Description |
|---|---|---|---|---|
| `wespeaker-resnet34` | [Wespeaker/wespeaker-voxceleb-resnet34-LM](https://huggingface.co/Wespeaker/wespeaker-voxceleb-resnet34-LM) | cc-by-4.0 | 256 | ResNet34 r-vector, VoxCeleb2 Dev (**recommended default**) |
| `wespeaker-ecapa512` | [Wespeaker/wespeaker-ecapa-tdnn512-LM](https://huggingface.co/Wespeaker/wespeaker-ecapa-tdnn512-LM) | cc-by-4.0 | 192 | ECAPA-TDNN-512 x-vector, VoxCeleb2 Dev |
| `wespeaker-resnet293` | [Wespeaker/wespeaker-voxceleb-resnet293-LM](https://huggingface.co/Wespeaker/wespeaker-voxceleb-resnet293-LM) | cc-by-4.0 | 256 | ResNet293 r-vector; highest accuracy, 28M params |
| `campplus` | [csukuangfj/speaker-embedding-models](https://huggingface.co/csukuangfj/speaker-embedding-models) | cc-by-4.0 | 512 | CAM++ (D-TDNN backbone), VoxCeleb2 Dev |
| `campplus-zh-en` | [csukuangfj/speaker-embedding-models](https://huggingface.co/csukuangfj/speaker-embedding-models) | apache-2.0 | 192 | 3D-Speaker CAM++ multilingual (zh+en) |
| `eres2net` | [csukuangfj/speaker-embedding-models](https://huggingface.co/csukuangfj/speaker-embedding-models) | apache-2.0 | 192 | ERes2Net, VoxCeleb |
| `titanet-small` | [csukuangfj/speaker-embedding-models](https://huggingface.co/csukuangfj/speaker-embedding-models) | cc-by-4.0 | 192 | NVIDIA NeMo TitaNet-small (~40 MB) |
| `titanet-large` | [csukuangfj/speaker-embedding-models](https://huggingface.co/csukuangfj/speaker-embedding-models) | cc-by-4.0 | 192 | NVIDIA NeMo TitaNet-large (~101 MB) |
| `redimnet-b2` | [OpenVoiceOS/redimnet-b2-vox2-onnx](https://huggingface.co/OpenVoiceOS/redimnet-b2-vox2-onnx) | apache-2.0 | 192 | ReDimNet b2 (1.8M params), raw audio input |

Use `speakeronnx list` to print descriptions and metadata for all registered models.

Feature frontend

fbank80 models: 80-dim log-Mel filterbank with per-utterance CMN, implemented in pure numpy. See docs/frontend.md.
raw models (redimnet-b2): raw 16 kHz waveform passed directly to ONNX (internal MelSpectrogram in the model).
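
The per-utterance CMN step mentioned for fbank80 models can be illustrated in a few lines of NumPy. This sketch assumes CMN means subtracting each filterbank bin's average over all frames of the utterance, and that features are laid out as (frames, bins). The function name `apply_cmn` is hypothetical and the library's actual frontend may differ in detail.

```python
import numpy as np


def apply_cmn(feats):
    """Subtract the per-bin mean over time from log-Mel features.

    feats: array shaped (num_frames, num_bins), e.g. (T, 80) for fbank80.
    Returns a new float32 array whose columns each average to ~0.
    """
    feats = np.asarray(feats, dtype=np.float32)
    return feats - feats.mean(axis=0, keepdims=True)


# Usage with synthetic features standing in for a real utterance.
rng = np.random.default_rng(0)
fake_fbank = rng.normal(loc=-5.0, scale=2.0, size=(200, 80))
normalized = apply_cmn(fake_fbank)
```

Normalizing per utterance removes a constant offset in each bin, which is one common way to reduce channel and microphone differences between recordings.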

Audio requirements

PCM WAV, any bit depth (8/16/24/32-bit int, 32-bit float); mono preferred
Any sample rate (resampled internally to 16 kHz)
Stereo files are downmixed to mono
Minimum ~1 second; recommended enrollment 5–30 seconds per speaker
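
The downmix and resample steps listed above could look roughly like the following. This is a sketch under assumptions, not the library's code: it averages channels to downmix, uses `soxr.resample` when the optional `soxr` package is installed, and otherwise falls back to plain linear interpolation. The function name `to_mono_16k` is hypothetical.

```python
import numpy as np

try:
    import soxr  # optional, higher-quality resampling
except ImportError:
    soxr = None

TARGET_SR = 16000


def to_mono_16k(samples, sample_rate):
    """Convert float audio shaped (n,) or (n, channels) to mono at 16 kHz."""
    x = np.asarray(samples, dtype=np.float32)
    if x.ndim == 2:
        x = x.mean(axis=1)  # assumed downmix strategy: average the channels
    if sample_rate == TARGET_SR or x.size == 0:
        return x
    if soxr is not None:
        return soxr.resample(x, sample_rate, TARGET_SR)
    # Fallback: linear interpolation onto the 16 kHz sample grid.
    n_out = int(round(x.shape[0] * TARGET_SR / sample_rate))
    t_out = np.arange(n_out) * (sample_rate / TARGET_SR)
    return np.interp(t_out, np.arange(x.shape[0]), x).astype(np.float32)
```

Linear interpolation is adequate for illustration but does no anti-alias filtering when downsampling, which is why a dedicated resampler such as `soxr` is the better choice for real recordings.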

Dependencies

onnxruntime
numpy
huggingface_hub
soxr (optional, for high-quality resampling)

Project links

GitHub: TigreGotico/speakeronnx
PyPI: pip install speakeronnx

Release history

0.0.1 (this version): Jun 13, 2026
0.0.1a2 (pre-release): Jun 13, 2026

Download files

Source Distribution: speakeronnx-0.0.1.tar.gz (25.0 kB), uploaded Jun 13, 2026
Built Distribution: speakeronnx-0.0.1-py3-none-any.whl (30.1 kB), uploaded Jun 13, 2026, Python 3

File details

Details for the file speakeronnx-0.0.1.tar.gz.

File metadata

Download URL: speakeronnx-0.0.1.tar.gz
Upload date: Jun 13, 2026
Size: 25.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for speakeronnx-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`21c6b11a38309bf58eb7f6a838d91bc3a60cd918d92eccf206824a7231dda4a5`
MD5	`0dfdce714a9968d422da4cd652208318`
BLAKE2b-256	`235dcd923b2c1c2ef901945ee3f5fd39a1017d3fe6911155754679a20a520f3c`


File details

Details for the file speakeronnx-0.0.1-py3-none-any.whl.

File metadata

Download URL: speakeronnx-0.0.1-py3-none-any.whl
Upload date: Jun 13, 2026
Size: 30.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for speakeronnx-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`56209f4e6f95518eb5bd592c5a828264c562f220309c998043a70efc44ae2146`
MD5	`993f1aa815aeb9678ec1f16467dba003`
BLAKE2b-256	`fa9f93d04fe2fa7ca0a9f641b549dd4e35713b0d3bc7c62fb5e3835a3fbe481d`
