VocalID is an open-source Python library for voice authentication using ECAPA-TDNN speaker embeddings. Train, evaluate, and verify speaker identity from audio files or live microphone input, and use voice as a biometric to unlock apps and systems.
VocalID: A Lightweight Voice Authentication Toolkit
VocalID is a practical and lightweight voice authentication library built around ECAPA-TDNN speaker embeddings and a simple classification layer. It lets you train your own voice model, evaluate its performance, and verify identities from recorded or live audio, so that voice can serve as a biometric for unlocking apps and systems and for protecting privacy. The goal is to make voice verification simple to run, easy to extend, and stable across devices.
Features
- ECAPA-TDNN speaker embeddings (speechbrain/spkrec-ecapa-voxceleb)
- Easy training workflow for positive (owner) and negative samples
- Evaluation with accuracy and a full classification report
- File-based verification and optional live microphone verification
- Clean CLI for training, testing and verification
- Modular, readable codebase
- Simple model storage using pickle
- Test suite included
- Simple FastAPI server for verification
How It Works
1. Audio Processing: Audio is loaded or recorded, resampled to the target rate, converted to mono, and padded to a minimum length.
2. Embedding Extraction: We extract fixed-dimensional embeddings using ECAPA-TDNN. These embeddings capture speaker-specific characteristics.
3. Training: A logistic regression classifier is trained on positive and negative embeddings.
4. Verification: When verifying a sample:
- Extract the embedding
- Run it through the model
- Get a probability score
- Compare the score with the threshold from config.py
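The preprocessing and decision steps above can be sketched in plain Python. This is an illustrative sketch, not VocalID's actual code: the function names, the list-based audio representation, and the `>=` comparison are assumptions made here, and resampling and the ECAPA-TDNN embedding step are left out.

```python
import math


def to_mono(channels):
    """Average a list of equal-length channels into one mono signal."""
    count = len(channels)
    return [sum(frame) / count for frame in zip(*channels)]


def pad_to_min_length(samples, min_len):
    """Zero-pad on the right so the signal has at least min_len samples."""
    missing = max(0, min_len - len(samples))
    return list(samples) + [0.0] * missing


def logistic_probability(embedding, weights, bias):
    """Logistic regression score: sigmoid of the weighted sum plus bias."""
    z = sum(w * x for w, x in zip(weights, embedding)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def is_owner(probability, threshold):
    """Accept the speaker when the score reaches the threshold (assumed >=)."""
    return probability >= threshold
```

Raising the threshold makes the check stricter: fewer impostors are accepted, at the cost of rejecting the owner more often.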
Package Structure
VocalID/
│
├── voice_verifier/
│ ├── trainer.py # Training, evaluation, saving, loading
│ ├── verifier.py # Verification from file or tensor
│ ├── embeddings.py # ECAPA-TDNN embedding extractor
│ ├── audio_utils.py # Audio loading / microphone recording
│ ├── config.py # Threshold, sample rate, model config
│ ├── model_store.py # Pickle storage helpers
│ ├── cli.py # CLI interface
│
├── tests/ # Pytest suite
├── examples/ # Example scripts
├── requirements.txt
├── api/app.py # Optional API server example
└── README.md
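The tree describes model_store.py only as "Pickle storage helpers". A minimal sketch of what such helpers might look like is shown below; the names `save_model` and `load_model` are hypothetical and are not taken from the VocalID source.

```python
import pickle
from pathlib import Path


def save_model(model, path):
    """Serialize any picklable model object to disk."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(model, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load an object written by save_model. Only open files you trust."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

Unpickling can execute arbitrary code, so a .pkl model file should be loaded only when it comes from a trusted source.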
Installation
For Windows users
pip install vocalid
Or install from source:
git clone https://github.com/Khubaib8281/VocalID.git
cd VocalID
pip install -e .
For Linux users
Also install libportaudio2, since sounddevice relies on it:
apt-get install -y libportaudio2
Dataset Layout
Your dataset should be organized as:
dataset/
│
├── my_voice/ # Positive samples (your voice)
│ sample1.wav
│ sample2.wav
│ ...
│
└── other_voices/ # Negative samples (others)
    voice1.wav
    voice2.wav
    ...
Each sample should ideally be 4–6 seconds with varied tone, distance, and background conditions.
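Before training, it can help to check which recordings fall outside the suggested 4–6 second range. The sketch below uses only the standard library `wave` module, so it handles plain PCM WAV files; the helper names are hypothetical and not part of VocalID.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Duration of a PCM WAV file, computed from frame count and frame rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def find_out_of_range(folder, min_s=4.0, max_s=6.0):
    """Return (file name, duration) pairs for clips outside [min_s, max_s]."""
    flagged = []
    for path in sorted(Path(folder).glob("*.wav")):
        duration = wav_duration_seconds(path)
        if not (min_s <= duration <= max_s):
            flagged.append((path.name, round(duration, 2)))
    return flagged
```

For example, `find_out_of_range("dataset/my_voice")` lists the positive samples that may be worth re-recording or trimming.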
Example Usage (Python)
Train
from vocalid.trainer import VoiceTrainer
import glob
pos_files = glob.glob("dataset/my_voice/*.wav")
neg_files = glob.glob("dataset/other_voices/*.wav")
trainer = VoiceTrainer()
trainer.train(pos_files, neg_files, save_path="my_voice_model.pkl")
Evaluate
trainer.load("my_voice_model.pkl")
test_pos = glob.glob("dataset/my_voice_test/*.wav")
test_neg = glob.glob("dataset/other_voices_test/*.wav")
metrics = trainer.evaluate(test_pos, test_neg)
print("Accuracy:", metrics["accuracy"])
print(metrics["report"])
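To make the reported numbers concrete, the sketch below computes accuracy plus precision and recall for the owner class from true and predicted labels. It is a plain-Python illustration of what such metrics mean, not the code VocalID uses to build its report; the function name and the label convention (1 = owner, 0 = other) are assumptions.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for the positive (owner) class, label 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # owner accepted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # impostor accepted
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # owner rejected
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

For a voice unlock, low precision means impostors are being accepted, while low recall means the owner is being locked out, so the two are worth reading separately from overall accuracy.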
Verify a file
from vocalid.verifier import VoiceVerifier
verifier = VoiceVerifier("my_voice_model.pkl")
ok, score = verifier.verify_file("verify_samples/unknown.wav")
print(ok, score)
Verify live audio
# Reuses `verifier` from above; audio_tensor is audio already captured from the microphone
ok, score = verifier.verify_live(audio_tensor)
print(ok, score)
Live recording only works on systems with a real microphone. It will not run in cloud notebooks.
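One way to capture audio for live verification is with sounddevice, which the Linux install notes mention. The sketch below is an assumption-heavy illustration: the helper names, the 16 kHz default (the rate the speechbrain/spkrec-ecapa-voxceleb model was trained on), and the (1, frames) float32 tensor layout are choices made here, and the input format that `verify_live` expects should be checked against the VocalID source.

```python
def num_frames(seconds, sample_rate):
    """Number of samples needed for a recording of the given length."""
    return int(round(seconds * sample_rate))


def record_mono(seconds=4.0, sample_rate=16000):
    """Record from the default microphone; returns a (1, frames) float32 tensor."""
    # Imported lazily so this module still loads on machines without PortAudio.
    import sounddevice as sd
    import torch

    recording = sd.rec(
        num_frames(seconds, sample_rate),
        samplerate=sample_rate,
        channels=1,
        dtype="float32",
    )
    sd.wait()  # block until the recording has finished
    # sounddevice returns shape (frames, channels); transpose to (channels, frames).
    return torch.from_numpy(recording.T.copy())
```

On a machine with a microphone, `audio_tensor = record_mono(seconds=4.0)` would produce a candidate input for the live check.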
CLI Usage
Train:
vocalid train --positive my_voice --negative others --output model.pkl
Evaluate:
vocalid evaluate --model model.pkl --positive my_voice --negative others
Verify a file:
vocalid verify sample.wav --model model.pkl
Live verification:
vocalid live --model model.pkl --seconds 4
Use Cases
- Personal voice-unlock systems for apps and devices
- Lightweight speaker verification
- Research in speaker embeddings
- Prototyping identity checks
- Classroom or research demonstrations
- Testing spoofing and adversarial audio
Why This Matters
VocalID helps developers learn how practical speaker verification works without dealing with heavy frameworks. The library focuses on transparency, modularity and simplicity:
- Clear separation of embedding extraction and classification
- Easy to swap in a different classifier
- Works on CPU
- No special hardware needed for training
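Because embeddings are computed separately from the decision step, the classifier can be replaced without touching the extractor. As an illustration only, the sketch below swaps in a different technique from the logistic regression VocalID uses: cosine similarity to the mean of the owner's embeddings. The class and method names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CentroidScorer:
    """Scores an embedding by cosine similarity to the mean owner embedding."""

    def fit(self, owner_embeddings):
        count = len(owner_embeddings)
        # Average each dimension across all enrolled owner embeddings.
        self.centroid = [sum(dim) / count for dim in zip(*owner_embeddings)]
        return self

    def score(self, embedding):
        return cosine_similarity(self.centroid, embedding)
```

Unlike logistic regression, this scorer needs no negative samples, but its scores are similarities in [-1, 1] rather than probabilities, so the threshold would have to be chosen separately.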
Contributing
Pull requests are welcome. To run tests:
pytest -v
Feel free to open issues for bugs, improvement ideas, or feature requests.
Author
Muhammad Khubaib Ahmad
AI/ML Engineer | Data Scientist | Voice Intelligence Researcher
License
MIT License
File details
Details for the file vocalid-0.2.0.tar.gz.
File metadata
- Download URL: vocalid-0.2.0.tar.gz
- Upload date:
- Size: 11.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a62e96b747a04c61fa84583288cdab6722f97a8a188bd2776cbaebbf24f9fcd3 |
| MD5 | 1666a32fbef7ed8a701421b02e4cd81b |
| BLAKE2b-256 | 872b6c0020e56bd936fdbdf8228cddf011ce84548516d790d0ca8d6b7bf1219a |
File details
Details for the file vocalid-0.2.0-py3-none-any.whl.
File metadata
- Download URL: vocalid-0.2.0-py3-none-any.whl
- Upload date:
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3910db9fc613a2b260437dbe8cec7276df6cf369a799d059923a647b47763a18 |
| MD5 | bd692b22d515110ef11e161af5b0c994 |
| BLAKE2b-256 | 431509e93b2e622029d33b00c496cb5ec253e30b8f879006b24b290361094dcf |