Skip to main content

Personal voice verification toolkit using ECAPA-TDNN embeddings

Project description

VocalID: A Lightweight Voice Authentication Toolkit

VocalID is a practical and lightweight voice authentication library built around ECAPA-TDNN speaker embeddings and a simple classification layer. It lets you train your own voice model, evaluate its performance, and verify identities from recorded or live audio. The goal is to make voice verification simple to run, easy to extend, and stable across devices.


Features

  • ECAPA-TDNN speaker embeddings (speechbrain/spkrec-ecapa-voxceleb)
  • Easy training workflow for positive (owner) and negative samples
  • Evaluation with accuracy and a full classification report
  • File-based verification and optional live microphone verification
  • Clean CLI for training, testing and verification
  • Modular, readable codebase
  • Simple model storage using pickle
  • Test suite included

How It Works

1. Audio Processing Audio is loaded or recorded, resampled to the target rate, converted to mono and padded to a minimum length.

2. Embedding Extraction We extract fixed-dimensional embeddings using ECAPA-TDNN. These embeddings capture speaker-specific characteristics.

3. Training A logistic regression classifier is trained on positive and negative embeddings.

4. Verification When verifying a sample:

  1. Extract the embedding
  2. Run it through the model
  3. Get a probability score
  4. Compare with the threshold from config.py

Package Structure

VocalID/
│
├── voice_verifier/
│   ├── trainer.py        # Training, evaluation, saving, loading
│   ├── verifier.py       # Verification from file or tensor
│   ├── embeddings.py     # ECAPA-TDNN embedding extractor
│   ├── audio_utils.py    # Audio loading / microphone recording
│   ├── config.py         # Threshold, sample rate, model config
│   ├── model_store.py    # Pickle storage helpers
│   ├── cli.py            # CLI interface
│
├── tests/                # Pytest suite
├── examples/             # Example scripts
├── requirements.txt
├── api/app.py            # Optional API server example
└── README.md

Installation

pip install vocalid

Or install from source:

git clone https://github.com/Khubaib8281/VocalID.git
cd VocalID
pip install -e .

Dataset Layout

Your dataset should be organized as:

dataset/
│
├── my_voice/            # Positive samples (your voice)
│   sample1.wav
│   sample2.wav
│   ...
│
└── other_voices/        # Negative samples (others)
    voice1.wav
    voice2.wav
    ...

Each sample should ideally be 4–6 seconds with varied tone, distance, and background conditions.


Example Usage (Python)

Train

from vocalid.trainer import VoiceTrainer
import glob

pos_files = glob.glob("dataset/my_voice/*.wav")
neg_files = glob.glob("dataset/other_voices/*.wav")

trainer = VoiceTrainer()
trainer.train(pos_files, neg_files, save_path="my_voice_model.pkl")

Evaluate

trainer.load("my_voice_model.pkl")

test_pos = glob.glob("dataset/my_voice_test/*.wav")
test_neg = glob.glob("dataset/other_voices_test/*.wav")

metrics = trainer.evaluate(test_pos, test_neg)
print("Accuracy:", metrics["accuracy"])
print(metrics["report"])

Verify a file

from vocalid.verifier import VoiceVerifier

verifier = VoiceVerifier("my_voice_model.pkl")
ok, score = verifier.verify_file("verify_samples/unknown.wav")

print(ok, score)

Verify live audio

audio_tensor = trainer.record_audio(seconds=4)
ok, score = verifier.verify_array(audio_tensor)
print(ok, score)

Live recording only works on systems with a real microphone. It will not run in cloud notebooks.


CLI Usage

Train:

vocalid train --positive my_voice --negative others --output model.pkl

Evaluate:

vocalid evaluate --model model.pkl --positive my_voice --negative others

Verify a file:

vocalid verify sample.wav --model model.pkl

Live verification:

vocalid live --model model.pkl --seconds 4

Use Cases

  • Personal voice-unlock systems
  • Lightweight speaker verification
  • Research in speaker embeddings
  • Prototyping identity checks
  • Classroom or research demonstrations
  • Testing spoofing and adversarial audio

Why This Matters

VocalID helps developers learn how practical speaker verification works without dealing with heavy frameworks. The library focuses on transparency, modularity and simplicity:

  • Clear separation of embedding extraction and classification
  • Easy to swap in a different classifier
  • Works on CPU
  • No special hardware needed for training

Contributing

Pull requests are welcome. To run tests:

pytest -v

Feel free to open issues for bugs, improvement ideas, or feature requests.


Author

Muhammad Khubaib Ahmad AI/ML Engineer | Data Scientist | Voice Intelligence Researcher


License

MIT License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vocalid-0.1.9.tar.gz (11.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vocalid-0.1.9-py3-none-any.whl (9.1 kB view details)

Uploaded Python 3

File details

Details for the file vocalid-0.1.9.tar.gz.

File metadata

  • Download URL: vocalid-0.1.9.tar.gz
  • Upload date:
  • Size: 11.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for vocalid-0.1.9.tar.gz
Algorithm Hash digest
SHA256 04336a53f12130eaa3a3a50d386243d7bbf8edefc9f13b6afef18075f07670fd
MD5 1169183f429826b40c084a57ee878beb
BLAKE2b-256 9d9d61ae2256d5878c2de32d2bc832c466c18d452ddb5fdd51701a7b1fcbe283

See more details on using hashes here.

File details

Details for the file vocalid-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: vocalid-0.1.9-py3-none-any.whl
  • Upload date:
  • Size: 9.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for vocalid-0.1.9-py3-none-any.whl
Algorithm Hash digest
SHA256 1eca26795d2be398c967fb2192606c33cc12f7ee9d7f8a7a9133e449c21912dc
MD5 b1826367e282210000a686f9f50b7545
BLAKE2b-256 4dbbaab5f061c7a6a28aac9612d774db5f613d81c9ebb99e5d76c22303314287

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page