Personal voice verification toolkit using ECAPA-TDNN embeddings

VocalID: A Lightweight Voice Authentication Toolkit

VocalID is a compact and practical voice authentication library that combines ECAPA-TDNN embeddings with a simple classifier to verify user identity from audio recordings. It supports file-based verification and real-time microphone input. The project is designed to be easy to train, deploy, and extend.


Features

  • ECAPA-TDNN embeddings using speechbrain/spkrec-ecapa-voxceleb
  • Training with positive (owner) and negative (impostor) audio samples
  • Evaluation with accuracy and classification metrics
  • Verification from audio files or live microphone input
  • CLI toolkit for training, evaluating, and verifying
  • Modular design with trainer, verifier, embeddings, config, and utilities
  • Simple model storage using pickle-based persistence
  • Full test suite included

How It Works

1. Audio Processing

Audio is loaded or recorded, resampled, and normalized.
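The resampling and normalization steps can be illustrated with a minimal pure-Python sketch. This is not VocalID's implementation: the helper names `resample_linear` and `peak_normalize` are hypothetical, and a production pipeline would normally use a filtered resampler from an audio library. The pretrained speechbrain/spkrec-ecapa-voxceleb model was trained on 16 kHz single-channel audio, so that is a likely target rate.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (illustrative only, no anti-alias filter)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # distance between output samples, in input samples
    out = []
    for i in range(n_out):
        pos = min(i * step, len(samples) - 1)  # clamp to the last input sample
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def peak_normalize(samples, peak=1.0):
    """Scale so the largest absolute sample equals `peak`; silence is left unchanged."""
    largest = max((abs(s) for s in samples), default=0.0)
    if largest == 0.0:
        return list(samples)
    return [s * peak / largest for s in samples]
```

Normalizing after resampling keeps the final peak level predictable, because interpolation can change the maximum amplitude slightly.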

2. Embedding Extraction

ECAPA-TDNN generates fixed-dimensional speaker embeddings. These embeddings represent unique speaker characteristics.
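One way to build intuition for what "fixed-dimensional" means in practice is to compare two embeddings with cosine similarity, a common measure for speaker embeddings. The sketch below is only an illustration: VocalID itself feeds embeddings to a classifier rather than comparing them directly, and the function name `cosine_similarity` and the toy vectors in the usage note are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])` is approximately 1.0 because the vectors point the same way. Recordings of the same speaker are expected to produce embeddings with higher similarity than recordings of different speakers.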

3. Feature Preparation

Positive and negative embeddings are labeled and fed into the trainer.
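The labeling step can be sketched as follows. This is a standalone illustration, not the library's `prepare_features()`: the helper name `label_embeddings` and the convention of 1 for the owner and 0 for impostors are assumptions.

```python
def label_embeddings(pos_embeddings, neg_embeddings):
    """Stack owner and impostor embeddings into one feature list with matching labels."""
    features = [list(e) for e in pos_embeddings] + [list(e) for e in neg_embeddings]
    # Assumed convention: 1 = owner (positive class), 0 = impostor (negative class)
    labels = [1] * len(pos_embeddings) + [0] * len(neg_embeddings)
    if len({len(row) for row in features}) > 1:
        raise ValueError("all embeddings must have the same dimension")
    return features, labels
```

The dimension check catches the case where embeddings from different models are mixed by accident.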

4. Classification Model

A simple Logistic Regression model is trained on the embeddings.
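To show what a logistic regression learns from labeled embeddings, here is a minimal pure-Python sketch trained with batch gradient descent. It is an illustration under stated assumptions, not the toolkit's trainer: the function names, the learning rate, and the epoch count are hypothetical, and a real project would more likely use an existing implementation from a machine learning library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Probability that feature vector x belongs to the positive (owner) class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(features, labels, lr=0.5, epochs=500):
    """Fit weights and bias by minimizing log loss with batch gradient descent."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict_proba(weights, bias, x) - y  # gradient of log loss w.r.t. the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias
```

The output of `predict_proba` is the kind of confidence score that the verification step compares against a threshold.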

5. Verification

During verification:

  1. Extract embeddings for the new audio.
  2. Predict with the trained classifier.
  3. Return a confidence score.
  4. Compare the score with the threshold from config.py.
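The final comparison in the steps above can be sketched as a small function. The name `decide`, the default value 0.5, and the use of `>=` rather than `>` are assumptions for illustration; the actual threshold lives in config.py and may differ.

```python
DEFAULT_THRESHOLD = 0.5  # hypothetical value; the real one is defined in config.py


def decide(score, threshold=DEFAULT_THRESHOLD):
    """Return (accepted, score), where accepted is True when score meets the threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability between 0 and 1")
    return score >= threshold, score
```

Raising the threshold reduces false accepts (an impostor passing as the owner) at the cost of more false rejects (the owner being turned away), so the right value depends on which error is more costly for the application.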

Package Structure

VocalID/
├── voice_verifier/
│   ├── trainer.py         # Training logic, evaluation, model save/load
│   ├── verifier.py        # File and waveform verification
│   ├── embeddings.py      # ECAPA-TDNN embedding extraction
│   ├── audio_utils.py     # Audio loading and microphone recording
│   ├── config.py          # Threshold + ECAPA model configuration
│   ├── model_store.py     # Model checkpoint loader
│   └── cli.py             # Command-line interface
├── tests/                 # Full pytest suite
├── examples/
├── requirements.txt
├── api/
│   └── app.py
└── README.md

Components

VoiceTrainer

  • train()
  • evaluate()
  • prepare_features()
  • save()
  • load()
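Since the toolkit describes its model storage as pickle-based, a save/load round trip can be sketched with the standard library. The helper names `save_model` and `load_model` are hypothetical and are not the signatures of the trainer's own save() and load().

```python
import pickle


def save_model(model, path):
    """Serialize any picklable model object to `path`."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load a model previously written by save_model()."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Unpickling can execute arbitrary code, so only load model files that come from a source you trust.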

VoiceVerifier

  • verify_file(path)
  • verify_array(audio_tensor)

EmbeddingExtractor

  • embed_file(path)
  • embed_waveform(waveform, sr)

Audio Utilities

  • load_audio(path)
  • record_audio(seconds)

Installation

    pip install vocalid

Example Usage Script (Python)

Directory Structure Example

Assume your dataset looks like this:

Voice tip: record each sample for 5-6 seconds, and vary the tone, background noise, accent, and microphone across samples.

dataset/
├── my_voice/               <-- positive class (your voice)
│   ├── sample1.wav
│   ├── sample2.wav
│   ├── sample3.wav
│   └── sample4.wav
└── other_voices/           <-- negative class (other people's voices)
    ├── voice1.wav
    ├── voice2.wav
    ├── voice3.wav
    └── voice4.wav

Full Python Script

from vocalid.trainer import VoiceTrainer
from vocalid.verifier import VoiceVerifier
from vocalid.audio_utils import load_audio
import glob

# 1. TRAINING THE MODEL

pos_files = glob.glob("dataset/my_voice/*.wav")
neg_files = glob.glob("dataset/other_voices/*.wav")

trainer = VoiceTrainer()
trainer.train(pos_files, neg_files, save_path="my_voice_model.pkl")

# (Optional) Check metrics printed by evaluate() in train()
print("Training complete. Model saved.")


# 2. EVALUATING THE TRAINED MODEL (Manually)

# Useful for evaluating after loading a saved model,
# or for computing metrics on a separate test set.

# Example test data (can be same folders or separate ones)
test_pos = glob.glob("dataset/my_voice_test/*.wav")
test_neg = glob.glob("dataset/other_voices_test/*.wav")

metrics = trainer.evaluate(test_pos, test_neg)

print("Accuracy:", metrics["accuracy"])
print("Report:\n", metrics["report"])

# Example output:
# Accuracy: 0.91
# Report:
#  <classification report text>


# 3. VERIFY A FILE

verifier = VoiceVerifier("my_voice_model.pkl")

to_verify = "verify_samples/unknown_voice.wav"
ok, score = verifier.verify_file(to_verify)

print(f"\nVerification result: {ok}, Score: {score:.3f}")
# ok = True means it matches your voice
# score is probability from the classifier


# 4. VERIFY LIVE MICROPHONE AUDIO (Windows supported)

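# Record a short clip and verify.
# record_audio() is listed under Audio Utilities, so it is imported from there
# rather than called on the trainer.
from vocalid.audio_utils import record_audio

audio_tensor = record_audio(seconds=4)
ok, score = verifier.verify_array(audio_tensor)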

print(f"Live verification: {ok}, Score: {score:.3f}")

Example Evaluate-Only Script

To evaluate a previously saved model without retraining:

from vocalid.trainer import VoiceTrainer
import glob

trainer = VoiceTrainer()
trainer.load("my_voice_model.pkl")

test_pos = glob.glob("dataset/my_voice_test/*.wav")
test_neg = glob.glob("dataset/other_voices_test/*.wav")

metrics = trainer.evaluate(test_pos, test_neg)

print("Accuracy:", metrics["accuracy"])
print("Report:\n", metrics["report"])
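To make the reported numbers easier to interpret, the sketch below computes accuracy plus precision and recall for the owner class from true and predicted labels. It is an illustration only: the function names are hypothetical, the label convention (1 = owner, 0 = impostor) is an assumption, and the toolkit's own `evaluate()` may compute its report differently.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class `positive` (assumed: 1 = owner)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For voice authentication, low precision on the owner class means impostors are being accepted, while low recall means the owner is being rejected. Accuracy alone can hide either problem when the two classes are unevenly sized.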

CLI Commands

    vocalid train --positive my_voice --negative others --output model.pkl
    vocalid evaluate --model model.pkl
    vocalid verify audio.wav --model model.pkl
    vocalid live --model model.pkl --seconds 4
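The command layout above maps naturally onto argparse subcommands. The sketch below is a hypothetical reconstruction that uses only the flags shown in the commands; it is not the contents of cli.py, and which flags are required and what their defaults are is assumed.

```python
import argparse


def build_parser():
    """Build a parser mirroring the four documented subcommands (assumed layout)."""
    parser = argparse.ArgumentParser(prog="vocalid")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="train a model from audio folders")
    train.add_argument("--positive", required=True)
    train.add_argument("--negative", required=True)
    train.add_argument("--output", required=True)

    evaluate = sub.add_parser("evaluate", help="evaluate a saved model")
    evaluate.add_argument("--model", required=True)

    verify = sub.add_parser("verify", help="verify one audio file")
    verify.add_argument("audio")
    verify.add_argument("--model", required=True)

    live = sub.add_parser("live", help="verify microphone input")
    live.add_argument("--model", required=True)
    live.add_argument("--seconds", type=float, default=4.0)  # assumed default
    return parser
```

Calling `build_parser().parse_args(["verify", "audio.wav", "--model", "model.pkl"])` yields a namespace with `command`, `audio`, and `model` attributes that a dispatcher could route to the matching function.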

Use Cases

  • Personal voice unlock systems
  • Lightweight identity verification
  • Speaker recognition prototypes
  • Research experiments in speaker embeddings
  • Security analyses for spoof detection

Why It Matters

This toolkit allows developers and researchers to:

  • Build practical speaker authentication systems quickly
  • Learn how ECAPA embeddings work
  • Train custom voiceprints without heavy dependencies
  • Extend or plug into larger voice systems

Contributing

Pull requests are welcome.
Tests can be run with:

    pytest -v

Author

Muhammad Khubaib Ahmad
AI/ML Engineer, Data Scientist, and Voice Intelligence Researcher

License

MIT License
