Personal voice verification toolkit using ECAPA-TDNN embeddings

VocalID: A Lightweight Voice Authentication Toolkit

VocalID is a compact and practical voice authentication library that combines ECAPA-TDNN embeddings with a simple classifier to verify user identity from audio recordings. It supports file-based verification and real-time microphone input. The project is designed to be easy to train, deploy, and extend.


Features

  • ECAPA-TDNN embeddings using speechbrain/spkrec-ecapa-voxceleb
  • Training with positive (owner) and negative (impostor) audio samples
  • Evaluation with accuracy and classification metrics
  • Verification from audio files or live microphone input
  • CLI toolkit for training, evaluating, and verifying
  • Modular design with trainer, verifier, embeddings, config, and utilities
  • Simple model storage using pickle-based persistence
  • Full test suite included

How It Works

1. Audio Processing

Audio is loaded or recorded, resampled, and normalized.
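The resample-and-normalize step can be sketched in plain Python as below. This is an illustration only, not VocalID's own code: the function names, the linear-interpolation resampler, and peak normalization are all assumptions, and a real pipeline would normally use an audio library with a proper anti-aliasing filter. The 16 kHz target is chosen because the VoxCeleb ECAPA model was trained on 16 kHz audio.

```python
import math

TARGET_SR = 16_000  # assumed target rate (the VoxCeleb ECAPA model expects 16 kHz)


def resample_linear(samples, orig_sr, target_sr=TARGET_SR):
    """Resample a mono signal by linear interpolation (illustrative, not anti-aliased)."""
    if orig_sr == target_sr or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = min(i * step, len(samples) - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def peak_normalize(samples, peak=1.0):
    """Scale the signal so its largest absolute value equals `peak`."""
    max_abs = max((abs(s) for s in samples), default=0.0)
    if max_abs == 0.0:
        return list(samples)  # silence: nothing to scale
    return [s * peak / max_abs for s in samples]


# One second of a 440 Hz tone at 44.1 kHz, at half amplitude
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 44_100) for t in range(44_100)]
prepared = peak_normalize(resample_linear(tone, 44_100))
```

After this step every clip has the same sample rate and a comparable level, which keeps the embedding model's input consistent across microphones.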

2. Embedding Extraction

ECAPA-TDNN generates fixed-dimensional speaker embeddings. These embeddings represent unique speaker characteristics.
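To make "fixed-dimensional" concrete, the toy sketch below compares made-up vectors with cosine similarity. It does not run ECAPA-TDNN and it is not how VocalID decides a match (VocalID feeds the embeddings to a classifier); the vectors are random stand-ins, and the dimension of 192 is an assumption you should check against the embeddings your extractor actually returns.

```python
import math
import random

EMBED_DIM = 192  # assumed size; verify against your extractor's output


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)
speaker_a = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]
# A second recording of the same speaker: same direction plus small noise
speaker_a_again = [x + rng.gauss(0, 0.1) for x in speaker_a]
# An unrelated speaker: an independent random vector
speaker_b = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]

same = cosine_similarity(speaker_a, speaker_a_again)
different = cosine_similarity(speaker_a, speaker_b)
print(f"same speaker: {same:.2f}, different speaker: {different:.2f}")
```

Because every clip maps to a vector of the same length regardless of its duration, the vectors can be compared directly or passed to any standard classifier.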

3. Feature Preparation

Positive and negative embeddings are labeled and fed into the trainer.
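A minimal sketch of the labeling step, assuming owner clips get label 1 and impostor clips get label 0. The helper name `build_training_set` is hypothetical; the signature of VocalID's own prepare_features() is not documented here.

```python
def build_training_set(pos_embeddings, neg_embeddings):
    """Stack embeddings into X and build labels y (1 = owner, 0 = impostor)."""
    X = [list(e) for e in pos_embeddings] + [list(e) for e in neg_embeddings]
    y = [1] * len(pos_embeddings) + [0] * len(neg_embeddings)
    # Every embedding must have the same length for the classifier to accept it
    if len({len(row) for row in X}) > 1:
        raise ValueError("all embeddings must have the same dimension")
    return X, y


# Two owner embeddings and one impostor embedding (toy 2-D values)
X, y = build_training_set([[0.1, 0.2], [0.3, 0.4]], [[0.9, 0.8]])
```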

4. Classification Model

A simple Logistic Regression model is trained on the embeddings.
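For intuition, here is logistic regression written from scratch with batch gradient descent on toy 2-D points. This README does not say which implementation VocalID uses, so treat the function names, learning rate, and epoch count as assumptions; in practice a library implementation would be the usual choice.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    """Probability that x belongs to the positive (owner) class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by minimizing log loss with batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy "embeddings": owner points cluster near (1, 1), impostors near (-1, -1)
rng = random.Random(0)
owner = [[1 + rng.gauss(0, 0.2), 1 + rng.gauss(0, 0.2)] for _ in range(20)]
impostor = [[-1 + rng.gauss(0, 0.2), -1 + rng.gauss(0, 0.2)] for _ in range(20)]
w, b = train_logistic_regression(owner + impostor, [1] * 20 + [0] * 20)

p_owner = predict_proba(w, b, [1.0, 1.0])
p_impostor = predict_proba(w, b, [-1.0, -1.0])
```

The model outputs a probability rather than a hard label, which is what makes the threshold comparison in the verification step possible.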

5. Verification

During verification:

  1. Extract embeddings for the new audio.
  2. Predict with the trained classifier.
  3. Return a confidence score.
  4. Compare the score with the threshold from config.py.
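The decision in the last two steps can be sketched as follows. The threshold value of 0.5, the `>=` comparison, and the `verify` helper are assumptions for illustration; the actual threshold lives in config.py. The `(ok, score)` return shape mirrors what the usage script unpacks from verify_file().

```python
THRESHOLD = 0.5  # hypothetical value; the real one is set in config.py


def verify(embedding, predict_proba, threshold=THRESHOLD):
    """Score an embedding and accept it when the score reaches the threshold."""
    score = float(predict_proba(embedding))
    return score >= threshold, score


# Stand-in scoring functions in place of a trained classifier
ok, score = verify([0.1, 0.2], lambda e: 0.8)
rejected, low_score = verify([0.1, 0.2], lambda e: 0.3)
```

Raising the threshold makes false accepts less likely at the cost of rejecting the owner more often, so it is worth tuning on held-out recordings.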

Package Structure

VocalID/
├── voice_verifier/
│   ├── trainer.py         # Training logic, evaluation, model save/load
│   ├── verifier.py        # File and waveform verification
│   ├── embeddings.py      # ECAPA-TDNN embedding extraction
│   ├── audio_utils.py     # Audio loading and microphone recording
│   ├── config.py          # Threshold + ECAPA model configuration
│   ├── model_store.py     # Model checkpoint loader
│   └── cli.py             # Command-line interface
├── tests/                 # Full pytest suite
├── examples/
├── requirements.txt
├── api/
│   └── app.py
└── README.md

Components

VoiceTrainer

Functions:

  • train()
  • evaluate()
  • prepare_features()
  • save()
  • load()
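Since the Features list mentions pickle-based persistence, save() and load() presumably wrap something like the sketch below. The helper names and the dictionary standing in for a trained model are hypothetical, not VocalID's actual internals.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize a model object to disk with pickle."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load a pickled model. Only unpickle files you created or trust,
    because pickle can execute arbitrary code while loading."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# A plain dict stands in for a trained classifier
model = {"weights": [0.25, -1.5], "bias": 0.1, "threshold": 0.5}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_voice_model.pkl")
    save_model(model, path)
    restored = load_model(path)
```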

VoiceVerifier

Methods:

  • verify_file(path)
  • verify_array(audio_tensor)

EmbeddingExtractor

  • embed_file(path)
  • embed_waveform(waveform, sr)

Audio Utilities

  • load_audio(path)
  • record_audio(seconds)

Installation

    pip install vocalid

Example usage script (Python)

Directory Structure Example

Assume your dataset looks like this:

Voice tip: record each sample for 5-6 seconds, and vary the tone, background noise, accent, and microphone across samples.

dataset/
├── my_voice/               <-- positive class (your voice)
│   ├── sample1.wav
│   ├── sample2.wav
│   ├── sample3.wav
│   └── sample4.wav
└── other_voices/           <-- negative class (other people's voices)
    ├── voice1.wav
    ├── voice2.wav
    ├── voice3.wav
    └── voice4.wav
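Before training, it can help to confirm that both class folders actually contain recordings. The helper below is a hypothetical convenience, not part of VocalID; it uses only the standard library, and the demo builds a throwaway folder with empty placeholder files.

```python
import tempfile
from pathlib import Path


def collect_wavs(root, positive="my_voice", negative="other_voices"):
    """Return sorted .wav paths for the positive and negative class folders."""
    root = Path(root)
    pos = sorted(str(p) for p in (root / positive).glob("*.wav"))
    neg = sorted(str(p) for p in (root / negative).glob("*.wav"))
    if not pos or not neg:
        raise ValueError("each class folder needs at least one .wav file")
    return pos, neg


# Build a temporary dataset with empty placeholder files to exercise the helper
with tempfile.TemporaryDirectory() as tmp:
    layout = {"my_voice": ["sample1.wav", "sample2.wav"],
              "other_voices": ["voice1.wav", "voice2.wav"]}
    for folder, names in layout.items():
        (Path(tmp) / folder).mkdir()
        for name in names:
            (Path(tmp) / folder / name).touch()
    pos_files, neg_files = collect_wavs(tmp)
```

The two returned lists have the same shape as the `pos_files` and `neg_files` lists the training script builds with glob.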

Full Python script

from vocalid.trainer import VoiceTrainer
from vocalid.verifier import VoiceVerifier
from vocalid.audio_utils import load_audio
import glob

# 1. TRAINING THE MODEL

pos_files = glob.glob("dataset/my_voice/*.wav")
neg_files = glob.glob("dataset/other_voices/*.wav")

trainer = VoiceTrainer()
trainer.train(pos_files, neg_files, save_path="my_voice_model.pkl")

# (Optional) Check metrics printed by evaluate() in train()
print("Training complete. Model saved.")


# 2. EVALUATING THE TRAINED MODEL (Manually)

# Useful when you evaluate after loading a saved model,
# or when you want metrics on a different test set.

# Example test data (the same folders or separate ones)
test_pos = glob.glob("dataset/my_voice_test/*.wav")
test_neg = glob.glob("dataset/other_voices_test/*.wav")

metrics = trainer.evaluate(test_pos, test_neg)

print("Accuracy:", metrics["accuracy"])
print("Report:\n", metrics["report"])

# Example output:
# Accuracy: 0.91
# Report:
#  <classification report text>


# 3. VERIFY A FILE

verifier = VoiceVerifier("my_voice_model.pkl")

to_verify = "verify_samples/unknown_voice.wav"
ok, score = verifier.verify_file(to_verify)

print(f"\nVerification result: {ok}, Score: {score:.3f}")
# ok = True means the audio matches your voice
# score is the probability returned by the classifier


# 4. VERIFY LIVE MICROPHONE AUDIO (Windows supported)

# Record a short clip with the audio utility, then verify it
from vocalid.audio_utils import record_audio

audio_tensor = record_audio(seconds=4)
ok, score = verifier.verify_array(audio_tensor)

print(f"Live verification: {ok}, Score: {score:.3f}")

Example Evaluate-Only Script

To evaluate a saved model later without retraining:

from vocalid.trainer import VoiceTrainer
import glob

trainer = VoiceTrainer()
trainer.load("my_voice_model.pkl")

test_pos = glob.glob("dataset/my_voice_test/*.wav")
test_neg = glob.glob("dataset/other_voices_test/*.wav")

metrics = trainer.evaluate(test_pos, test_neg)

print("Accuracy:", metrics["accuracy"])
print("Report:\n", metrics["report"])
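For intuition about the numbers an evaluation reports, the sketch below computes accuracy, precision, and recall from true and predicted labels. The function name and the returned keys are hypothetical and are not the structure evaluate() returns; labels follow the assumed convention 1 = owner, 0 = impostor.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary labels (1 = owner)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # owner accepted
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # impostor rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # impostor accepted
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # owner rejected
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Three owner clips and five impostor clips
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
toy_metrics = classification_metrics(y_true, y_pred)
```

For an authentication system, precision matters most when accepting an impostor is costly, while recall reflects how often the owner is let in.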

CLI Commands

    vocalid train --positive my_voice --negative others --output model.pkl
    vocalid evaluate --model model.pkl
    vocalid verify audio.wav --model model.pkl
    vocalid live --model model.pkl --seconds 4
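The command layout above maps naturally onto argparse subcommands. The sketch below is one way such a parser could be structured, not the contents of VocalID's cli.py; which flags are required and the default values are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the four documented subcommands."""
    parser = argparse.ArgumentParser(prog="vocalid")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="train a model from two folders")
    train.add_argument("--positive", required=True)
    train.add_argument("--negative", required=True)
    train.add_argument("--output", default="model.pkl")

    evaluate = sub.add_parser("evaluate", help="evaluate a saved model")
    evaluate.add_argument("--model", required=True)

    verify = sub.add_parser("verify", help="verify one audio file")
    verify.add_argument("audio")
    verify.add_argument("--model", required=True)

    live = sub.add_parser("live", help="record from the microphone and verify")
    live.add_argument("--model", required=True)
    live.add_argument("--seconds", type=float, default=4.0)
    return parser


# Parse the same arguments as the documented `vocalid verify` example
args = build_parser().parse_args(["verify", "audio.wav", "--model", "model.pkl"])
```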

Use Cases

  • Personal voice unlock systems
  • Lightweight identity verification
  • Speaker recognition prototypes
  • Research experiments in speaker embeddings
  • Security analyses for spoof detection

Why It Matters

This toolkit allows developers and researchers to:

  • Build practical speaker authentication systems quickly
  • Learn how ECAPA embeddings work
  • Train custom voiceprints without heavy dependencies
  • Extend or plug into larger voice systems

Contributing

Pull requests are welcome.
Tests can be run with:

    pytest -v

Author

Muhammad Khubaib Ahmad

AI/ML Engineer, Data Scientist and Voice Intelligence Researcher

Portfolio and Links


License

MIT License

Download files

Source Distribution

vocalid-0.1.5.tar.gz (11.4 kB)

Built Distribution

vocalid-0.1.5-py3-none-any.whl (8.7 kB)

File details

Details for the file vocalid-0.1.5.tar.gz.

File metadata

  • Download URL: vocalid-0.1.5.tar.gz
  • Size: 11.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for vocalid-0.1.5.tar.gz

  • SHA256: 7185d79286ea8b3944de548b4aab86f41e7fbb82c7dbf44364552fa9fb8acf38
  • MD5: cfadc34b3f47f50adb929723b6ce3564
  • BLAKE2b-256: 3db43ad7396903a168f8a1c7f88a598b10f8e4c79be4a7b28b90774681ca71ca

File details

Details for the file vocalid-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: vocalid-0.1.5-py3-none-any.whl
  • Size: 8.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for vocalid-0.1.5-py3-none-any.whl

  • SHA256: 2488af87ab676a5513990b18b591260c5d7afb65d36a3d8ce4d4f246fe8a683f
  • MD5: 1a4c1d61db2ebda54a0cb22834133111
  • BLAKE2b-256: ec22af7c06ab6144e236da0e074ef00ef6503aac5c6f129542ff7689b4dba5d8
