Personal voice verification toolkit using ECAPA-TDNN embeddings

VocalID: A Lightweight Voice Authentication Toolkit

VocalID is a compact and practical voice authentication library that combines ECAPA-TDNN embeddings with a simple classifier to verify user identity from audio recordings. It supports file-based verification and real-time microphone input. The project is designed to be easy to train, deploy, and extend.


Features

  • ECAPA-TDNN embeddings using speechbrain/spkrec-ecapa-voxceleb
  • Training with positive (owner) and negative (impostor) audio samples
  • Evaluation with accuracy and classification metrics
  • Verification from audio files or live microphone input
  • CLI toolkit for training, evaluating, and verifying
  • Modular design with trainer, verifier, embeddings, config, and utilities
  • Simple model storage using pickle-based persistence
  • Full test suite included
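
As a rough illustration of the pickle-based persistence mentioned above, the sketch below saves and reloads a model object with the standard library. The helper names `save_model` and `load_model` and the `toy_model` dictionary are hypothetical and are not part of VocalID's API; the library's own `save()` and `load()` may work differently.

```python
import pickle


def save_model(model, path):
    """Serialize any picklable model object to disk."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load a model back from disk.

    Unpickling can execute arbitrary code, so only load files
    that you created yourself or that come from a trusted source.
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical stand-in for a trained classifier.
toy_model = {"weights": [0.4, -1.2, 0.7], "bias": 0.1}
```

Because pickle files can run code when loaded, a model file from an unknown source should be treated like an untrusted executable.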

How It Works

1. Audio Processing

Audio is loaded or recorded, resampled, and normalized.
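
The preprocessing step can be pictured with the dependency-free sketch below: downmix to mono, resample, and peak-normalize. This is an assumption about what "resampled and normalized" means here, not VocalID's actual code, and the function names are hypothetical. A real pipeline would normally use a band-limited resampler from an audio library and target 16 kHz, the rate the pretrained speechbrain ECAPA model was trained on.

```python
def to_mono(frames):
    """Average channels; `frames` is a list of per-sample channel tuples."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (simple, not band-limited)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = min(i * step, len(samples) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def peak_normalize(samples, peak=1.0):
    """Scale so the largest absolute sample equals `peak`."""
    largest = max((abs(s) for s in samples), default=0.0)
    if largest == 0.0:
        return list(samples)  # silence: nothing to scale
    return [s * peak / largest for s in samples]
```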

2. Embedding Extraction

ECAPA-TDNN generates fixed-dimensional speaker embeddings. These embeddings represent unique speaker characteristics.
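
To make "fixed-dimensional" concrete: every clip, whatever its length, maps to a vector of the same size (192 values for the pretrained speechbrain/spkrec-ecapa-voxceleb model), so two clips can be compared directly. The sketch below uses cosine similarity, a common way to compare speaker embeddings. It is for illustration only; VocalID feeds the embeddings to a classifier instead, and the function names here are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine_similarity(a, b):
    """Cosine of the angle between two embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))
```

Embeddings from the same speaker tend to point in similar directions, which gives a similarity close to 1; unrelated speakers usually score much lower.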

3. Feature Preparation

Positive and negative embeddings are labeled and fed into the trainer.
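
A minimal sketch of the labeling step, assuming the owner is encoded as class 1 and impostors as class 0 (the source does not state the convention). `build_dataset` is a hypothetical helper, not the library's `prepare_features()`.

```python
def build_dataset(pos_embeddings, neg_embeddings):
    """Return (X, y) with label 1 for the owner and 0 for impostors."""
    X = [list(e) for e in pos_embeddings] + [list(e) for e in neg_embeddings]
    y = [1] * len(pos_embeddings) + [0] * len(neg_embeddings)
    # All embeddings must share one dimension for the classifier to train.
    dims = {len(row) for row in X}
    if len(dims) > 1:
        raise ValueError(f"inconsistent embedding sizes: {sorted(dims)}")
    return X, y
```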

4. Classification Model

A simple Logistic Regression model is trained on the embeddings.
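
The source does not say which Logistic Regression implementation is used, so the sketch below is only a dependency-free illustration of the technique: batch gradient descent on the log loss. The function names, learning rate, and epoch count are assumptions chosen for the toy example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that embedding `x` belongs to class 1 (the owner)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

With real 192-dimensional embeddings and only a handful of recordings, some form of regularization is usually worth adding; this sketch omits it for brevity.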

5. Verification

During verification:

  1. Extract embeddings for the new audio.
  2. Run the trained classifier on the embeddings.
  3. Return the classifier's confidence score.
  4. Compare the score with the threshold from config.py to accept or reject the speaker.
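
The final comparison in the steps above can be sketched as follows. The threshold value of 0.5 and the choice of `>=` over `>` are assumptions for illustration; the real value is whatever config.py defines, and `decide` is a hypothetical helper.

```python
THRESHOLD = 0.5  # assumed placeholder; the real value lives in config.py


def decide(score, threshold=THRESHOLD):
    """Return (accepted, score), accepting when score >= threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    return score >= threshold, score
```

Raising the threshold makes false accepts less likely at the cost of rejecting the genuine speaker more often, so it is worth tuning on held-out recordings.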

Package Structure

VocalID/
├── voice_verifier/
│   ├── trainer.py         # Training logic, evaluation, model save/load
│   ├── verifier.py        # File and waveform verification
│   ├── embeddings.py      # ECAPA-TDNN embedding extraction
│   ├── audio_utils.py     # Audio loading and microphone recording
│   ├── config.py          # Threshold + ECAPA model configuration
│   ├── model_store.py     # Model checkpoint loader
│   └── cli.py             # Command-line interface
├── tests/                 # Full pytest suite
├── examples/
├── requirements.txt
├── api/
│   └── app.py
└── README.md

Components

VoiceTrainer

Methods:

  • train()
  • evaluate()
  • prepare_features()
  • save()
  • load()

VoiceVerifier

Methods:

  • verify_file(path)
  • verify_array(audio_tensor)

EmbeddingExtractor

  • embed_file(path)
  • embed_waveform(waveform, sr)

Audio Utilities

  • load_audio(path)
  • record_audio(seconds)

Installation

    pip install vocalid

Example usage script (Python)

Directory Structure Example

Voice tip: Use samples of 5-6 seconds each, and vary the tone, background noise, accent, and microphone across recordings.

Assume your dataset looks like this:

dataset/
├── my_voice/               <-- positive class (your voice)
│   ├── sample1.wav
│   ├── sample2.wav
│   ├── sample3.wav
│   └── sample4.wav
└── other_voices/           <-- negative class (other people's voices)
    ├── voice1.wav
    ├── voice2.wav
    ├── voice3.wav
    └── voice4.wav

Full Python script

from vocalid.trainer import VoiceTrainer
from vocalid.verifier import VoiceVerifier
from vocalid.audio_utils import load_audio
import glob

# 1. TRAINING THE MODEL

pos_files = glob.glob("dataset/my_voice/*.wav")
neg_files = glob.glob("dataset/other_voices/*.wav")

trainer = VoiceTrainer()
trainer.train(pos_files, neg_files, save_path="my_voice_model.pkl")

# (Optional) Check metrics printed by evaluate() in train()
print("Training complete. Model saved.")


# 2. EVALUATING THE TRAINED MODEL (Manually)

# This is useful if you want to evaluate after loading the model.
# Or you want to compute new metrics on a different test set.

# Example test data (can be same folders or separate ones)
test_pos = glob.glob("dataset/my_voice_test/*.wav")
test_neg = glob.glob("dataset/other_voices_test/*.wav")

metrics = trainer.evaluate(test_pos, test_neg)

print("Accuracy:", metrics["accuracy"])
print("Report:\n", metrics["report"])

# Example output (in the order printed above):
# Accuracy: 0.91
# Report:
#  <classification report text>


# 3. VERIFY A FILE

verifier = VoiceVerifier("my_voice_model.pkl")

to_verify = "verify_samples/unknown_voice.wav"
ok, score = verifier.verify_file(to_verify)

print(f"\nVerification result: {ok}, Score: {score:.3f}")
# ok = True means it matches your voice
# score is probability from the classifier


# 4. VERIFY LIVE MICROPHONE AUDIO (Windows supported)

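# Record a short clip and verify.
# record_audio is listed under Audio Utilities, not on VoiceTrainer.
from vocalid.audio_utils import record_audio

audio_tensor = record_audio(seconds=4)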
ok, score = verifier.verify_array(audio_tensor)

print(f"Live verification: {ok}, Score: {score:.3f}")

Example Evaluate-Only Script

To evaluate a saved model later without retraining:

from vocalid.trainer import VoiceTrainer
import glob

trainer = VoiceTrainer()
trainer.load("my_voice_model.pkl")

test_pos = glob.glob("dataset/my_voice_test/*.wav")
test_neg = glob.glob("dataset/other_voices_test/*.wav")

metrics = trainer.evaluate(test_pos, test_neg)

print("Accuracy:", metrics["accuracy"])
print("Report:\n", metrics["report"])
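
For readers curious what an accuracy figure and classification report summarize, the sketch below computes the usual binary metrics from true and predicted labels, plus the false accept and false reject rates that matter for authentication. It assumes label 1 means the owner and is not VocalID's `evaluate()` implementation; `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, FAR, and FRR for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # False accept rate: impostors wrongly accepted.
        "far": fp / (fp + tn) if fp + tn else 0.0,
        # False reject rate: the owner wrongly rejected.
        "frr": fn / (fn + tp) if fn + tp else 0.0,
    }
```

Accuracy alone can mislead when the two classes are unbalanced, which is why the per-class figures are worth checking as well.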

CLI Commands

    vocalid train --positive my_voice --negative others --output model.pkl
    vocalid evaluate --model model.pkl
    vocalid verify audio.wav --model model.pkl
    vocalid live --model model.pkl --seconds 4
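
The subcommand layout above could be wired up with argparse roughly as follows. This is a sketch of the command shapes shown, not the contents of cli.py; which flags are required and the default values are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the train/evaluate/verify/live commands."""
    parser = argparse.ArgumentParser(prog="vocalid")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="train a model from two folders")
    train.add_argument("--positive", required=True)
    train.add_argument("--negative", required=True)
    train.add_argument("--output", default="model.pkl")  # assumed default

    evaluate = sub.add_parser("evaluate", help="evaluate a saved model")
    evaluate.add_argument("--model", required=True)

    verify = sub.add_parser("verify", help="verify one audio file")
    verify.add_argument("audio")
    verify.add_argument("--model", required=True)

    live = sub.add_parser("live", help="verify from the microphone")
    live.add_argument("--model", required=True)
    live.add_argument("--seconds", type=float, default=4.0)  # assumed default
    return parser
```

Calling `build_parser().parse_args(["verify", "audio.wav", "--model", "model.pkl"])` yields a namespace whose `command` attribute selects the handler to run.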

Use Cases

  • Personal voice unlock systems
  • Lightweight identity verification
  • Speaker recognition prototypes
  • Research experiments in speaker embeddings
  • Security analyses for spoof detection

Why It Matters

This toolkit allows developers and researchers to:

  • Build practical speaker authentication systems quickly
  • Learn how ECAPA embeddings work
  • Train custom voiceprints without heavy dependencies
  • Extend or plug into larger voice systems

Contributing

Pull requests are welcome.
Tests can be run with:

    pytest -v

Author

Muhammad Khubaib Ahmad

AI/ML Engineer, Data Scientist and Voice Intelligence Researcher

License

MIT License
