A package designed to compose speaker verification systems

speaker-verification-toolkit

This module provides a few tools for building a simple speaker verification system.

You can install it from PyPI:

$ pip install speaker-verification-toolkit

To import and use in your own projects:

import speaker_verification_toolkit.tools as svt

#   svt.some_function(...)

Usage


find_nearest_voice_data(voice_data_list, voice_sample)

Find the nearest voice data based on this voice sample. Could be used to make the naive Accept/Reject decision.

voice_data_list: a list containing all voices data from the dataset.

voice_sample: the voice sample reference.

returns: the index of the element from voice_data_list that represents the nearest voice data.
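The nearest-voice lookup and the naive Accept/Reject decision can be illustrated with a minimal sketch. This is not the package's implementation: `nearest_index`, `naive_accept` and `toy_distance` are hypothetical names, and the toy distance stands in for whatever distance the package uses internally.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def nearest_index(candidates: Sequence[T], sample: T,
                  distance: Callable[[T, T], float]) -> int:
    """Return the index of the candidate with the smallest distance to sample."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    distances = [distance(candidate, sample) for candidate in candidates]
    return min(range(len(distances)), key=distances.__getitem__)


def naive_accept(candidates: Sequence[T], sample: T, claimed_index: int,
                 distance: Callable[[T, T], float]) -> bool:
    """Accept the identity claim only if the nearest enrolled voice is the claimed one."""
    return nearest_index(candidates, sample, distance) == claimed_index


def toy_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Hypothetical stand-in: sum of absolute differences of equal-length vectors.
    return sum(abs(x - y) for x, y in zip(a, b))


# Three enrolled "voices" (toy feature vectors) and one probe sample.
enrolled = [[0.0, 0.1], [1.0, 1.1], [5.0, 5.2]]
probe = [0.9, 1.0]

best = nearest_index(enrolled, probe, toy_distance)
accepted = naive_accept(enrolled, probe, claimed_index=1, distance=toy_distance)
```

With the real package, the same decision would presumably compare the index returned by find_nearest_voice_data against the index of the claimed speaker.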


compute_distance(sample1, sample2)

Compute the distance between sample1 and sample2 using an O(n) DTW algorithm.

sample1: the mfcc data extracted from the audio signal 1.

sample2: the mfcc data extracted from the audio signal 2.

returns: Float number representing the minimum distance between sample1 and sample2.
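To make the idea of a DTW distance concrete, here is a small sketch of the classic dynamic-programming DTW. Note that this is the textbook quadratic-time version, not the O(n) variant mentioned above, and it is not taken from the package; `frame_distance` and `dtw_distance` are hypothetical names, and the Euclidean frame cost is an assumption.

```python
import math
from typing import Sequence

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq1: Sequence[Frame], seq2: Sequence[Frame]) -> float:
    """Classic DTW: minimum accumulated frame cost over all monotonic alignments."""
    n, m = len(seq1), len(seq2)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    # prev[j] holds the best cost of aligning the first i-1 frames of seq1
    # with the first j frames of seq2; only two rows are kept in memory.
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = frame_distance(seq1[i - 1], seq2[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


# A sequence and a time-stretched copy of it align perfectly.
a = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [1.0], [1.0], [2.0]]
shifted = [[1.0], [2.0], [3.0]]

same = dtw_distance(a, stretched)
different = dtw_distance(a, shifted)
```

Because DTW allows frames to be repeated, the stretched copy has distance zero, which is why it suits utterances spoken at different speeds.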


extract_mfcc(signal_data, samplerate=16000, winlen=0.025, winstep=0.01)

Compute MFCC features from an audio signal

signal_data: the audio signal from which to compute features. Should be an N*1 array.

samplerate: the sample rate of the signal we are working with, in Hz.

winlen: the length of the analysis window in seconds. Default is 0.025s (25 milliseconds).

winstep: the step between successive windows in seconds. Default is 0.01s (10 milliseconds).

returns: A numpy array of size (NUMFRAMES by numcep) containing features. Each row holds 1 feature vector.
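The relation between winlen, winstep and NUMFRAMES can be worked out with a little arithmetic. The sketch below assumes that the signal is zero-padded so that the last partial window is kept; the package's exact padding behaviour is not stated here, and `frame_layout` is a hypothetical helper, not part of the package.

```python
import math
from typing import Tuple


def frame_layout(num_samples: int, samplerate: int = 16000,
                 winlen: float = 0.025, winstep: float = 0.01) -> Tuple[int, int, int]:
    """Return (samples per window, samples per step, number of frames)."""
    frame_len = int(round(winlen * samplerate))
    frame_step = int(round(winstep * samplerate))
    if num_samples <= frame_len:
        # A signal shorter than one window still yields a single (padded) frame.
        num_frames = 1
    else:
        # Assumption: the trailing partial window is padded and counted.
        num_frames = 1 + math.ceil((num_samples - frame_len) / frame_step)
    return frame_len, frame_step, num_frames


# One second of audio at 16 kHz with the default window settings.
one_second = frame_layout(16000)
```

With the defaults, each window covers 400 samples and successive windows start 160 samples apart, so consecutive windows overlap.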


extract_mfcc_from_wav_file(path, samplerate=16000, winlen=0.025, winstep=0.01)

Compute MFCC features from a wav file

path: the path of the wav file to open.

samplerate: the wanted sample rate, in Hz. Default is 16000. If you do not want any resampling, set this argument to None.

winlen: the length of the analysis window in seconds. Default is 0.025s (25 milliseconds).

winstep: the step between successive windows in seconds. Default is 0.01s (10 milliseconds).

returns: A numpy array of size (NUMFRAMES by numcep) containing features. Each row holds 1 feature vector.
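The role of the samplerate argument can be illustrated with a small standard-library sketch that reads a wav file, reports its native sample rate and decides whether resampling would be needed. This is not how the package reads files; `read_wav_mono16` and `needs_resampling` are hypothetical helpers, and the sketch only handles 16-bit mono PCM.

```python
import os
import struct
import tempfile
import wave
from typing import List, Optional, Tuple


def read_wav_mono16(path: str) -> Tuple[List[float], int]:
    """Read a 16-bit mono PCM wav file; return (samples in [-1, 1), native rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        native_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    count = len(raw) // 2
    ints = struct.unpack("<%dh" % count, raw)
    return [value / 32768.0 for value in ints], native_rate


def needs_resampling(native_rate: int, wanted_rate: Optional[int]) -> bool:
    """None means 'keep the native rate', mirroring the samplerate argument."""
    return wanted_rate is not None and wanted_rate != native_rate


# Write a tiny 8 kHz demo file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    with wave.open(demo_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(8000)
        out.writeframes(struct.pack("<4h", 0, 16384, -16384, 0))
    samples, native_rate = read_wav_mono16(demo_path)

resample_to_default = needs_resampling(native_rate, 16000)
keep_native = needs_resampling(native_rate, None)
```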


rms_silence_filter(data, samplerate=16000, segment_length=None, threshold=0.001135)

Cut the silent parts out of the audio signal data. This does not work with signals affected by environment noise. Consider applying a noise filter before using this silence filter, or make sure that the environment noise is low enough to be treated as silence.

data: the audio signal data

samplerate: the sample rate of the signal, in Hz. If no segment_length is given, segment_length defaults to samplerate/100 (around 0.01 seconds per segment).

segment_length: the number of frames per segment. For example, at a sample rate SR, a segment length of SR/100 represents a chunk containing 0.01 seconds of audio.

threshold: the threshold value. Values less than or equal to the threshold will be cut off. The default value was defined in [1] (see the references).

returns: the param "data" without silence parts.
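The behaviour described above can be sketched as follows. This is an illustration based only on the description, not the package's code: `rms` and `drop_silent_segments` are hypothetical names, the signal is assumed to hold amplitudes normalized to [-1, 1], and comparing the RMS of each segment against the threshold is an assumption suggested by the function name.

```python
import math
from typing import List, Optional, Sequence


def rms(segment: Sequence[float]) -> float:
    """Root mean square of a non-empty segment."""
    return math.sqrt(sum(x * x for x in segment) / len(segment))


def drop_silent_segments(data: Sequence[float], samplerate: int = 16000,
                         segment_length: Optional[int] = None,
                         threshold: float = 0.001135) -> List[float]:
    """Keep only the segments whose RMS is above the threshold."""
    if segment_length is None:
        # Roughly 0.01 seconds of audio per segment.
        segment_length = samplerate // 100
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    kept: List[float] = []
    for start in range(0, len(data), segment_length):
        segment = data[start:start + segment_length]
        # Assumption: a shorter trailing segment is judged like any other.
        if rms(segment) > threshold:
            kept.extend(segment)
    return kept


# Silence, then a loud segment, then a very quiet segment (160 samples each).
signal = [0.0] * 160 + [0.5] * 160 + [0.0005] * 160
voiced = drop_silent_segments(signal)
```

Only the loud middle segment survives here, which also shows why background noise above the threshold would defeat the filter.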

References

[1] - Muhammad Asadullah & Shibli Nisar, "A SILENCE REMOVAL AND ENDPOINT DETECTION APPROACH FOR SPEECH PROCESSING", National University of Computer and Emerging Sciences, Peshawar

Download files

Source Distribution

speaker_verification_toolkit-0.0.2.tar.gz (3.4 kB)
