Analyze and compare voices with deep learning

Project description

Resemblyzer lets you derive a high-level representation of a voice through a deep learning model called the voice encoder. Given an audio file of speech, it creates a vector of 256 values (an embedding, often shortened to "embed" in this repo) that summarizes the characteristics of the speaker's voice. Resemblyzer has many uses:

  • Voice similarity metric: compare different voices and get a value on how similar they sound. This leads to other applications:
    • Speaker verification: create a voice profile for a person from a few seconds of speech (5 to 30 seconds) and compare it to the profile of new audio. Reject similarity scores below a threshold.
    • Speaker diarization: figure out who is talking, and when, by comparing voice profiles with the continuous embedding of a speech segment.
    • Fake speech detection: check whether some speech is legitimate or fake by measuring the similarity of the possibly fake speech to real speech.
  • High-level feature extraction: you can use the embeddings generated as feature vectors for your machine learning models! This also leads to other applications:
    • Voice cloning: see the Real-Time Voice Cloning project.
    • Component analysis: figure out accents, tones, prosody, gender, ... through component analysis of the embeddings.
    • Virtual voices: create entirely new voice embeddings by sampling from a prior distribution.
  • Loss function: you can backpropagate through the voice encoder model and use it as a perceptual loss for your deep learning model! The voice encoder is written in PyTorch.
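
The speaker verification use case above can be sketched without the model itself. The snippet below is a minimal illustration, not Resemblyzer's own code: it assumes embeddings are compared with cosine similarity, uses plain Python lists as stand-ins for the arrays the encoder returns, and generates synthetic vectors in place of real speech. The helper names (`build_profile`, `is_same_speaker`, `fake_utterance`) and the 0.75 threshold are hypothetical; a real threshold has to be tuned on your own data.

```python
import math
import random

EMBED_DIM = 256  # size of the embedding described above


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def build_profile(embeds):
    """Average several utterance embeddings into one unit-length profile."""
    mean = [sum(col) / len(embeds) for col in zip(*embeds)]
    return l2_normalize(mean)


def is_same_speaker(profile, embed, threshold=0.75):
    """Accept when the similarity reaches the threshold.

    The default of 0.75 is an arbitrary placeholder, not a recommended value.
    """
    return cosine_similarity(profile, embed) >= threshold


# Synthetic stand-ins for encoder output: each "speaker" is a random
# direction, and each utterance adds a little noise around that direction.
rng = random.Random(0)


def fake_utterance(center, noise=0.02):
    return l2_normalize([c + rng.gauss(0.0, noise) for c in center])


speaker_a = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)])
speaker_b = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)])

# Enroll speaker A from five utterances, then test two new utterances.
profile_a = build_profile([fake_utterance(speaker_a) for _ in range(5)])
same = fake_utterance(speaker_a)
other = fake_utterance(speaker_b)

print("same speaker accepted: ", is_same_speaker(profile_a, same))
print("other speaker accepted:", is_same_speaker(profile_a, other))
```

With real audio, the synthetic vectors would be replaced by the output of `encoder.embed_utterance(wav)` for each recording. The same comparison, applied to embeddings of short consecutive windows against several profiles, is the basic idea behind the diarization use case.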

Resemblyzer is fast to execute (around 1000x real-time on a GTX 1080, with a minimum of 10ms for I/O operations) and can run on either CPU or GPU. It is robust to noise. It currently works best on English, but should still perform reasonably well on other languages.

Examples

This is a short example showing how to use Resemblyzer:

from pathlib import Path

import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

# Load the audio file and preprocess it for the encoder
fpath = Path("path_to_an_audio_file")
wav = preprocess_wav(fpath)

# Compute the 256-value embedding of the utterance
encoder = VoiceEncoder()
embed = encoder.embed_utterance(wav)
np.set_printoptions(precision=3, suppress=True)
print(embed)

More thorough examples demonstrating the use cases of Resemblyzer can be found in examples.py.

Additional info

Resemblyzer emerged as a side project of the Real-Time Voice Cloning repository. The pretrained model that comes with Resemblyzer is interchangeable with models trained in that repository, so feel free to finetune a model on new data and possibly new languages! The voice encoder implements the paper Generalized End-To-End Loss for Speaker Verification (in which it is called the speaker encoder).

Download files

Source Distribution

Resemblyzer-0.1.1.dev0.tar.gz (15.7 MB)

Uploaded Source

Built Distribution

Resemblyzer-0.1.1.dev0-py3-none-any.whl (15.7 MB)

Uploaded Python 3

File details

Details for the file Resemblyzer-0.1.1.dev0.tar.gz.

File metadata

  • Download URL: Resemblyzer-0.1.1.dev0.tar.gz
  • Size: 15.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.34.0 CPython/3.7.2

File hashes

Hashes for Resemblyzer-0.1.1.dev0.tar.gz
Algorithm Hash digest
SHA256 68214e001aae34d45d5056105a65d519b6aefba9015d5b915dbcc2e8f0a34087
MD5 a56abaead5995915debac75e0c14c389
BLAKE2b-256 27dac7a28b3620505b5bb05ded792dfc1f56a41eb0639daa7aa659da2e57b502
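
A downloaded archive can be checked against the SHA256 digest listed above using only the Python standard library. This is a minimal sketch: the helper names `sha256_of_file` and `matches_expected` are made up for illustration, and the archive path assumes the file was saved to the current directory under its published name.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with the published one (case-insensitive)."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


# Digest copied from the table above for the source distribution.
EXPECTED = "68214e001aae34d45d5056105a65d519b6aefba9015d5b915dbcc2e8f0a34087"

# Assumed download location; adjust to wherever the file was saved.
archive = Path("Resemblyzer-0.1.1.dev0.tar.gz")
if archive.exists():
    print("hash OK" if matches_expected(archive, EXPECTED) else "hash mismatch")
```

Reading in chunks keeps memory use flat regardless of the size of the archive.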

File details

Details for the file Resemblyzer-0.1.1.dev0-py3-none-any.whl.

File metadata

  • Download URL: Resemblyzer-0.1.1.dev0-py3-none-any.whl
  • Size: 15.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.34.0 CPython/3.7.2

File hashes

Hashes for Resemblyzer-0.1.1.dev0-py3-none-any.whl
Algorithm Hash digest
SHA256 d6590346269508bbecd21ace071b413ea2ce586e5da6d0e88d13a16732837e94
MD5 9c9e8c38452ae0370bc8a012afc94e0b
BLAKE2b-256 e021f0a22ee4afd9e5d9790b04329accdb71d2cf89ffaf5bb0611fb37cd91782
