
Python code for analyzing and synthesizing articulatory features of speech


Speech Articulatory Coding (SPARC)

Paper | Audio Samples | Colab Demo


This is the official code base for Coding Speech through Vocal Tract Kinematics.

Installation

git clone https://github.com/cheoljun95/Speech-Articulatory-Coding.git
cd Speech-Articulatory-Coding
pip install -e .

Usage

Load Model

from sparc import load_model
coder = load_model("en", device="cpu")     # Run on CPU
coder = load_model("en", device="cuda:0")  # Run on GPU
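If the same script has to run on machines with and without a GPU, the device string can be chosen at run time. The sketch below is only an illustration and is not part of SPARC: `pick_device` is a hypothetical helper, and the import guard exists only so the snippet runs on its own.

```python
def pick_device(preferred="cuda:0"):
    """Return `preferred` if CUDA is usable, otherwise "cpu" (hypothetical helper)."""
    try:
        import torch  # guard only keeps this snippet standalone
    except ImportError:
        return "cpu"
    return preferred if torch.cuda.is_available() else "cpu"


device = pick_device()
# coder = load_model("en", device=device)  # load_model as imported above
print(device)
```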

For the pitch tracker, we found that PENN is fast at inference. You can enable it with use_penn=True. The default pitch tracker is torchcrepe.

coder = load_model("en", device="cpu", use_penn=True)    # Use PENN as the pitch tracker

For inversion only (feature extraction), you can use either of the following:

coder = load_model("feature_extraction")
coder_from_config = load_model(config="configs/feature_extraction.yaml")

The following model checkpoints are available. To use a different one, replace en in load_model with another model name (multi or en+).

Model | Language | Training Dataset
en | English | LibriTTS-R
multi | Multi | LibriTTS-R, Multilingual LibriSpeech, AISHELL, JVS, KSS
en+ | English | LibriTTS-R, LibriTTS, EXPRESSO

Articulatory Analysis

code = coder.encode(WAV_FILE)          # Single inference
codes = coder.encode([WAV_FILE1, WAV_FILE2, ...]) # Batched processing
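For batched processing over a folder, the list of paths can be built with the standard library and fed to `encode` in chunks. This is a sketch under assumptions: `list_wavs` and `batched` are hypothetical helpers (not part of SPARC), and a suitable batch size depends on your memory budget.

```python
from pathlib import Path


def list_wavs(root, pattern="*.wav"):
    """Recursively collect audio file paths under `root`, sorted for reproducibility."""
    return sorted(str(p) for p in Path(root).rglob(pattern))


def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Usage with a loaded coder (directory name is a placeholder):
# for batch in batched(list_wavs("my_audio_dir"), 8):
#     codes = coder.encode(batch)
```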

The articulatory code outputs have the following format.

# All features are at 50 Hz except the speaker encoding
{"ema": (L, 12) array, #'TDX','TDY','TBX','TBY','TTX','TTY','LIX','LIY','ULX','ULY','LLX','LLY'
 "loudness": (L, 1) array,
 "pitch": (L, 1) array,
 "periodicity": (L, 1) array, # auxiliary output of pitch tracker
 "pitch_stats": (pitch mean, pitch std),
 "spk_emb": (spk_emb_dim,) array, # all shared models use spk_emb_dim=64
 "ft_len": Length of features, # useful for batched processing with padding
}
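The format above can be unpacked with a few lines of NumPy. The sketch below is illustrative only: the helper names are hypothetical, and it assumes each code is a dict whose `ft_len` is a scalar frame count and whose frame-level arrays have time on the first axis.

```python
import numpy as np

FRAME_RATE_HZ = 50  # frame-level features are at 50 Hz, as stated above

# Channel order copied from the format comment above.
EMA_CHANNELS = ["TDX", "TDY", "TBX", "TBY", "TTX", "TTY",
                "LIX", "LIY", "ULX", "ULY", "LLX", "LLY"]

# Keys assumed to hold one row per frame.
FRAME_KEYS = ("ema", "loudness", "pitch", "periodicity")


def trim_padding(code):
    """Return a copy of `code` with frame-level arrays cut to `ft_len` frames."""
    n = int(code["ft_len"])  # assumption: scalar frame count
    trimmed = dict(code)
    for key in FRAME_KEYS:
        trimmed[key] = np.asarray(code[key])[:n]
    return trimmed


def ema_by_channel(code):
    """Map each EMA channel name to its (L,) trajectory."""
    ema = np.asarray(code["ema"])
    if ema.ndim != 2 or ema.shape[1] != len(EMA_CHANNELS):
        raise ValueError(f"expected an (L, 12) EMA array, got {ema.shape}")
    return {name: ema[:, i] for i, name in enumerate(EMA_CHANNELS)}


def duration_seconds(code):
    """Duration implied by `ft_len` frames at 50 Hz."""
    return int(code["ft_len"]) / FRAME_RATE_HZ
```

For example, `ema_by_channel(trim_padding(code))["TTY"]` would give the tongue-tip vertical trajectory without padded frames, under the assumptions stated above.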

Articulatory Synthesis

wav = coder.decode(**code)
sr = coder.sr
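To listen to the result, the waveform can be written to disk. The sketch below uses only NumPy and the standard-library `wave` module. `save_wav` is a hypothetical helper, and it assumes `wav` is a mono float waveform in [-1, 1] that NumPy can convert (a tensor on a GPU would first need to be moved to the CPU).

```python
import wave

import numpy as np


def save_wav(path, wav, sr):
    """Write a mono float waveform as a 16-bit PCM WAV file (hypothetical helper)."""
    samples = np.asarray(wav, dtype=np.float32).reshape(-1)  # accepts (T,) or (1, T)
    pcm = (np.clip(samples, -1.0, 1.0) * 32767.0).astype("<i2")  # little-endian int16
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample
        f.setframerate(int(sr))
        f.writeframes(pcm.tobytes())


# Usage with the variables above:
# save_wav("resynth.wav", wav, sr)
```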

Voice Conversion

wav = coder.convert(SOURCE_WAV_FILE, TARGET_WAV_FILE)
sr = coder.sr

Demo

Please check notebooks/demo.ipynb for a demonstration of the functions.

Training

Feature extraction

See scripts/encode_audio.py, and scripts/extract_libritts.sh for an example script that extracts features from LibriTTS.

TODO

  • Add training code.
  • Add PyPI installation.

