Python code for analyzing and synthesizing articulatory features of speech
Speech Articulatory Coding (SPARC)
Paper | Audio Samples | Colab Demo
This is the official code base for the paper "Coding Speech through Vocal Tract Kinematics".
Installation
git clone https://github.com/cheoljun95/Speech-Articulatory-Coding.git
cd Speech-Articulatory-Coding
pip install -e .
Usage
Load Model
from sparc import load_model
coder = load_model("en", device="cpu")  # Run on the CPU
coder = load_model("en", device="cuda:0")  # Run on the first GPU
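If the same script has to run on machines with and without a GPU, the device string can be chosen at runtime. The following is a minimal sketch; `pick_device` is a hypothetical helper, not part of sparc, and it only relies on `torch.cuda.is_available()`.

```python
def pick_device(preferred_gpu: str = "cuda:0") -> str:
    """Return a device string for load_model, falling back to the CPU."""
    try:
        import torch  # imported lazily so the helper also works without torch
    except ImportError:
        return "cpu"
    return preferred_gpu if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
# coder = load_model("en", device=device)  # requires the sparc package
```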
For pitch tracking, we found PENN to be fast at inference. You can enable it with use_penn=True. The default pitch tracker is torchcrepe.
coder = load_model("en", device="cpu", use_penn=True)  # Use PENN as the pitch tracker
If you only need inversion, you can load the feature extraction model instead:
coder = load_model("feature_extraction")
coder_from_config = load_model(config="configs/feature_extraction.yaml")
The following model checkpoints are available. To use a different one, replace en with multi or en+ in load_model.
| Model | Language | Training Dataset |
|---|---|---|
| en | English | LibriTTS-R |
| multi | Multi | LibriTTS-R, Multilingual LibriSpeech, AISHELL, JVS, KSS |
| en+ | English | LibriTTS-R, LibriTTS, EXPRESSO |
Articulatory Analysis
code = coder.encode(WAV_FILE) # Single inference
codes = coder.encode([WAV_FILE1, WAV_FILE2, ...]) # Batched processing
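For a large corpus, it may help to feed the batched call a bounded number of files at a time. The sketch below only builds the lists of paths; `iter_batches`, the directory name, and the batch size are illustrative assumptions, and the commented loop assumes that `coder.encode` accepts a list of file paths as shown above.

```python
from pathlib import Path
from typing import Iterator, List


def iter_batches(wav_dir: str, batch_size: int = 8) -> Iterator[List[str]]:
    """Yield lists of at most batch_size .wav paths found under wav_dir."""
    paths = sorted(str(p) for p in Path(wav_dir).rglob("*.wav"))
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


# for batch in iter_batches("my_corpus", batch_size=8):
#     codes = coder.encode(batch)  # batched processing, as in the call above
```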
The articulatory code outputs have the following format.
# All features are sampled at 50 Hz, except the speaker embedding
{"ema": (L, 12) array, #'TDX','TDY','TBX','TBY','TTX','TTY','LIX','LIY','ULX','ULY','LLX','LLY'
"loudness": (L, 1) array,
"pitch": (L, 1) array,
"periodicity": (L, 1) array, # auxiliary output of pitch tracker
"pitch_stats": (pitch mean, pitch std),
"spk_emb": (spk_emb_dim,) array, # all shared models use spk_emb_dim=64
"ft_len": Length of features, # useful for batched processing with padding
}
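The format above can be handled with a few small NumPy helpers. This is a sketch on dummy data, not part of sparc: `trim_padding`, `ema_by_channel`, and `frame_times` are hypothetical names, and the sketch assumes that padding in a batched result is appended along the first (time) axis, so that the first `ft_len` frames are the valid ones.

```python
import numpy as np

# Channel order copied from the format description above.
EMA_CHANNELS = ["TDX", "TDY", "TBX", "TBY", "TTX", "TTY",
                "LIX", "LIY", "ULX", "ULY", "LLX", "LLY"]
FRAME_RATE_HZ = 50  # feature rate stated above
TIME_VARYING = ("ema", "loudness", "pitch", "periodicity")


def trim_padding(code: dict) -> dict:
    """Cut the time-varying arrays of one code dict down to ft_len frames."""
    n = int(code["ft_len"])
    return {k: (v[:n] if k in TIME_VARYING else v) for k, v in code.items()}


def ema_by_channel(ema: np.ndarray) -> dict:
    """Map each EMA channel name to its (L,) trajectory."""
    return {name: ema[:, i] for i, name in enumerate(EMA_CHANNELS)}


def frame_times(n_frames: int) -> np.ndarray:
    """Start time in seconds of each 50 Hz frame."""
    return np.arange(n_frames) / FRAME_RATE_HZ


# Dummy code dict: 120 padded frames, of which 100 are valid.
dummy = {
    "ema": np.zeros((120, 12)), "loudness": np.zeros((120, 1)),
    "pitch": np.zeros((120, 1)), "periodicity": np.zeros((120, 1)),
    "pitch_stats": (120.0, 20.0), "spk_emb": np.zeros(64), "ft_len": 100,
}
trimmed = trim_padding(dummy)
print(trimmed["ema"].shape)                         # (100, 12)
print(ema_by_channel(trimmed["ema"])["TTY"].shape)  # (100,)
print(frame_times(100)[-1])                         # 1.98
```

Keeping the channel names in one list avoids hard-coding column indices when plotting or comparing individual articulator trajectories.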
Articulatory Synthesis
wav = coder.decode(**code)
sr = coder.sr
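To listen to the result, the waveform can be written to disk. The sketch below uses only NumPy and the standard-library `wave` module. It assumes that `wav` is a mono floating-point array with values in [-1, 1], which the text above does not state; `save_wav` is a hypothetical helper, and the 16 kHz tone stands in for real decoder output (use `coder.sr` for the actual rate).

```python
import os
import tempfile
import wave

import numpy as np


def save_wav(path: str, wav, sr: int) -> None:
    """Write a mono float waveform in [-1, 1] as a 16-bit PCM WAV file."""
    mono = np.asarray(wav, dtype=np.float32).reshape(-1)
    pcm = (np.clip(mono, -1.0, 1.0) * 32767).astype("<i2")  # little-endian int16
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16 bits
        f.setframerate(int(sr))
        f.writeframes(pcm.tobytes())


# Stand-in for decoder output: one second of a 220 Hz tone at a placeholder rate.
sr = 16000
t = np.arange(sr) / sr
out_path = os.path.join(tempfile.gettempdir(), "sparc_example.wav")
save_wav(out_path, 0.1 * np.sin(2 * np.pi * 220 * t), sr)
# save_wav("resynth.wav", wav, coder.sr)  # with real output from coder.decode
```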
Voice Conversion
wav = coder.convert(SOURCE_WAV_FILE, TARGET_WAV_FILE)
sr = coder.sr
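As a rough mental model only, voice conversion with this kind of code can be pictured as keeping the source utterance's articulatory features and swapping in the target's speaker embedding before decoding. This is an assumption for illustration and not necessarily what `coder.convert` does internally (for example, it may also adjust pitch). `swap_speaker` is a hypothetical helper, and the string values below are placeholders for real arrays.

```python
def swap_speaker(source_code: dict, target_code: dict) -> dict:
    """Copy source_code, replacing only the speaker embedding."""
    swapped = dict(source_code)  # shallow copy; the inputs are not modified
    swapped["spk_emb"] = target_code["spk_emb"]
    return swapped


src = {"ema": "source articulation", "spk_emb": "source voice"}
tgt = {"ema": "target articulation", "spk_emb": "target voice"}
print(swap_speaker(src, tgt))
# {'ema': 'source articulation', 'spk_emb': 'target voice'}

# With real codes (assumed equivalent in spirit, not verified):
# wav = coder.decode(**swap_speaker(coder.encode(SOURCE_WAV_FILE),
#                                   coder.encode(TARGET_WAV_FILE)))
```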
Demo
Please check notebooks/demo.ipynb for a demonstration of the functions.
Training
Feature extraction
See scripts/encode_audio.py, and scripts/extract_libritts.sh for an example script that extracts features from LibriTTS.
TODO
- Add training code.
- Add PyPI installation.