Time-domain pitch-synchronous overlap-add (TD-PSOLA)
This module permits constant- and variable-rate pitch-shifting and
time-stretching of speech. It wraps parselmouth [1], which in turn wraps the
Praat [2] implementation of TD-PSOLA [3]. Pitch-shifting is performed by
providing a numpy array of target pitch values equally spaced over time.
Variable-rate time-stretching uses forced phoneme alignment via pypar.
If you need to extract pitch features or phoneme alignments, see torchcrepe
for pitch estimation and pyfoal for forced alignment.
If you only want to perform pitch-shifting, you do not need to extract
forced alignments. If you want to do variable-rate time-stretching, you do not
need to perform pitch estimation.
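As a minimal sketch of pitch-shifting without alignments, the helper below scales an existing pitch contour by a fixed number of semitones and passes the result as `target_pitch`. The names `shift_semitones` and `pitch_shift` are hypothetical and not part of psola. The sketch assumes `pitch_hz` is a 1-D contour in Hz (for example, estimated with torchcrepe and converted to numpy) and relies on the library defaults for `fmin` and `fmax`.

```python
import numpy as np


def shift_semitones(pitch_hz, semitones):
    """Scale a pitch contour in Hz by an interval in semitones (hypothetical helper)."""
    # One semitone corresponds to a frequency ratio of 2 ** (1 / 12)
    return np.asarray(pitch_hz, dtype=float) * 2.0 ** (semitones / 12.0)


def pitch_shift(audio, sample_rate, pitch_hz, semitones):
    """Pitch-shift speech by a fixed interval (hypothetical helper)."""
    import psola  # imported here so shift_semitones works without psola installed

    target_pitch = shift_semitones(pitch_hz, semitones)
    # No alignments are passed because only the pitch is modified
    return psola.vocode(audio, sample_rate, target_pitch=target_pitch)
```

Shifting by 12 semitones doubles every value in the contour, and shifting by 0 leaves it unchanged.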
Installation
pip install psola
Usage
If you want to perform pitch-shifting or time-stretching on audio already
loaded into memory, use psola.vocode. If you want to do this with audio
saved in a file, use psola.from_file. You can use psola.to_file or
psola.from_file_to_file to save the results to a file. To process many
files at once with multiprocessing, use psola.from_files_to_files.
Each of these functions is documented below. The command-line interface
wraps the arguments of psola.from_files_to_files and is described in
the next section.
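The batch workflow might be sketched as follows. The helpers `plan_outputs` and `stretch_all` are hypothetical and not part of psola. The sketch assumes the output directory already exists and that the first two positional arguments of psola.from_files_to_files are the input and output file lists, in the order documented below.

```python
from pathlib import Path


def plan_outputs(audio_files, output_dir, suffix="-vocoded"):
    """Map each input file to an output path in output_dir (hypothetical helper)."""
    output_dir = Path(output_dir)
    return [
        output_dir / f"{Path(file).stem}{suffix}{Path(file).suffix}"
        for file in audio_files
    ]


def stretch_all(audio_files, output_dir, constant_stretch):
    """Time-stretch many files by the same constant value (hypothetical helper)."""
    import psola  # imported lazily; plan_outputs does not need it

    # One output path per input file, in the same order
    output_files = [str(path) for path in plan_outputs(audio_files, output_dir)]
    psola.from_files_to_files(
        audio_files,
        output_files,
        constant_stretch=constant_stretch)
    return output_files
```

Keeping the path planning separate from the psola call makes it easy to check the one-to-one mapping between inputs and outputs before any audio is processed.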
psola.vocode
"""Performs pitch vocoding using Praat

Arguments
    audio : np.array(shape=(samples,))
        The speech signal to process
    sample_rate : int
        The audio sampling rate.
    source_alignment : pypar.Alignment
        The current alignment if performing time-stretching
    target_alignment : pypar.Alignment
        The target alignment if performing time-stretching
    target_pitch : np.array(shape=(frames,))
        The target pitch contour
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.

Returns
    audio : np.array(shape=(samples,))
        The vocoded audio
"""
psola.from_file
"""Performs vocoding using Praat

Arguments
    audio_file : string
        The file containing the speech signal to process
    source_alignment_file : string or None
        The file containing the original alignment
    target_alignment_file : string or None
        The file containing the target alignment
    target_pitch_file : string or None
        The file containing the target pitch
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.

Returns
    audio : np.array(shape=(samples,))
        The vocoded audio
    sample_rate : int
        The audio sampling rate
"""
psola.to_file
"""Performs pitch vocoding and saves audio to disk

Arguments
    audio : np.array(shape=(samples,))
        The speech signal to process
    sample_rate : int
        The audio sampling rate
    output_file : string
        The file to save the vocoded speech
    source_alignment : pypar.Alignment
        The current alignment if performing time-stretching
    target_alignment : pypar.Alignment
        The target alignment if performing time-stretching
    target_pitch : np.array(shape=(frames,))
        The target pitch contour
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.
"""
psola.from_file_to_file
"""Performs vocoding using Praat and saves to disk

Arguments
    audio_file : string
        The file containing the speech signal to process
    output_file : string
        The file to save the vocoded speech
    source_alignment_file : string or None
        The file containing the original alignment
    target_alignment_file : string or None
        The file containing the target alignment
    target_pitch_file : string or None
        The file containing the target pitch
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.
"""
psola.from_files_to_files
"""Performs vocoding using Praat and saves to disk

Arguments
    audio_files : list
        The files containing the speech signals to process
    output_files : list
        The files to save the vocoded speech
    source_alignment_files : list or None
        The files containing the original alignments
    target_alignment_files : list or None
        The files containing the target alignments
    target_pitch_files : list or None
        The files containing the target pitch
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.
"""
Command-line interface
usage: python -m psola
    [-h]
    [--audio_files AUDIO_FILES [AUDIO_FILES ...]]
    [--source_alignment_files SOURCE_ALIGNMENT_FILES [SOURCE_ALIGNMENT_FILES ...]]
    [--target_alignment_files TARGET_ALIGNMENT_FILES [TARGET_ALIGNMENT_FILES ...]]
    [--constant_stretch CONSTANT_STRETCH]
    [--target_pitch_files TARGET_PITCH_FILES [TARGET_PITCH_FILES ...]]
    [--fmin FMIN]
    [--fmax FMAX]
    [--output_files OUTPUT_FILES [OUTPUT_FILES ...]]

optional arguments:
    -h, --help
        show this help message and exit
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
        The speech signal to process
    --source_alignment_files SOURCE_ALIGNMENT_FILES [SOURCE_ALIGNMENT_FILES ...]
        The files containing the original alignments
    --target_alignment_files TARGET_ALIGNMENT_FILES [TARGET_ALIGNMENT_FILES ...]
        The files containing the target alignments
    --constant_stretch CONSTANT_STRETCH
        A constant value for time-stretching
    --target_pitch_files TARGET_PITCH_FILES [TARGET_PITCH_FILES ...]
        The target pitch contour
    --fmin FMIN
        The minimum allowable frequency in Hz
    --fmax FMAX
        The maximum allowable frequency in Hz
    --output_files OUTPUT_FILES [OUTPUT_FILES ...]
        Where to save the vocoded audio
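If you prefer to drive the command-line interface from a script, a sketch like the one below assembles the argument list from the flags shown above. The helpers `build_command` and `run` are hypothetical, and the file names in the usage note are placeholders.

```python
import subprocess
import sys


def build_command(
        audio_files,
        output_files,
        target_pitch_files=None,
        constant_stretch=None):
    """Assemble a `python -m psola` invocation (hypothetical helper)."""
    command = [sys.executable, "-m", "psola", "--audio_files", *audio_files]
    # Optional flags are only added when a value is given
    if target_pitch_files is not None:
        command += ["--target_pitch_files", *target_pitch_files]
    if constant_stretch is not None:
        command += ["--constant_stretch", str(constant_stretch)]
    command += ["--output_files", *output_files]
    return command


def run(command):
    """Run the command and raise if psola exits with an error."""
    subprocess.run(command, check=True)
```

For example, `run(build_command(["speech.wav"], ["speech-out.wav"], constant_stretch=1.5))` would call the CLI with one input file, one output file, and a constant stretch value.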
References
[1] Y. Jadoul, B. Thompson, and B. De Boer, "Introducing Parselmouth: A Python interface to Praat," Journal of Phonetics, vol. 71, pp. 1–15, 2018.
[2] P. Boersma, "Praat: doing phonetics by computer," http://www.praat.org/, 2006.
[3] E. Moulines and F. Charpentier, "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech Communication, 1990.