Time-domain pitch-synchronous overlap-add

Time-domain pitch-synchronous overlap-add (TD-PSOLA)

License: GPL v3

This module permits constant- and variable-rate pitch-shifting and time-stretching of speech. It wraps parselmouth [1], which in turn wraps the Praat [2] implementation of TD-PSOLA [3]. Pitch-shifting is performed by providing a numpy array of target pitch values equally spaced over time. Variable-rate time-stretching uses forced phoneme alignments represented with pypar.
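As a minimal sketch of what a target pitch contour might look like, the helper below builds a linear glide between two frequencies with one value per frame. The name `linear_contour` is hypothetical and not part of psola. It returns a plain list, which can be converted with `numpy.array` before being passed as `target_pitch`.

```python
def linear_contour(start_hz, end_hz, frames):
    """Return `frames` pitch values in Hz, evenly spaced from start to end.

    Hypothetical helper for illustration; not part of psola.
    """
    if frames < 2:
        raise ValueError('need at least two frames')
    step = (end_hz - start_hz) / (frames - 1)
    return [start_hz + step * i for i in range(frames)]


# A one-octave upward glide described by five equally spaced frames
contour = linear_contour(110.0, 220.0, 5)
```

How many frames to use, and how unvoiced regions should be represented, are not specified in this description, so treat both as open choices.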

If you need to extract pitch features or phoneme alignments, see torchcrepe for pitch estimation and pyfoal for forced alignment. If you only want to perform pitch-shifting, you do not need to extract forced alignments. If you want to do variable-rate time stretching, you do not need to perform pitch estimation.

Installation

pip install psola

Usage

If you want to perform pitch-shifting or time-stretching on audio already loaded into memory, use psola.vocode. If you want to do this with audio saved in a file, use psola.from_file. You can use psola.to_file or psola.from_file_to_file to save the results to a file. To process many files at once with multiprocessing, use psola.from_files_to_files. Each of these functions is documented below. The command-line interface wraps the arguments of psola.from_files_to_files and is described in the next section.

psola.vocode

"""Performs pitch vocoding using Praat

Arguments
    audio : np.array(shape=(samples,))
        The speech signal to process
    sample_rate : int
        The audio sampling rate.
    source_alignment : pypar.Alignment
        The current alignment if performing time-stretching
    target_alignment : pypar.Alignment
        The target alignment if performing time-stretching
    target_pitch : np.array(shape=(frames,))
        The target pitch contour
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.

Returns
    audio : np.array(shape=(samples,))
        The vocoded audio
"""
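The following sketch shows one way `psola.vocode` might be used to shift speech by a fixed musical interval. It assumes you already have a pitch contour for the source audio (for example from torchcrepe) and that the arguments not passed here are optional. `shift_semitones` and `pitch_shift` are hypothetical names introduced for illustration.

```python
def shift_semitones(pitch_hz, semitones):
    """Scale each pitch value by the equal-tempered ratio 2 ** (semitones / 12)."""
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio for f in pitch_hz]


def pitch_shift(audio, sample_rate, source_pitch_hz, semitones):
    """Vocode `audio` toward its own pitch contour moved by `semitones`."""
    # Imported lazily so shift_semitones works without psola installed
    import numpy as np
    import psola

    target = np.array(shift_semitones(source_pitch_hz, semitones))
    # Keyword name taken from the docstring above; assumes the alignment
    # and stretch arguments may be omitted when only pitch-shifting
    return psola.vocode(audio, sample_rate, target_pitch=target)
```

Shifted values that fall outside the range given by `fmin` and `fmax` may need those limits adjusted; the docstring does not say how out-of-range targets are handled.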

psola.from_file

"""Performs vocoding using Praat

Arguments
    audio_file : string
        The file containing the speech signal to process
    source_alignment_file : string or None
        The file containing the original alignment
    target_alignment_file : string or None
        The file containing the target alignment
    target_pitch_file : string or None
        The file containing the target pitch
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.

Returns
    audio : np.array(shape=(samples,))
        The vocoded audio
    sample_rate : int
        The audio sampling rate
"""
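A minimal sketch of constant-rate time-stretching from a file follows. It relies only on the documented return values (`audio`, `sample_rate`) and reports the resulting duration so the effect of `constant_stretch` can be checked empirically, since the docstring does not state whether larger values lengthen or shorten the audio. `duration_seconds` and `stretch_file` are hypothetical helpers.

```python
def duration_seconds(num_samples, sample_rate):
    """Convert a sample count to a duration in seconds."""
    return num_samples / sample_rate


def stretch_file(audio_file, constant_stretch):
    """Time-stretch a file and return the audio, its rate, and its duration."""
    # Imported lazily so duration_seconds works without psola installed
    import psola

    # Assumes the alignment and pitch file arguments may be omitted
    audio, sample_rate = psola.from_file(
        audio_file,
        constant_stretch=constant_stretch)
    return audio, sample_rate, duration_seconds(len(audio), sample_rate)
```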

psola.to_file

"""Performs pitch vocoding and saves audio to disk

Arguments
    audio : np.array(shape=(samples,))
        The speech signal to process
    sample_rate : int
        The audio sampling rate
    output_file : string
        The file to save the vocoded speech
    source_alignment : pypar.Alignment
        The current alignment if performing time-stretching
    target_alignment : pypar.Alignment
        The target alignment if performing time-stretching
    target_pitch : np.array(shape=(frames,))
        The target pitch contour
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.
"""

psola.from_file_to_file

"""Performs vocoding using Praat and saves to disk

Arguments
    audio_file : string
        The file containing the speech signal to process
    output_file : string
        The file to save the vocoded speech
    source_alignment_file : string or None
        The file containing the original alignment
    target_alignment_file : string or None
        The file containing the target alignment
    target_pitch_file : string or None
        The file containing the target pitch
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.
"""

psola.from_files_to_files

"""Performs vocoding using Praat and saves to disk

Arguments
    audio_files : list
        The files containing the speech signals to process
    output_files : list
        The files to save the vocoded speech
    source_alignment_files : list or None
        The files containing the original alignments
    target_alignment_files : list or None
        The files containing the target alignments
    target_pitch_files : list or None
        The files containing the target pitch
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.
"""
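Because `audio_files` and `output_files` are parallel lists, it can help to derive the output paths from the inputs. The sketch below does this with `pathlib` and then calls `psola.from_files_to_files`. The helper names, the `-vocoded` suffix, and the `.wav` extension are assumptions made for illustration.

```python
from pathlib import Path


def output_paths(audio_files, output_directory, suffix='-vocoded'):
    """Map each input file to a same-named file in `output_directory`."""
    output_directory = Path(output_directory)
    return [
        str(output_directory / f'{Path(file).stem}{suffix}.wav')
        for file in audio_files]


def stretch_all(audio_files, output_directory, constant_stretch):
    """Time-stretch many files at once and return the output paths."""
    # Imported lazily so output_paths works without psola installed
    import psola

    outputs = output_paths(audio_files, output_directory)
    # Assumes the alignment and pitch file lists may be omitted
    psola.from_files_to_files(
        audio_files,
        outputs,
        constant_stretch=constant_stretch)
    return outputs
```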

Command-line interface

usage: python -m psola
    [-h]
    [--audio_files AUDIO_FILES [AUDIO_FILES ...]]
    [--source_alignment_files SOURCE_ALIGNMENT_FILES [SOURCE_ALIGNMENT_FILES ...]]
    [--target_alignment_files TARGET_ALIGNMENT_FILES [TARGET_ALIGNMENT_FILES ...]]
    [--constant_stretch CONSTANT_STRETCH]
    [--target_pitch_files TARGET_PITCH_FILES [TARGET_PITCH_FILES ...]]
    [--fmin FMIN]
    [--fmax FMAX]
    [--output_files OUTPUT_FILES [OUTPUT_FILES ...]]

optional arguments:
  -h, --help            show this help message and exit
  --audio_files AUDIO_FILES [AUDIO_FILES ...]
                        The speech signal to process
  --source_alignment_files SOURCE_ALIGNMENT_FILES [SOURCE_ALIGNMENT_FILES ...]
                        The files containing the original alignments
  --target_alignment_files TARGET_ALIGNMENT_FILES [TARGET_ALIGNMENT_FILES ...]
                        The files containing the target alignments
  --constant_stretch CONSTANT_STRETCH
                        A constant value for time-stretching
  --target_pitch_files TARGET_PITCH_FILES [TARGET_PITCH_FILES ...]
                        The target pitch contour
  --fmin FMIN           The minimum allowable frequency in Hz
  --fmax FMAX           The maximum allowable frequency in Hz
  --output_files OUTPUT_FILES [OUTPUT_FILES ...]
                        Where to save the vocoded audio
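If you want to drive the command-line interface from a script, the sketch below assembles an argument list from the flags shown in the help text above and runs it with `subprocess`. `build_command` and `run` are hypothetical helpers, and only a subset of the flags is covered.

```python
import subprocess
import sys


def build_command(audio_files, output_files, constant_stretch=None,
                  fmin=None, fmax=None):
    """Build an argument list for `python -m psola` from the documented flags."""
    command = [sys.executable, '-m', 'psola']
    command += ['--audio_files', *audio_files]
    command += ['--output_files', *output_files]
    # Optional flags are appended only when a value is given
    if constant_stretch is not None:
        command += ['--constant_stretch', str(constant_stretch)]
    if fmin is not None:
        command += ['--fmin', str(fmin)]
    if fmax is not None:
        command += ['--fmax', str(fmax)]
    return command


def run(command):
    """Run the command, raising CalledProcessError on a non-zero exit."""
    return subprocess.run(command, check=True)
```

For example, `run(build_command(['speech.wav'], ['stretched.wav'], constant_stretch=1.5))` would invoke the module on a single file, assuming psola is installed in the current interpreter's environment.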

References

[1] Y. Jadoul, B. Thompson, and B. de Boer, "Introducing Parselmouth: A Python interface to Praat," Journal of Phonetics, vol. 71, pp. 1–15, 2018.

[2] P. Boersma, "Praat: doing phonetics by computer", http://www.praat.org/, 2006.

[3] E. Moulines and F. Charpentier, "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech Communication, 1990.
