STFT based multi pitch shifting with optional formant preservation in C++ and Python.

stftPitchShift

This is a reimplementation of Stephan M. Bernsee's smbPitchShift.cpp, a pitch shifting algorithm using the Short-Time Fourier Transform (STFT).

This repository features two analogous implementations of the algorithm, one in C++ and one in Python. Both contain several function blocks of the same name (but with a different file extension, of course).

In addition to the base algorithm implementation, it also features spectral multi pitch shifting and cepstral formant preservation extensions.

Both sources contain a ready-to-use command line tool as well as a library for custom needs. See more details in the build section.

Modules

Vocoder

The Vocoder module transforms the DFT spectral data according to the original algorithm, which is essentially the instantaneous frequency estimation technique. See further reading for more details.

The encode function replaces each input DFT value with the complex number magnitude + j * frequency, where the imaginary part holds the frequency estimated from the phase error.

The decode function performs the inverse transformation back to ordinary DFT complex numbers, replacing the possibly modified frequency value with the reconstructed phase value.
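
The encode and decode steps can be sketched in plain Python as follows. This is a minimal illustration of phase error based frequency estimation, not the code shipped in this repository: the class name VocoderSketch, its attributes and the wrap helper are assumptions made for the example, and a frame is assumed to be a list of complex DFT bins from 0 to framesize / 2.

```python
import cmath
import math


def wrap(phase):
    """Wrap a phase value in radians to the interval [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


class VocoderSketch:
    """Hypothetical helper, not the repository's Vocoder module."""

    def __init__(self, framesize, hopsize, samplerate):
        self.freq_per_bin = samplerate / framesize
        # Phase advance per hop of a sinusoid located exactly at bin 1.
        self.phase_per_hop = 2 * math.pi * hopsize / framesize
        bins = framesize // 2 + 1
        self.encode_phase = [0.0] * bins  # last analyzed phase per bin
        self.decode_phase = [0.0] * bins  # accumulated synthesis phase per bin

    def encode(self, dft):
        """Turn real + j * imag bins into magnitude + j * frequency bins."""
        out = []
        for k, value in enumerate(dft):
            phase = cmath.phase(value)
            # Phase error: measured advance minus the advance expected for bin k.
            error = wrap(phase - self.encode_phase[k] - k * self.phase_per_hop)
            self.encode_phase[k] = phase
            frequency = (k + error / self.phase_per_hop) * self.freq_per_bin
            out.append(complex(abs(value), frequency))
        return out

    def decode(self, frame):
        """Turn magnitude + j * frequency bins back into real + j * imag bins."""
        out = []
        for k, value in enumerate(frame):
            # Phase advance per hop implied by the (possibly modified) frequency.
            advance = value.imag / self.freq_per_bin * self.phase_per_hop
            self.decode_phase[k] = wrap(self.decode_phase[k] + advance)
            out.append(cmath.rect(value.real, self.decode_phase[k]))
        return out
```

In this sketch, decoding an unmodified encoded frame reproduces the original DFT values, because the accumulated phase then equals the analyzed phase up to a multiple of 2π.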

Pitcher

The Pitcher module performs single or multi pitch shifting of the encoded DFT frame depending on the specified fractional factors.

Resampler

The Resampler module provides linear and bilinear interpolation routines, which perform the actual pitch shifting on the DFT frames encoded by the Vocoder.

Cepstrum

The Cepstrum module estimates a spectral envelope of the DFT magnitude vector, representing the vocal tract resonances. This computation takes place in the cepstral domain by applying a low-pass filter. The cutoff value of the low-pass filter or lifter is the quefrency value to be specified in seconds or milliseconds.
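
A rough sketch of such an envelope estimation in plain Python is given below. It is not the repository's Cepstrum module: the function name envelope, the quefrency unit (seconds here), the natural logarithm and the small floor that avoids log(0) are all assumptions made for the example. The input is assumed to be the one-sided magnitude vector of an even-sized DFT, that is, bins 0 to N / 2.

```python
import math


def envelope(magnitudes, quefrency, samplerate):
    """Estimate a smooth spectral envelope by low-pass liftering the cepstrum.

    Hypothetical helper: `quefrency` is the lifter cutoff in seconds.
    """
    half = len(magnitudes) - 1  # index of the Nyquist bin, N / 2
    size = 2 * half             # full DFT size N
    # Cutoff in cepstral samples, kept below N / 2 so that the
    # symmetric half of the cepstrum can be handled by doubling.
    cutoff = min(int(quefrency * samplerate), half - 1)
    log_spectrum = [math.log(max(m, 1e-12)) for m in magnitudes]

    def cepstrum(n):
        # Inverse DFT of the real, symmetric log-magnitude spectrum.
        total = log_spectrum[0] + (-1) ** n * log_spectrum[half]
        total += 2 * sum(
            log_spectrum[k] * math.cos(2 * math.pi * k * n / size)
            for k in range(1, half)
        )
        return total / size

    # Low-pass lifter: keep only the coefficients up to the cutoff.
    kept = [cepstrum(n) for n in range(cutoff + 1)]

    def smoothed(k):
        # Forward DFT of the liftered (symmetric) cepstrum at bin k.
        return kept[0] + 2 * sum(
            kept[n] * math.cos(2 * math.pi * k * n / size)
            for n in range(1, cutoff + 1)
        )

    return [math.exp(smoothed(k)) for k in range(half + 1)]
```

With a cutoff of zero only the first cepstral coefficient survives, so the envelope degenerates to a constant. Larger quefrency values let the envelope follow finer spectral detail.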

STFT

As its name implies, this module performs the STFT analysis and synthesis steps.

IO

The IO module provides simple reading and writing of .wav audio files.

Currently only mono .wav files are supported. Please use e.g. Audacity or SoX to prepare your audio files for pitch shifting.

Pitch shifting

Single pitch

Since the Vocoder module transforms the original DFT complex values real + j * imag into magnitude + j * frequency representation, the single pitch shifting is a comparatively easy task. Both magnitude and frequency vectors are to be resampled according to the desired pitch shifting factor:

  • The factor 1 means no change.
  • The factor <1 means downsampling.
  • The factor >1 means upsampling.

Any fractional resampling factor such as 0.5 requires interpolation. In the simplest case, linear interpolation will be sufficient. Otherwise, bilinear interpolation can also be applied to smooth values between two consecutive STFT hops.

Because resampling moves the frequency values to different bins, the resampled frequency values must also be multiplied by the resampling factor.
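
The single pitch procedure can be illustrated with a short Python sketch. It only covers the linear interpolation case and is not the repository's Pitcher or Resampler code: the function names resample and shift_pitch are assumptions made for the example, and a frame is assumed to be a list of magnitude + j * frequency bins as produced by the encode step.

```python
def resample(values, factor):
    """Linearly interpolate `values` so that bin k moves to bin k * factor."""
    n = len(values)
    out = [0.0] * n
    for k in range(n):
        pos = k / factor  # fractional source position feeding destination bin k
        if pos > n - 1:
            continue      # source lies beyond the last bin, leave zero
        i = min(int(pos), n - 2)
        frac = pos - i
        out[k] = (1 - frac) * values[i] + frac * values[i + 1]
    return out


def shift_pitch(frame, factor):
    """Shift an encoded frame (magnitude + j * frequency) by `factor`."""
    magnitudes = resample([v.real for v in frame], factor)
    frequencies = resample([v.imag for v in frame], factor)
    # The moved frequency values must also be scaled by the same factor.
    return [complex(m, f * factor) for m, f in zip(magnitudes, frequencies)]
```

For example, with a factor of 2 a partial at bin 4 carrying a frequency of 400 Hz ends up at bin 8 carrying 800 Hz.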

Multi pitch

In terms of multi pitch shifting, multiple differently resampled magnitude and frequency vectors are to be combined together. For example, the magnitude vectors can easily be averaged. But what about the frequency vectors?

The basic concept of this algorithm extension is to keep only the frequency value that belongs to the strongest magnitude value, since the strongest magnitude will mask the weaker ones. All remaining masked frequency values would be inaudible and can therefore be omitted.

In this way, the multi pitch shifting can be performed simultaneously in the same DFT frame. There is no need to build a separate STFT pipeline for different pitch variations to superimpose the synthesized signals in the time domain.
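
The combination rule can be sketched per bin as follows. This is an illustrative reading of the description above rather than the repository's implementation: the function name combine is an assumption, and each input frame is assumed to be an already shifted list of magnitude + j * frequency bins of equal length.

```python
def combine(frames):
    """Merge several shifted frames of equal length into a single frame."""
    out = []
    for k in range(len(frames[0])):
        candidates = [frame[k] for frame in frames]
        # Magnitudes are simply averaged across all pitch variations.
        magnitude = sum(c.real for c in candidates) / len(candidates)
        # Only the frequency of the strongest magnitude is kept,
        # since it masks the weaker ones.
        strongest = max(candidates, key=lambda c: c.real)
        out.append(complex(magnitude, strongest.imag))
    return out
```

The merged frame can then be decoded once, so a single STFT pipeline serves all pitch variations.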

Formant preservation

Will soon appear...

Build

C++

Use CMake to build the C++ program and library like so:

mkdir build
cd build
cmake ..
cmake --build .

To include this library in your C++ audio project, check the LibStftPitchShift.cmake file and the following minimal example:

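#include <StftPitchShift/StftPitchShift.h>

#include <vector>

StftPitchShift pitchshifter(1024, 256, 44100);

// input samples (here just zeros) and an output buffer of the same size
std::vector<float> x(44100);
std::vector<float> y(x.size());

// a factor of 1 leaves the pitch unchanged
pitchshifter.shiftpitch(x, y, 1);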

Python

The Python program stftpitchshift can be installed via pip install stftpitchshift.

You can also use the installed Python module StftPitchShift in your own audio project:

from StftPitchShift import StftPitchShift

pitchshifter = StftPitchShift(1024, 256, 44100)

x = [0] * 44100
y = pitchshifter.shiftpitch(x, 1)

Usage

Both programs, C++ and Python, provide a similar set of command line arguments:

-h  --help     print this help

-i  --input    input .wav file name
-o  --output   output .wav file name

-p  --pitch    fractional pitch shifting factors separated by comma
               (default 1.0)

-f  --formant  optional formant lifter quefrency in milliseconds
               (default 0.0)

-w  --window   stft window size
               (default 1024)

-v  --overlap  stft window overlap
               (default 32)

-d  --debug    plot spectrograms before and after processing
               (only available in the Python version)

    --smb      enable original smb algorithm
               (only available in the C++ version)

To apply multiple pitch shifts at once, separate each factor by a comma, e.g. -p 0.5,1,2.

Further reading

Instantaneous frequency estimation

Cepstrum analysis and formant changing

Credits
