
FFT-based channel vocoder that applies spectral envelope transfer


FFT Channel Vocoder

Built by a blind programmer and musician, for musicians

A Python package that applies spectral envelope transfer using FFT-based processing. The vocoder takes a modulator (voice) signal and imposes its spectral envelope onto a carrier signal.

Designed with accessibility as a core principle: CLI first, no GUI, cross-platform, and automated batch processing. Built by a blind developer for everyone who deserves equal access to audio tools.

Documentation

  • 📚 Documentation: Installation, tutorials, configuration, and troubleshooting
  • 🎨 Design Philosophy: Why we built it this way, design decisions, and algorithm choices
  • ⚙️ Algorithm Deep Dive: Technical details for audio engineers and researchers

Features

  • Spectral Envelope Transfer: Extracts formant information from voice and applies it to carrier signals
  • Multiple Input Formats: Supports voice files, MIDI files (synthesized to carrier waves), pre-generated synth wave files, and scale files for pitch correction
  • Pitch Correction: Optional automatic pitch detection and correction to user-defined musical scales, with a noise gate
  • Batch Processing: Automatically processes multiple input files with consistent naming patterns
  • Generator-based Design: Uses Python generators for efficient iteration through numbered file sequences
  • Accessibility First: Command-line interface that is fully accessible to screen readers and works across platforms

Installation

From source

cd fft_channel_vocoder
pip install -e .

Usage

Command Line

Run the vocoder via the vocode command:

vocode

Or using Python module syntax:

python3 -m fft_channel_vocoder

Input Structure

The vocoder expects files organized in an input/ directory:

input/
├── voice1.wav          # Modulator signals
├── voice2.wav
├── melody1.mid         # MIDI files to synthesize as carrier
├── melody2.mid
├── synth1.wav          # Pre-generated synth wave files as carrier
├── synth2.wav
├── synth3.wav
├── scale1.txt          # Scale files for pitch correction (one note per line)
└── scale2.txt
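The numbered names above fit the generator-based design listed under Features. A minimal sketch of such a generator follows; the name `numbered_files` and the rule that iteration stops at the first missing number are assumptions for illustration, not the package's actual code.

```python
from pathlib import Path
from typing import Iterator


def numbered_files(directory: Path, prefix: str, suffix: str) -> Iterator[Path]:
    """Hypothetical helper: yield prefix1.suffix, prefix2.suffix, ... in order.

    Assumption: the sequence ends at the first missing number, so files are
    discovered lazily without listing the whole directory up front.
    """
    index = 1
    while True:
        candidate = directory / f"{prefix}{index}{suffix}"
        if not candidate.is_file():
            return  # first gap in the numbering ends the sequence
        yield candidate
        index += 1
```

For example, `numbered_files(Path("input"), "voice", ".wav")` would yield `voice1.wav` and `voice2.wav` from the tree above, and `numbered_files(Path("input"), "synth", ".wav")` would yield the three synth files.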

Scale File Format: Each scale file contains one note class per line. Supported note classes are: c, c#, d, d#, e, f, f#, g, g#, a, a#, b. Comments (lines starting with #) and blank lines are ignored.

Example scale1.txt (C major scale):

# Major scale
c
d
e
f
g
a
b
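A scale file in this format can be turned into semitone offsets from C. The sketch below is hypothetical (`parse_scale` and `NOTE_CLASSES` are not taken from pitch_corrector.py) and assumes notes must be written exactly as listed above, in lowercase.

```python
from pathlib import Path
from typing import List

# The twelve supported note classes, in semitone order starting from C.
NOTE_CLASSES = ["c", "c#", "d", "d#", "e", "f", "f#", "g", "g#", "a", "a#", "b"]


def parse_scale(text: str) -> List[int]:
    """Return sorted semitone indices (0 = C ... 11 = B) listed in a scale file."""
    semitones = set()
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if line not in NOTE_CLASSES:
            raise ValueError(f"unsupported note class: {line!r}")
        semitones.add(NOTE_CLASSES.index(line))
    return sorted(semitones)


def load_scale(path: Path) -> List[int]:
    """Read and parse a scale file such as input/scale1.txt."""
    return parse_scale(path.read_text(encoding="utf-8"))
```

Parsing the C major example gives `[0, 2, 4, 5, 7, 9, 11]`. Because a sharp like `c#` never starts with `#`, the comment check does not clash with note names.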

Processing Flow

For each voice file, the vocoder:

  1. MIDI Processing: Synthesizes each MIDI file into a carrier wave and vocodes with the voice
  2. Synth Wave Processing: Loads each pre-generated synth wave file and vocodes with the voice
  3. Pitch Correction: For each scale file, detects the pitch of the voice, snaps it to that scale, synthesizes a carrier wave, and vocodes it with the voice
  4. Whisper Generation: Creates a stereo whisper track by vocoding the voice with white noise
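Steps 1, 3, and 4 each need a carrier. The sketch below shows one common way to build them. The sawtooth waveform, the phase-accumulation approach, and uniform white noise are assumptions, since the waveforms used by midi_synth.py, scale_synth.py, and noise_generators.py aren't described here; all function names are hypothetical.

```python
from typing import Optional

import numpy as np


def midi_to_hz(note: float) -> float:
    """Equal-tempered MIDI note to frequency, with A4 (note 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


def sawtooth_carrier(freqs_hz: np.ndarray, sample_rate: int) -> np.ndarray:
    """Naive sawtooth that follows a per-sample frequency track.

    Phase is accumulated sample by sample, so the wave stays continuous
    when the frequency jumps from one note to the next.
    """
    phase = np.cumsum(freqs_hz / sample_rate)      # running phase, in cycles
    return 2.0 * (phase - np.floor(phase)) - 1.0   # ramp from -1 up toward 1 each cycle


def white_noise_carrier(num_samples: int, channels: int = 2,
                        seed: Optional[int] = None) -> np.ndarray:
    """Independent uniform white noise per channel, shape (num_samples, channels)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=(num_samples, channels))
```

A two-note carrier at 96 kHz could be built with `np.concatenate([np.full(48_000, midi_to_hz(60)), np.full(48_000, midi_to_hz(64))])` as the frequency track. A naive sawtooth aliases at high pitches; the high default sample rate reduces this, but a band-limited oscillator would be cleaner.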

Pitch Correction Details:

  • Analyzes the voice for dominant frequencies in the range 50-2000 Hz
  • Snaps detected pitches to a defined musical scale (octave-independent)
  • Uses a noise gate (-40 dB by default) to prevent unwanted tuning during silence or low-amplitude content
  • Holds the last detected note while the signal is below the noise gate threshold
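The behavior described above can be sketched frame by frame: find the strongest FFT peak between 50 and 2000 Hz, gate on the frame level, snap to the nearest scale note in any octave, and hold the last note while gated. This is an illustrative sketch, not the logic of pitch_corrector.py. Measuring the gate as the frame's RMS level relative to full scale and detecting pitch by plain FFT peak-picking are assumptions, and the function names are hypothetical.

```python
from typing import Iterable, List, Optional, Sequence

import numpy as np


def detect_pitch(frame: np.ndarray, sample_rate: int,
                 fmin: float = 50.0, fmax: float = 2000.0) -> float:
    """Return the frequency (Hz) of the strongest FFT bin between fmin and fmax."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    band = (freqs >= fmin) & (freqs <= fmax)
    return float(freqs[band][np.argmax(spectrum[band])])


def snap_to_scale(freq_hz: float, scale: Sequence[int]) -> float:
    """Snap to the nearest note whose pitch class (0 = C ... 11 = B) is in `scale`."""
    midi = 69 + 12 * np.log2(freq_hz / 440.0)      # fractional MIDI note number
    octave = int(midi // 12)
    # Include neighbouring octaves so the nearest scale note may cross an octave boundary.
    candidates = [12 * (octave + k) + pc for k in (-1, 0, 1) for pc in scale]
    nearest = min(candidates, key=lambda note: abs(note - midi))
    return float(440.0 * 2.0 ** ((nearest - 69) / 12))


def corrected_pitch_track(frames: Iterable[np.ndarray], sample_rate: int,
                          scale: Sequence[int],
                          gate_db: float = -40.0) -> List[Optional[float]]:
    """Per-frame target frequency; None until the first frame passes the gate."""
    last_note: Optional[float] = None
    track: List[Optional[float]] = []
    for frame in frames:
        rms = np.sqrt(np.mean(np.square(frame)))
        level_db = 20 * np.log10(rms + 1e-12)      # 1e-12 avoids log10(0) on silence
        if level_db >= gate_db:
            last_note = snap_to_scale(detect_pitch(frame, sample_rate), scale)
        track.append(last_note)                    # below the gate: keep the previous note
    return track
```

Peak-picking can latch onto a strong harmonic instead of the fundamental, so a detector this simple is best treated as a starting point rather than a reference.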

Output Structure

Processed files are saved to output/:

output/
├── voice1_melody1.wav       # Voice + MIDI synthesis
├── voice1_melody2.wav
├── voice1_synth1.wav        # Voice + Synth wave 1
├── voice1_synth2.wav
├── voice1_synth3.wav
├── voice1_scale1.wav        # Voice + Pitch-corrected carrier (scale 1)
├── voice1_scale2.wav        # Voice + Pitch-corrected carrier (scale 2)
└── voice1_whisper.wav       # Stereo whisper track
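The output names follow a `<voice>_<carrier>.wav` pattern. A hypothetical helper that reproduces the listing above, in processing-flow order, might look like this; the function names are illustrative only.

```python
from pathlib import Path
from typing import Iterable, Iterator


def output_path(output_dir: Path, voice_file: Path, carrier_name: str) -> Path:
    """Build <output_dir>/<voice stem>_<carrier stem>.wav."""
    return output_dir / f"{voice_file.stem}_{Path(carrier_name).stem}.wav"


def planned_outputs(output_dir: Path, voice_file: Path,
                    midi_files: Iterable[Path], synth_files: Iterable[Path],
                    scale_files: Iterable[Path]) -> Iterator[Path]:
    """Yield output paths in processing-flow order: MIDI, synth waves, scales, whisper."""
    for carrier in [*midi_files, *synth_files, *scale_files]:
        yield output_path(output_dir, voice_file, carrier.name)
    yield output_path(output_dir, voice_file, "whisper")  # stereo whisper track
```

Deriving names from file stems keeps each output traceable to its inputs, for example `scale2.txt` becomes `voice1_scale2.wav`.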

Configuration

Edit fft_channel_vocoder/config.py to adjust:

  • sample_rate: Default 96,000 Hz
  • fft_size: FFT window size as a power of two (default 2^12 = 4096 samples)
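To see what the defaults imply, the arithmetic below computes the window duration and frequency resolution of a 4096-sample FFT at 96 kHz. It is written as a config-style Python fragment; `window_ms`, `bin_spacing_hz`, and `nyquist_hz` are hypothetical derived names, and whether config.py stores the exponent 12 or the size 4096 isn't stated here.

```python
# Hypothetical config-style fragment. sample_rate and fft_size mirror the
# documented defaults; the derived names below are illustrative only.
sample_rate = 96_000            # Hz
fft_size = 2 ** 12              # 4096 samples per FFT window

window_ms = 1000 * fft_size / sample_rate   # about 42.7 ms of audio per window
bin_spacing_hz = sample_rate / fft_size     # 23.4375 Hz between adjacent FFT bins
nyquist_hz = sample_rate / 2                # 48 kHz, the highest representable frequency
```

Doubling fft_size halves the bin spacing (finer formant detail) but doubles the window length (more time smearing of consonants); halving it does the reverse.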

Algorithm

The vocoder works in 4 steps:

  1. STFT Analysis: Compute Short-Time Fourier Transform for both voice and carrier
  2. Spectral Smoothing: Apply Gaussian blur to extract formant envelopes
  3. Envelope Transfer: Apply spectral whitening to the carrier, then scale by voice envelope
  4. Reconstruction: Inverse STFT with original carrier phase to recover time-domain signal
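The four steps can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the code in fft.py: the Hann window, 75% overlap (hop = n_fft / 4), Gaussian width of 8 bins, and whitening by dividing the carrier by its own smoothed envelope are all choices made for the example, and the function names are hypothetical.

```python
import numpy as np


def stft(x: np.ndarray, n_fft: int, hop: int) -> np.ndarray:
    """Step 1: Hann-windowed STFT, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    padded = np.pad(x, (n_fft, n_fft + hop))  # zero-pad so every sample is covered
    n_frames = 1 + (len(padded) - n_fft) // hop
    frames = np.stack([padded[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec: np.ndarray, n_fft: int, hop: int, length: int) -> np.ndarray:
    """Step 4: weighted overlap-add inverse of stft() above."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)  # undo the analysis and synthesis windowing
    return out[n_fft:n_fft + length]  # drop the padding added in stft()


def gaussian_smooth(mag: np.ndarray, sigma_bins: float) -> np.ndarray:
    """Step 2: blur each frame's magnitude spectrum along the frequency axis."""
    radius = int(3 * sigma_bins)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma_bins) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(mag, ((0, 0), (radius, radius)), mode="edge")
    return np.stack([np.convolve(row, kernel, mode="valid") for row in padded])


def vocode(voice: np.ndarray, carrier: np.ndarray, n_fft: int = 4096,
           hop: int = 1024, sigma_bins: float = 8.0, eps: float = 1e-8) -> np.ndarray:
    """Impose the voice's smoothed spectral envelope on the carrier."""
    length = min(len(voice), len(carrier))
    v_spec = stft(voice[:length], n_fft, hop)
    c_spec = stft(carrier[:length], n_fft, hop)
    v_env = gaussian_smooth(np.abs(v_spec), sigma_bins)  # voice formant envelope
    c_mag = np.abs(c_spec)
    c_env = gaussian_smooth(c_mag, sigma_bins)           # carrier's own envelope
    whitened = c_mag / (c_env + eps)                     # Step 3a: flatten the carrier
    new_mag = whitened * v_env                           # Step 3b: apply voice envelope
    phase = np.angle(c_spec)                             # keep the carrier phase
    return istft(new_mag * np.exp(1j * phase), n_fft, hop, length)
```

Whitening first means the output's spectral shape comes from the voice rather than from the carrier's own brightness, so a dull pad and a bright saw produce similarly intelligible results.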

Demonstration

Hear the FFT Vocoder in action:

FFT Vocoder Demo Video: the original voice, the vocoded track, and the vocoded track with music, with the code displayed on screen

Module Reference

  • main.py: Core pipeline and file iteration
  • fft.py: FFT vocoding algorithm
  • clean_io.py: Audio file I/O with resampling
  • clean_audio.py: Audio preprocessing and validation
  • config.py: Global configuration parameters
  • midi_synth.py: MIDI to audio synthesis
  • pitch_corrector.py: Pitch detection and scale-based note snapping
  • scale_synth.py: Pitch-corrected carrier synthesis
  • noise_generators.py: Noise generation utilities
  • buffers.py: Buffer management utilities

Disclaimer

This project was developed with AI assistance. While some parts of the code were written with AI help, the program ideas, architecture, and design philosophy are original.



Download files


Source Distribution

fft_channel_vocoder-1.0.0.tar.gz (19.0 kB)

Uploaded Source

Built Distribution


fft_channel_vocoder-1.0.0-py3-none-any.whl (20.8 kB)

Uploaded Python 3

File details

Details for the file fft_channel_vocoder-1.0.0.tar.gz.

File metadata

  • Download URL: fft_channel_vocoder-1.0.0.tar.gz
  • Upload date:
  • Size: 19.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for fft_channel_vocoder-1.0.0.tar.gz
Algorithm Hash digest
SHA256 d3ab63bdee8740ddff185a7845c39966de0613685f5ccd7ab9f42fc54f29cb67
MD5 895a8d9835af6b6b2f105de0650680b7
BLAKE2b-256 b58fd641076284e8418a689cd8120ea8c8fce662d86cac668369750fc42dfecc


File details

Details for the file fft_channel_vocoder-1.0.0-py3-none-any.whl.

File hashes

Hashes for fft_channel_vocoder-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5c5a52fe80aef6626515193dea48b50374a0bf80790055251bfe56f9376bac6e
MD5 291dafc7b76dd3495f2519609dd5f066
BLAKE2b-256 577b0037ccd069c3a803619ac2d46c7574c0cee564128155f503a6fcad0cfbac

