
fat_llama_fftw is a Python package for upscaling audio files to FLAC or WAV using FFT-based audio processing. It runs CPU-accelerated calculations to upsample the audio and add missing frequencies through FFT (Fast Fourier Transform), aiming for richer and more detailed audio.

Features

  • Upscale MP3 files to high-quality FLAC format.
  • Iterative soft thresholding (IST) for enhanced audio processing.
  • Auto-scaling amplitude adjustment and normalization.
  • Multi-threaded processing on the CPU.

Installation

Install via pip:

pip install fat-llama-fftw

(Note: For the CUDA version, see https://pypi.org/project/fat-llama/)

ffmpeg is also required: https://support.audacityteam.org/basics/installing-ffmpeg
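Since ffmpeg is an external dependency, it can help to check for it before processing any audio. The snippet below is a minimal sketch using only the standard library. The helper name ffmpeg_available is hypothetical and not part of this package, and it assumes ffmpeg is expected to be reachable on the system PATH.

```python
import shutil


def ffmpeg_available(executable="ffmpeg"):
    """Return True if the given executable can be found on PATH.

    Hypothetical helper for illustration; not part of fat_llama_fftw.
    """
    return shutil.which(executable) is not None


if not ffmpeg_available():
    print("ffmpeg was not found on PATH; install it before upscaling audio.")
```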

Usage

Example Usage

You can run the example provided in example.py:

from fat_llama_fftw.audio_fattener.feed import upscale

# Example call to the method
upscale(
    input_file_path='input_test.mp3',
    output_file_path='output_test.flac',
    source_format='mp3',
    target_format='flac',
    max_iterations=1000,
    threshold_value=0.6,
    target_bitrate_kbps=1400
)

Function Parameters

  • input_file_path (str): Path to the input audio file. Mandatory.
  • output_file_path (str): Path to the output processed audio file. Mandatory.
  • source_format (str): Format of the input audio file (e.g., 'mp3', 'wav', 'ogg', 'flac').
  • target_format (str): Format of the output audio file (e.g., 'flac', 'wav'). Default is 'flac'.
  • max_iterations (int): Maximum number of iterations for IST. Default is 800.
  • threshold_value (float): Threshold value for IST. Default is 0.6.
  • target_bitrate_kbps (int): Target bitrate in kbps. Default is 1411.
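For context on the target_bitrate_kbps default: 1411 kbps is close to the bitrate of uncompressed CD-quality PCM audio (44.1 kHz, 16-bit, stereo). That the default was chosen for this reason is an assumption, not something the package states. The arithmetic is sketched below with a hypothetical helper, pcm_bitrate_kbps.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Bitrate of uncompressed PCM audio in kilobits per second.

    Hypothetical helper for illustration; not part of fat_llama_fftw.
    """
    return sample_rate_hz * bit_depth * channels / 1000


# CD-quality audio: 44,100 samples/s, 16 bits per sample, 2 channels
cd_quality = pcm_bitrate_kbps(44_100, 16, 2)
print(cd_quality)  # 1411.2
```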

Running the Example

To run the example, execute the following command:

python example.py

This will upscale the MP3 file specified in the example and produce a FLAC file with full processing.

Spectrogram Results

[Image: spectrogram results]

How it works

[Image: how it works]

Algorithm Explanation

The upscaling process involves several steps:

  1. Reading Audio File: The audio file is read, and the audio samples are extracted along with the sample rate and bitrate.
  2. Calculating Upscale Factor: The upscale factor is calculated to achieve the target bitrate.
  3. Upscaling Channels: The audio channels are upscaled using an interpolation algorithm. Each sample is repeated multiple times to increase the resolution.
  4. Iterative Soft Thresholding (IST): IST is applied to enhance the audio by adding missing frequencies. Each iteration uses an FFT to transform the signal into the frequency domain, applies a threshold to keep the significant frequencies, and then applies the inverse transform to return to the time domain.
  5. Scaling Amplitude: The amplitude of the upscaled audio is scaled to match the original.
  6. Normalizing Audio: The audio is normalized to the range -1 to 1.
  7. Writing Output File: The processed audio is written to a file in the target format (FLAC by default, or WAV).
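Steps 2, 3, 5, and 6 above can be sketched with NumPy as follows. This is an illustrative sketch, not the package's actual code. All function names are hypothetical, and the rule used for the upscale factor (the rounded ratio of target to source bitrate) is an assumption.

```python
import numpy as np


def upscale_factor(source_kbps, target_kbps):
    # Assumed rule: rounded ratio of target to source bitrate, at least 1.
    return max(1, round(target_kbps / source_kbps))


def repeat_samples(channel, factor):
    # Step 3: repeat each sample `factor` times to raise the resolution.
    return np.repeat(channel, factor)


def match_amplitude(processed, original):
    # Step 5: rescale so the processed peak equals the original peak.
    peak = np.max(np.abs(processed))
    if peak == 0:
        return processed
    return processed * (np.max(np.abs(original)) / peak)


def normalize(audio):
    # Step 6: scale into the range -1 to 1 by dividing by the peak.
    peak = np.max(np.abs(audio))
    return audio if peak == 0 else audio / peak


original = np.array([0.5, -1.0, 0.25])
factor = upscale_factor(source_kbps=320, target_kbps=1411)
upscaled = repeat_samples(original, factor)
result = normalize(match_amplitude(upscaled, original))
```

With a 320 kbps source and a 1411 kbps target, this assumed rule repeats each sample four times, so a three-sample channel becomes twelve samples.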

Why FFT and IST?

FFT (Fast Fourier Transform) is used to transform the audio signal into the frequency domain. This allows specific frequency components to be identified and manipulated. By applying a threshold in the frequency domain, we keep the significant frequencies, discard noise, and feed the result back into the upscaled data to add detail.
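To make the thresholding idea concrete, here is a minimal sketch of a single frequency-domain soft-thresholding step using NumPy's FFT. It is not the package's implementation, which iterates and uses pyfftw. The function name soft_threshold_spectrum is hypothetical, and treating threshold_value as a fraction of the largest spectral magnitude is an assumption.

```python
import numpy as np


def soft_threshold_spectrum(signal, threshold_value=0.6):
    """Apply one soft-thresholding step in the frequency domain.

    Hypothetical helper: the cutoff is assumed to be a fraction of the
    largest spectral magnitude.
    """
    spectrum = np.fft.rfft(signal)
    magnitude = np.abs(spectrum)
    cutoff = threshold_value * magnitude.max()
    # Shrink every magnitude by the cutoff; bins below it become zero.
    shrunk = np.maximum(magnitude - cutoff, 0.0)
    scale = np.divide(shrunk, magnitude,
                      out=np.zeros_like(magnitude), where=magnitude > 0)
    # The phase is preserved because only the magnitude is rescaled.
    return np.fft.irfft(spectrum * scale, n=len(signal))


# One second of a strong 440 Hz tone plus a weak 1000 Hz component.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
weak = 0.05 * np.sin(2 * np.pi * 1000 * t)
filtered = soft_threshold_spectrum(tone + weak)
```

In this example the weak 1000 Hz component falls below the cutoff and is removed, while the 440 Hz tone survives at a reduced amplitude. That reduction is the shrinkage that distinguishes soft thresholding from a hard cutoff.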

The report titled "Fast Sparse Fourier Transformations for NMR Spectroscopy" by Badruddin Kamal, supervised by Thomas Huber and Alastair Rendall, 2015, provides a comprehensive understanding of sparse representations and their applications in signal processing. IST leverages the concepts from this report to add missing frequencies and enhance the audio quality by making it more detailed and rich. This is particularly useful in upscaling audio where some frequencies might be missing or congested.

Test Audio Source

ericzo - beyond (https://soundcloud.com/ericzomusic/free-electro-trap-anthem-beyond)

Changelog

All notable changes to this project will be documented in this file.

[1.0.3] to [1.0.4] - 2024-07-26

Changed

  • Moved to pyfftw from CUDA.

[1.0.2] - 2024-07-26

Changed

  • Removed logging from requirements to fix a pip bug.

[1.0.1] - 2024-07-26

Changed

  • Updated analytics.py analysis and spectrogram results.
  • Updated README.md details.

[1.0.0] - 2024-07-25

Added

  • Added support for reading 'ogg', 'flac', and 'wav' file formats and calculating their bitrates correctly.

Changed

  • Renamed upscale_mp3_to_flac method to upscale to support multiple source formats.
  • Simplified the workflow to focus on 'mp3' to 'flac' conversion with essential steps only.

Removed

  • Dropped support for 'ape' and 'alac' target formats.

[0.1.8] - 2024-07-24

Added

  • Introduced toggle flags for normalization, equalization, amplitude scaling, and gain reduction.
  • Enhanced auto-scaling of amplitude based on the original MP3 file when toggle_scale_amplitude is False.
  • Logging for each step of the processing to provide better traceability and debugging.

Changed

  • Default values for parameters are now set at the function call.
  • Refined the upscaling algorithm to ensure better handling of amplitude and gain.
  • Renamed the flags for consistency (toggle_wiener_filter, toggle_normalize, toggle_equalize, toggle_scale_amplitude, toggle_gain_reduction).

Fixed

  • Fixed issues related to numpy and cupy array conversions.
  • Improved error handling for invalid target bitrate values.
  • Addressed the issue where the amplitude of the produced signal was significantly weaker than the original.

[0.1.7] - 2024-07-22

Added

  • Added methods for MP3 to FLAC conversion with optional processing using CuPy for GPU acceleration.
  • Initial version of upscale_mp3_to_flac method with parameters for iterative soft thresholding (IST), gain reduction, and equalization.

[0.1.0] to [0.1.6] - 2024-07-20

Added

  • Basic functionality for reading MP3 files and writing FLAC files.
  • Initial implementation of the new interpolation algorithm and IST for audio processing.
