
Voice restoration using BigVGAN

Project description

VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration

VoiceRestore is a cutting-edge speech restoration model designed to significantly enhance the quality of degraded voice recordings. Leveraging flow-matching transformers, this model excels at addressing a wide range of audio imperfections commonly found in speech, including background noise, reverberation, distortion, and signal loss.

Demo of audio restorations: VoiceRestore

Credits: This repository is based on the E2-TTS implementation by Lucidrains

Super easy usage - using Transformers 🤗 by @jadechoghari - Hugging Face

VoiceRestore

Run it locally with Gradio from this repo.

Latest Releases

Example

Degraded Input:

Degraded Input

Degraded audio (reverberation, distortion, noise, random cut):

Note: Adjust your volume before playing the degraded audio sample, as it may contain distortions.

https://github.com/user-attachments/assets/0c030274-60b5-41a4-abe6-59a3f1bc934b


Restored (steps=32, cfg=1.0):

Restored

Restored audio - 16 steps, strength 0.5:

https://github.com/user-attachments/assets/fdbbb988-9bd2-4750-bddd-32bd5153d254


Ground Truth:

Ground Truth


Key Features

  • Universal Restoration: The model is designed to handle a wide range of degradation types and levels in voice recordings.
  • Easy to Use: Simple interface for processing degraded audio files.
  • Pretrained Model: Includes a 301 million parameter transformer model with pre-trained weights. (The model is still training; further checkpoint updates will follow.)

Quick Start

  1. Clone the repository:

    git clone --recurse-submodules https://github.com/skirdey/voicerestore.git
    cd voicerestore
    

    If you did not clone with --recurse-submodules, run:

    git submodule update --init --recursive
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download the pre-trained model and place it in the checkpoints folder. (Updated 9/29/2024)

  4. Run a test restoration:

    python inference_short.py --checkpoint ./checkpoints/voicerestore-1.1.pth --input test_input.wav --output test_output.wav --steps 32 --cfg_strength 0.5
    

    This will process test_input.wav and save the result as test_output.wav.

  5. Run a long-form restoration, which processes the audio in overlapping windows:

    python inference_long.py --checkpoint ./checkpoints/voicerestore-1.1.pth --input long_audio_file.mp3 --output test_output_long.wav --steps 8 --cfg_strength 0.5 --window_size_sec 10.0 --overlap 0.3
    

    This will save the result as test_output_long.wav.
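
The --window_size_sec and --overlap flags in step 5 suggest that long recordings are split into overlapping windows, restored one window at a time, and stitched back together. The sketch below shows one way such a window plan could be computed and how the overlapping region of two neighbouring windows might be blended. It is not the code in inference_long.py: the function names are hypothetical, and treating --overlap 0.3 as a fraction of the window length (rather than seconds) is an assumption.

```python
def plan_windows(total_samples, sample_rate, window_size_sec=10.0, overlap=0.3):
    """Return (start, end) sample indices of overlapping windows covering the audio.

    Assumption: `overlap` is the fraction of each window shared with the next one.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    window = int(round(window_size_sec * sample_rate))
    if window <= 0:
        raise ValueError("window must contain at least one sample")
    # Distance between consecutive window starts.
    hop = max(1, int(round(window * (1.0 - overlap))))

    windows = []
    start = 0
    while start < total_samples:
        end = min(start + window, total_samples)
        windows.append((start, end))
        if end == total_samples:
            break  # the last window reached the end of the audio
        start += hop
    return windows


def crossfade(left_tail, right_head):
    """Linearly blend two equally long overlap regions (left fades out, right fades in)."""
    n = len(left_tail)
    if n != len(right_head):
        raise ValueError("overlap regions must have the same length")
    blended = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the right-hand window, rising from 0 towards 1
        blended.append(left_tail[i] * (1.0 - w) + right_head[i] * w)
    return blended


if __name__ == "__main__":
    # 25 seconds of audio at a toy sample rate of 100 Hz.
    for start, end in plan_windows(2500, 100, window_size_sec=10.0, overlap=0.3):
        print(start, end)
```

Under these assumptions, a larger overlap gives the blend more samples to smooth over window boundaries, at the cost of restoring more windows in total.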

Usage

To restore your own audio files:

from model import OptimizedAudioRestorationModel

# input_audio is the degraded speech you have already loaded; loading it is not shown here.
model = OptimizedAudioRestorationModel()
restored_audio = model.forward(input_audio, steps=32, cfg_strength=0.5)

Alternative Usage - using Transformers 🤗

# Run in a Colab or Jupyter notebook: lines starting with ! or % are notebook commands, not Python.
!git lfs install
!git clone https://huggingface.co/jadechoghari/VoiceRestore
%cd VoiceRestore
!pip install -r requirements.txt

from transformers import AutoModel

# Path to the cloned model folder (on Colab it is as follows)
checkpoint_path = "/content/VoiceRestore"
model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True)
# Restore test_input.wav and write the result to test_output.wav
model("test_input.wav", "test_output.wav")
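
To restore a whole folder with the two-argument call shown above, a small helper can pair each input file with an output path and apply the model to every pair. This is a minimal sketch: plan_batch, restore_batch, the "_restored" suffix, and the folder names are hypothetical and not part of VoiceRestore. The only thing taken from the snippet above is that the model is called as model(input_path, output_path).

```python
from pathlib import Path


def plan_batch(input_dir, output_dir, pattern="*.wav", suffix="_restored"):
    """Pair every matching input file with an output path in output_dir."""
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    pairs = []
    for src in sorted(input_dir.glob(pattern)):
        dst = output_dir / f"{src.stem}{suffix}.wav"
        pairs.append((str(src), str(dst)))
    return pairs


def restore_batch(restore_fn, pairs):
    """Call restore_fn(input_path, output_path) for each pair and return the output paths."""
    outputs = []
    for src, dst in pairs:
        # Make sure the output folder exists before the model writes to it.
        Path(dst).parent.mkdir(parents=True, exist_ok=True)
        restore_fn(src, dst)
        outputs.append(dst)
    return outputs
```

With the model loaded as above, usage would look like restore_batch(model, plan_batch("degraded", "restored")).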

Model Details

  • Architecture: Flow-matching transformer
  • Parameters: 300M+ (approximately 301 million)
  • Input: Degraded speech audio (various formats supported)
  • Output: Restored speech
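
For readers unfamiliar with the steps and cfg_strength parameters used throughout this README, the sketch below shows how such parameters are commonly used when sampling from a flow-matching model: a learned velocity field is integrated from t=0 to t=1 with a fixed number of Euler steps, and classifier-free guidance pushes the conditional prediction away from the unconditional one. This is a generic illustration with a toy velocity field, not the VoiceRestore sampler. The function names are hypothetical, and the guidance formula is one common formulation assumed here rather than confirmed by this project.

```python
def guided_velocity(cond, uncond, cfg_strength):
    """Classifier-free guidance: move the conditional prediction away from the unconditional one.

    Assumed formulation: cond + cfg_strength * (cond - uncond).
    With cfg_strength = 0 this returns the conditional prediction unchanged.
    """
    return [c + cfg_strength * (c - u) for c, u in zip(cond, uncond)]


def euler_sample(velocity_fn, x0, steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 using `steps` Euler steps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Toy stand-ins for a trained model: constant conditional and unconditional velocities.
    def toy_velocity(x, t):
        cond = [2.0, -1.0]
        uncond = [1.0, 0.0]
        return guided_velocity(cond, uncond, cfg_strength=0.5)

    print(euler_sample(toy_velocity, [0.0, 0.0], steps=32))
```

In a sampler of this kind, more steps give a finer integration at a proportionally higher compute cost, which matches the README's use of fewer steps for long-form audio.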

Limitations and Future Work

  • Current model is optimized for speech; may not perform optimally on music or other audio types.
  • Ongoing research to improve performance on extreme degradations.
  • Future updates may include real-time processing capabilities.

Citation

If you use VoiceRestore in your research, please cite our paper:

@misc{kirdey2025voicerestoreflowmatchingtransformersspeech,
      title={VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration}, 
      author={Stanislav Kirdey},
      year={2025},
      eprint={2501.00794},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2501.00794}, 
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments
