
This is an unofficial fork of the https://github.com/skirdey/voicerestore repository, packaged for pip.



VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration

VoiceRestore is a speech restoration model designed to improve the quality of degraded voice recordings. Built on flow-matching transformers, it targets a wide range of imperfections common in recorded speech, including background noise, reverberation, distortion, and signal loss.

Demo of audio restorations: VoiceRestore

Credits: This repository is based on the E2-TTS implementation by Lucidrains.

Super easy usage with Transformers 🤗, by @jadechoghari (Hugging Face)


Build it locally with Gradio using this repo.

Latest Releases

Example

Degraded Input:


Degraded audio (reverberation, distortion, noise, random cut):

Note: Adjust your volume before playing the degraded audio sample, as it may contain distortions.

https://github.com/user-attachments/assets/0c030274-60b5-41a4-abe6-59a3f1bc934b


Restored (steps=32, cfg=1.0):


Restored audio - 16 steps, strength 0.5:

https://github.com/user-attachments/assets/fdbbb988-9bd2-4750-bddd-32bd5153d254


Ground Truth:



Key Features

  • Universal Restoration: The model is designed to handle a wide range of degradation types and severities in voice recordings.
  • Easy to Use: Simple interface for processing degraded audio files.
  • Pretrained Model: Includes a 301-million-parameter transformer model with pre-trained weights. The model is still being trained, and further checkpoint updates are planned.

Quick Start

  1. Clone the repository:

    git clone --recurse-submodules https://github.com/skirdey/voicerestore.git
    cd voicerestore
    

    If you did not clone with --recurse-submodules, run:

    git submodule update --init --recursive
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download the pre-trained model and place it in the checkpoints folder. (Updated 9/29/2024)

  4. Run a test restoration:

     python inference_short.py --checkpoint ./checkpoints/voicerestore-1.1.pth --input test_input.wav --output test_output.wav --steps 32 --cfg_strength 0.5
    

    This will process test_input.wav and save the result as test_output.wav.

  5. Run a long-form restoration, which uses window chunking:

    python inference_long.py --checkpoint ./checkpoints/voicerestore-1.1.pth --input long_audio_file.mp3 --output test_output_long.wav --steps 8 --cfg_strength 0.5 --window_size_sec 10.0 --overlap 0.3
    

    This will save the result as test_output_long.wav.
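Step 5 relies on window chunking. The sketch below illustrates the general idea with plain Python lists: split the signal into overlapping windows, restore each one, and blend the overlaps with linear cross-fades (weighted overlap-add). It assumes `--overlap` is a fraction of the window (0.3 meaning 30%), and `restore_chunk` is a hypothetical stand-in for the model call; `inference_long.py` may split and blend differently.

```python
from typing import Callable, List, Tuple


def chunk_bounds(n_samples: int, sample_rate: int, window_size_sec: float,
                 overlap: float) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows.

    Assumption: `overlap` is a fraction of the window (0.3 -> 30%).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    win = max(1, int(window_size_sec * sample_rate))
    hop = max(1, int(win * (1.0 - overlap)))
    bounds = []
    start = 0
    while True:
        end = min(start + win, n_samples)
        bounds.append((start, end))
        if end >= n_samples:
            break
        start += hop
    return bounds


def restore_long(samples: List[float], sample_rate: int,
                 restore_chunk: Callable[[List[float]], List[float]],
                 window_size_sec: float = 10.0,
                 overlap: float = 0.3) -> List[float]:
    """Restore each window, then blend overlaps with linear cross-fades.

    `restore_chunk` is a hypothetical stand-in for the model call; it must
    return as many samples as it receives.
    """
    out = [0.0] * len(samples)
    weight = [0.0] * len(samples)
    for start, end in chunk_bounds(len(samples), sample_rate,
                                   window_size_sec, overlap):
        chunk = restore_chunk(samples[start:end])
        n = end - start
        fade = max(1, int(n * overlap))
        for i in range(n):
            # Ramp up over the first `fade` samples, down over the last ones.
            w = min(1.0, (i + 1) / fade, (n - i) / fade)
            out[start + i] += w * chunk[i]
            weight[start + i] += w
    # Windows cover every sample with a positive weight, so this normalizes.
    return [o / w for o, w in zip(out, weight)]
```

With an identity `restore_chunk`, the output matches the input, which is a quick check that the blending weights are normalized correctly.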

Usage

To restore your own audio files:

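from model import OptimizedAudioRestorationModel

# input_audio: a degraded speech waveform tensor. Pre-trained weights must
# still be loaded (inference_short.py shows one way to do this).
model = OptimizedAudioRestorationModel().eval()
restored_audio = model(input_audio, steps=32, cfg_strength=0.5)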
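The snippet above leaves `input_audio` undefined. As a dependency-free sketch, the helpers below read a 16-bit PCM WAV file with Python's `wave` module and average its channels to mono floats. Whether the model wants mono input, which sample rate it expects, and what tensor shape it takes are assumptions to check against `inference_short.py`; resampling is not shown.

```python
import array
import sys
import wave
from typing import List, Tuple


def downmix_pcm16(frames: bytes, n_channels: int) -> List[float]:
    """Turn interleaved little-endian 16-bit PCM into mono floats in [-1, 1)."""
    pcm = array.array("h")
    pcm.frombytes(frames)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples little-endian
    mono = []
    for i in range(0, len(pcm), n_channels):
        frame = pcm[i:i + n_channels]
        # Average the channels and scale by the int16 full-scale value.
        mono.append(sum(frame) / (n_channels * 32768.0))
    return mono


def read_wav_mono(path: str) -> Tuple[List[float], int]:
    """Read a 16-bit PCM WAV file; return (mono samples, sample rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        frames = wf.readframes(wf.getnframes())
        return downmix_pcm16(frames, wf.getnchannels()), wf.getframerate()
```

For example, `samples, sr = read_wav_mono("test_input.wav")` gives a list you could convert with `torch.tensor(samples)` and reshape to whatever the model expects.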

Alternative Usage - using Transformers 🤗

!git lfs install
!git clone https://huggingface.co/jadechoghari/VoiceRestore
%cd VoiceRestore
!pip install -r requirements.txt
from transformers import AutoModel
# Path to the cloned model folder (this is the location on Colab)
checkpoint_path = "/content/VoiceRestore"
model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True)
model("test_input.wav", "test_output.wav")
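The Transformers example restores one file with `model(input_path, output_path)`. To process a whole folder, a small wrapper can loop over files; the sketch below takes that call as a parameter, so it can be tested without the model. The `_restored` suffix and `*.wav` pattern are illustrative choices, not part of the project.

```python
from pathlib import Path
from typing import Callable, List


def restored_name(src: Path, out_dir: Path) -> Path:
    """Map noisy/a.wav to <out_dir>/a_restored.wav (naming is illustrative)."""
    return out_dir / f"{src.stem}_restored{src.suffix}"


def restore_folder(restore: Callable[[str, str], object], in_dir: str,
                   out_dir: str, pattern: str = "*.wav") -> List[Path]:
    """Call restore(input_path, output_path) for each file matching pattern."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(in_dir).glob(pattern)):
        dst = restored_name(src, out)
        restore(str(src), str(dst))
        written.append(dst)
    return written
```

With the model loaded as above, `restore_folder(model, "noisy", "restored")` would restore every `.wav` file in a hypothetical `noisy/` folder into `restored/`.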

Model Details

  • Architecture: Flow-matching transformer
  • Parameters: 300M+
  • Input: Degraded speech audio (various formats supported)
  • Output: Restored speech
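The README does not spell out how the sampler uses `--steps` and `--cfg_strength`. Assuming they are the number of ODE integration steps and a classifier-free guidance scale, as in flow-matching models such as E2-TTS, this toy sketch shows the idea with fixed-step Euler integration on lists of floats. The real model works on tensors and may use a different solver; the guidance form `pred + (pred - null_pred) * cfg_strength` is one common convention, assumed here.

```python
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def euler_sample(x0: Vector, v_cond: VelocityFn, v_uncond: VelocityFn,
                 steps: int = 32, cfg_strength: float = 0.5) -> Vector:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 in `steps` Euler steps.

    v_cond / v_uncond stand in for the network's velocity prediction with and
    without the conditioning input (hypothetical names).
    """
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        c = v_cond(x, t)
        u = v_uncond(x, t)
        # Classifier-free guidance: push the prediction away from the
        # unconditional one; cfg_strength = 0 uses the conditional field as-is.
        v = [ci + (ci - ui) * cfg_strength for ci, ui in zip(c, u)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

A toy velocity field `(target - x) / (1 - t)` makes the Euler path land on `target` at t = 1, which is a convenient check; more steps generally trade speed for a more accurate integration of a learned field.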

Limitations and Future Work

  • The current model is optimized for speech and may not perform well on music or other audio types.
  • Ongoing research aims to improve performance on extreme degradations.
  • Future updates may include real-time processing capabilities.

Citation

If you use VoiceRestore in your research, please cite our paper:

@misc{kirdey2025voicerestoreflowmatchingtransformersspeech,
      title={VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration}, 
      author={Stanislav Kirdey},
      year={2025},
      eprint={2501.00794},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2501.00794}, 
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments



Download files


Source Distribution

voicerestore_fork-0.1.0.tar.gz (28.0 kB)

Uploaded Source

Built Distribution


voicerestore_fork-0.1.0-py3-none-any.whl (36.5 kB)

Uploaded Python 3

File details

Details for the file voicerestore_fork-0.1.0.tar.gz.

File metadata

  • Download URL: voicerestore_fork-0.1.0.tar.gz
  • Upload date:
  • Size: 28.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for voicerestore_fork-0.1.0.tar.gz

  • SHA256: a18d22d8ee124445beca0ef582f8c7ccc7e8a5b91551fdf1f9d7b82bf97a0916
  • MD5: 26fe5163056f618693ffbcba8b57aeb2
  • BLAKE2b-256: b02fabb41f798e8726ed540eee4a9e058ba5c7db3c003cea603d84d9833d536b
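To check a downloaded file against its published SHA256 digest, a short `hashlib` sketch is enough. It reads the file in 1 MiB chunks so large downloads are not loaded into memory at once; the function names are illustrative.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """SHA-256 digest of in-memory bytes, as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """True if the file's digest matches the published one."""
    return sha256_file(path) == expected_hex.strip().lower()
```

For the source distribution, `verify_sha256("voicerestore_fork-0.1.0.tar.gz", "a18d22d8ee124445beca0ef582f8c7ccc7e8a5b91551fdf1f9d7b82bf97a0916")` should return `True` for an intact download.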


File details

Details for the file voicerestore_fork-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for voicerestore_fork-0.1.0-py3-none-any.whl

  • SHA256: 87fc131e558f7feb23699a3fed204f755d466c3228d26b6a6fbc4e1d8e109922
  • MD5: fed638b0169b6c3fe15b7e3658e509ef
  • BLAKE2b-256: 75b6201ebc4aebeb9561c395d8fb40c8878fd43ff43ac68a8a077c0d0ed17d4a

