This is an unofficial fork of the https://github.com/skirdey/voicerestore repository, packaged for pip.

VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration

VoiceRestore is a cutting-edge speech restoration model designed to significantly enhance the quality of degraded voice recordings. Leveraging flow-matching transformers, this model excels at addressing a wide range of audio imperfections commonly found in speech, including background noise, reverberation, distortion, and signal loss.

Demo of audio restorations: VoiceRestore

Credits: This repository is based on the E2-TTS implementation by Lucidrains

Super easy usage - using Transformers 🤗 by @jadechoghari - Hugging Face

Build it locally on gradio in this repo.

Latest Releases

01/16/2025 - Version 1.1 of the checkpoint that improves restoration.
09/07/2024 - Version 0.1 of the model inference and checkpoint.

Example

Degraded Input:

Degraded audio (reverberation, distortion, noise, random cut):

Note: Adjust your volume before playing the degraded audio sample, as it may contain distortions.

https://github.com/user-attachments/assets/0c030274-60b5-41a4-abe6-59a3f1bc934b

Restored (steps=32, cfg=1.0):

Restored audio - 16 steps, strength 0.5:

https://github.com/user-attachments/assets/fdbbb988-9bd2-4750-bddd-32bd5153d254

Ground Truth:

Ground Truth

Key Features

Universal Restoration: The model is designed to handle many types and levels of voice recording degradation, including background noise, reverberation, distortion, and signal loss.
Easy to Use: Simple interface for processing degraded audio files.
Pretrained Model: Includes a 301 million parameter transformer model with pre-trained weights. (The model is still being trained; further checkpoint updates will follow.)

Quick Start

Clone the repository:

git clone --recurse-submodules https://github.com/skirdey/voicerestore.git
cd voicerestore

If you did not clone with --recurse-submodules, you can run:

git submodule update --init --recursive

Install dependencies:
```
pip install -r requirements.txt
```
Download the pre-trained model and place it in the checkpoints folder. (Updated 9/29/2024)

Run a test restoration:

python inference_short.py --checkpoint ./checkpoints/voicerestore-1.1.pth --input test_input.wav --output test_output.wav --steps 32 --cfg_strength 0.5

This will process test_input.wav and save the result as test_output.wav.

Run a long-form restoration, which uses window chunking:

python inference_long.py --checkpoint ./checkpoints/voicerestore-1.1.pth --input long_audio_file.mp3 --output test_output_long.wav --steps 8 --cfg_strength 0.5 --window_size_sec 10.0 --overlap 0.3

This will save the result as test_output_long.wav.
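
The window chunking used for long-form restoration can be pictured with a small sketch. The code below is not taken from inference_long.py. It is a minimal pure-Python illustration that assumes --overlap is a fraction of the window length and that neighbouring windows are blended with a linear cross-fade. The names restore_in_windows and restore_fn are hypothetical; restore_fn stands in for a model call that returns as many samples as it receives.

```python
def restore_in_windows(samples, sample_rate, restore_fn,
                       window_size_sec=10.0, overlap=0.3):
    """Apply restore_fn to overlapping windows and cross-fade the results.

    Assumption: `overlap` is the fraction of each window shared with the
    next one. It is limited to 0.5 so that at most two windows overlap.
    """
    if not 0.0 <= overlap <= 0.5:
        raise ValueError("overlap must be between 0.0 and 0.5")

    n = len(samples)
    window = max(1, int(window_size_sec * sample_rate))  # window length in samples
    fade = int(window * overlap)                         # samples shared by neighbours
    hop = window - fade                                  # distance between window starts

    out = [0.0] * n
    start = 0
    while start < n:
        chunk = list(samples[start:start + window])
        restored = restore_fn(chunk)  # assumed to keep the chunk length
        is_last = start + window >= n

        for i, value in enumerate(restored):
            weight = 1.0
            if start > 0 and i < fade:
                # Fade in over the region shared with the previous window.
                weight = i / fade
            elif not is_last and i >= hop:
                # Fade out over the region shared with the next window.
                weight = 1.0 - (i - hop) / fade
            out[start + i] += weight * value

        if is_last:
            break
        start += hop
    return out
```

Because the fade-in and fade-out weights add up to one at every shared sample, a restore_fn that returns its input unchanged reproduces the original signal, which is a quick way to check the stitching logic before plugging in a real model.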

Usage

To restore your own audio files:

from model import OptimizedAudioRestorationModel

model = OptimizedAudioRestorationModel()
# input_audio is assumed to hold the degraded waveform, loaded beforehand (not shown here)
restored_audio = model.forward(input_audio, steps=32, cfg_strength=0.5)

Alternative Usage - using Transformers 🤗

!git lfs install
!git clone https://huggingface.co/jadechoghari/VoiceRestore
%cd VoiceRestore
!pip install -r requirements.txt

from transformers import AutoModel
# path to the model folder (on colab it's as follows)
checkpoint_path = "/content/VoiceRestore"
model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True)
model("test_input.wav", "test_output.wav")
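
To restore several files with the same two-argument call shown above, a small helper can loop over a folder. This is only a sketch: restore_folder, the "_restored" suffix, and the "*.wav" pattern are hypothetical choices, and restore is assumed to be any callable that takes an input path and an output path, such as the loaded model.

```python
from pathlib import Path


def restore_folder(restore, input_dir, output_dir, pattern="*.wav"):
    """Call restore(input_path, output_path) for every matching file.

    Returns the list of output paths that were requested, in sorted order.
    """
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for src in sorted(input_dir.glob(pattern)):
        # Hypothetical naming scheme: keep the stem and add a suffix.
        dst = output_dir / f"{src.stem}_restored.wav"
        restore(str(src), str(dst))
        written.append(dst)
    return written
```

With the model loaded as above, restore_folder(model, "degraded", "restored") would process every .wav file in a folder named degraded.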

Model Details

Architecture: Flow-matching transformer
Parameters: 301 million
Input: Degraded speech audio (various formats supported)
Output: Restored speech
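
As a rough illustration of how the steps and cfg_strength settings usually interact in a flow-matching sampler, the toy sketch below integrates a velocity field with fixed Euler steps and mixes conditional and unconditional velocities in the common classifier-free guidance form. It is not VoiceRestore's sampler: the Euler integrator, the guidance formula, and every name here (sample_flow, toward_target, toward_zero, target) are assumptions chosen for illustration, and the velocity functions are simple stand-ins for a transformer.

```python
def sample_flow(x0, cond_velocity, uncond_velocity, steps=32, cfg_strength=0.5):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with `steps` Euler steps.

    Assumed guidance rule: v = v_cond + cfg_strength * (v_cond - v_uncond).
    """
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v_cond = cond_velocity(x, t)
        v_uncond = uncond_velocity(x, t)
        x = [xi + dt * (vc + cfg_strength * (vc - vu))
             for xi, vc, vu in zip(x, v_cond, v_uncond)]
    return x


# Toy velocity fields standing in for the model (hypothetical).
target = [1.0, -2.0, 0.5]


def toward_target(x, t):
    # Straight-line flow that arrives at `target` when t reaches 1.
    return [(ti - xi) / (1.0 - t) for xi, ti in zip(x, target)]


def toward_zero(x, t):
    # "Unconditional" flow that drifts toward silence (all zeros).
    return [(0.0 - xi) / (1.0 - t) for xi in x]


restored = sample_flow([0.3, 0.1, -0.4], toward_target, toward_zero,
                       steps=32, cfg_strength=0.5)
```

In this toy setup more steps mean a finer integration of the same path, while a larger cfg_strength pushes each update further along the direction in which the conditional velocity differs from the unconditional one.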

Limitations and Future Work

The current model is optimized for speech and may not perform well on music or other audio types.
Research is ongoing to improve performance on extreme degradations.
Future updates may include real-time processing capabilities.

Citation

If you use VoiceRestore in your research, please cite our paper:

@misc{kirdey2025voicerestoreflowmatchingtransformersspeech,
      title={VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration}, 
      author={Stanislav Kirdey},
      year={2025},
      eprint={2501.00794},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2501.00794}, 
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Based on the E2-TTS implementation by Lucidrains
Special thanks to the open-source community for their invaluable contributions.

Project details

This version: 0.1.0 (Apr 17, 2025)

Download files

Source Distribution

voicerestore_fork-0.1.0.tar.gz (28.0 kB view details)

Uploaded Apr 17, 2025 Source

Built Distribution

voicerestore_fork-0.1.0-py3-none-any.whl (36.5 kB view details)

Uploaded Apr 17, 2025 Python 3

File details

Details for the file voicerestore_fork-0.1.0.tar.gz.

File metadata

Download URL: voicerestore_fork-0.1.0.tar.gz
Upload date: Apr 17, 2025
Size: 28.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for voicerestore_fork-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`a18d22d8ee124445beca0ef582f8c7ccc7e8a5b91551fdf1f9d7b82bf97a0916`
MD5	`26fe5163056f618693ffbcba8b57aeb2`
BLAKE2b-256	`b02fabb41f798e8726ed540eee4a9e058ba5c7db3c003cea603d84d9833d536b`
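
A downloaded file can be checked against the digests listed above with Python's standard hashlib module. The helper names file_digest and digest_matches are hypothetical, and the sketch assumes the file has already been downloaded to a local path.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    if algorithm == "blake2b-256":
        # BLAKE2b with a 32-byte (256-bit) digest.
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


def digest_matches(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with an expected hex string."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, digest_matches("voicerestore_fork-0.1.0.tar.gz", "a18d22d8ee124445beca0ef582f8c7ccc7e8a5b91551fdf1f9d7b82bf97a0916") should return True for an intact copy of the source distribution.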

File details

Details for the file voicerestore_fork-0.1.0-py3-none-any.whl.

File metadata

Download URL: voicerestore_fork-0.1.0-py3-none-any.whl
Upload date: Apr 17, 2025
Size: 36.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for voicerestore_fork-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`87fc131e558f7feb23699a3fed204f755d466c3228d26b6a6fbc4e1d8e109922`
MD5	`fed638b0169b6c3fe15b7e3658e509ef`
BLAKE2b-256	`75b6201ebc4aebeb9561c395d8fb40c8878fd43ff43ac68a8a077c0d0ed17d4a`

voicerestore-fork 0.1.0
