This is an unofficial fork of the https://github.com/skirdey/voicerestore repository, repackaged for installation with pip.
VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration
VoiceRestore is a cutting-edge speech restoration model designed to significantly enhance the quality of degraded voice recordings. Leveraging flow-matching transformers, this model excels at addressing a wide range of audio imperfections commonly found in speech, including background noise, reverberation, distortion, and signal loss.
Demo of audio restorations: VoiceRestore
Credits: This repository is based on the E2-TTS implementation by Lucidrains
Super easy usage - using Transformers 🤗 by @jadechoghari - Hugging Face
Build it locally on gradio in this repo.
Latest Releases
- 01/16/2025 - Version 1.1 of the checkpoint that improves restoration.
- 09/07/2024 - Version 0.1 of the model inference and checkpoint.
Example
Degraded Input:
Degraded audio (reverberation, distortion, noise, random cut):
Note: Adjust your volume before playing the degraded audio sample, as it may contain distortions.
https://github.com/user-attachments/assets/0c030274-60b5-41a4-abe6-59a3f1bc934b
Restored (steps=32, cfg=1.0):
Restored audio - 16 steps, strength 0.5:
https://github.com/user-attachments/assets/fdbbb988-9bd2-4750-bddd-32bd5153d254
Ground Truth:
Key Features
- Universal Restoration: A single model is designed to handle many levels and types of voice recording degradation, including background noise, reverberation, distortion, and signal loss.
- Easy to Use: Simple interface for processing degraded audio files.
- Pretrained Model: Includes a 301 million parameter transformer model with pre-trained weights. (The model is still being trained, so further checkpoint updates will follow.)
Quick Start
- Clone the repository:

    git clone --recurse-submodules https://github.com/skirdey/voicerestore.git
    cd voicerestore

  If you did not clone with --recurse-submodules, you can run:

    git submodule update --init --recursive

- Install dependencies:

    pip install -r requirements.txt

- Download the pre-trained model and place it in the checkpoints folder. (Updated 9/29/2024)

- Run a test restoration:

    python inference_short.py --checkpoint ./checkpoints/voicerestore-1.1.pth --input test_input.wav --output test_output.wav --steps 32 --cfg_strength 0.5

  This will process test_input.wav and save the result as test_output.wav.

- Run a long-form restoration, which processes the audio in overlapping windows:

    python inference_long.py --checkpoint ./checkpoints/voicerestore-1.1.pth --input long_audio_file.mp3 --output test_output_long.wav --steps 8 --cfg_strength 0.5 --window_size_sec 10.0 --overlap 0.3

  This will save the result as test_output_long.wav.
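To give a feel for what window chunking with an overlap involves, here is a minimal sketch in NumPy. It is not the code in inference_long.py. It assumes that the overlap value is a fraction of the window length and that overlapping regions are blended with a simple triangular cross-fade; the names window_bounds, overlap_add, restore_long, and restore_fn are hypothetical.

```python
import numpy as np


def window_bounds(num_samples, sample_rate, window_size_sec=10.0, overlap=0.3):
    """Return (start, end) sample indices of overlapping windows.

    Assumption: `overlap` is the fraction of each window shared with the next.
    """
    win = int(round(window_size_sec * sample_rate))
    hop = max(1, int(round(win * (1.0 - overlap))))
    bounds = []
    start = 0
    while True:
        end = min(start + win, num_samples)
        bounds.append((start, end))
        if end >= num_samples:
            break
        start += hop
    return bounds


def overlap_add(chunks, bounds, num_samples):
    """Blend processed chunks back together with triangular weights."""
    out = np.zeros(num_samples)
    weight = np.zeros(num_samples)
    for chunk, (start, end) in zip(chunks, bounds):
        n = end - start
        # Weights rise toward the middle of the chunk and fall toward its
        # edges, so overlapping regions cross-fade instead of switching abruptly.
        ramp = np.minimum(np.arange(1, n + 1), np.arange(n, 0, -1)).astype(float)
        out[start:end] += np.asarray(chunk, dtype=float) * ramp
        weight[start:end] += ramp
    return out / np.maximum(weight, 1e-12)


def restore_long(signal, sample_rate, restore_fn, window_size_sec=10.0, overlap=0.3):
    """Apply `restore_fn` (a placeholder for the model) to each window."""
    bounds = window_bounds(len(signal), sample_rate, window_size_sec, overlap)
    chunks = [restore_fn(signal[s:e]) for s, e in bounds]
    return overlap_add(chunks, bounds, len(signal))
```

With an identity restore_fn the output matches the input, which is a quick way to check that the blending itself adds no artifacts before a real model is plugged in.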
Usage
To restore your own audio files:
from model import OptimizedAudioRestorationModel

# input_audio is a placeholder for the degraded speech you want to restore
model = OptimizedAudioRestorationModel()
restored_audio = model.forward(input_audio, steps=32, cfg_strength=0.5)
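To restore a whole folder of recordings, one option is to drive the inference_short.py script from Quick Start through Python's subprocess module. The sketch below uses only the command-line flags shown in Quick Start. The helper names build_command and restore_folder, the _restored.wav suffix, and the folder layout are assumptions made for illustration, and the script is expected to be run from the repository root.

```python
import subprocess
import sys
from pathlib import Path


def build_command(input_path, output_dir,
                  checkpoint="./checkpoints/voicerestore-1.1.pth",
                  steps=32, cfg_strength=0.5):
    """Build the inference_short.py command line for one input file."""
    input_path = Path(input_path)
    # Hypothetical naming scheme: clip.wav -> <output_dir>/clip_restored.wav
    output_path = Path(output_dir) / f"{input_path.stem}_restored.wav"
    return [
        sys.executable, "inference_short.py",
        "--checkpoint", str(checkpoint),
        "--input", str(input_path),
        "--output", str(output_path),
        "--steps", str(steps),
        "--cfg_strength", str(cfg_strength),
    ]


def restore_folder(input_dir, output_dir, **kwargs):
    """Run one restoration per .wav file in input_dir, in sorted order."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(input_dir).glob("*.wav")):
        # check=True stops the batch at the first failing file.
        subprocess.run(build_command(wav, output_dir, **kwargs), check=True)
```

Calling restore_folder("degraded", "restored", steps=16) would then process every .wav file in a degraded folder with 16 steps. Each call starts a new process and reloads the checkpoint, so this trades speed for simplicity.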
Alternative Usage - using Transformers 🤗
!git lfs install
!git clone https://huggingface.co/jadechoghari/VoiceRestore
%cd VoiceRestore
!pip install -r requirements.txt
from transformers import AutoModel
# path to the model folder (on colab it's as follows)
checkpoint_path = "/content/VoiceRestore"
model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True)
model("test_input.wav", "test_output.wav")
Model Details
- Architecture: Flow-matching transformer
- Parameters: 300M+ (301 million)
- Input: Degraded speech audio (various formats supported)
- Output: Restored speech
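As a rough illustration of how the steps and cfg_strength settings usually interact in a flow-matching sampler, here is a toy sketch. It is not the VoiceRestore model: it assumes a plain Euler integrator and the common classifier-free guidance rule of adding cfg_strength times the difference between conditioned and unconditioned velocity predictions. The names sample_flow and velocity_fn are hypothetical, and a hand-written velocity field stands in for the transformer.

```python
import numpy as np


def sample_flow(velocity_fn, x0, steps=32, cfg_strength=0.5):
    """Integrate a velocity field from t=0 to t=1 with Euler steps.

    velocity_fn(x, t, conditioned) stands in for the model's prediction
    with or without the conditioning signal (here, the degraded audio).
    """
    x = np.asarray(x0, dtype=np.float64).copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v_cond = velocity_fn(x, t, conditioned=True)
        v_uncond = velocity_fn(x, t, conditioned=False)
        # Classifier-free guidance: push further along the direction in
        # which the conditioned prediction differs from the unconditioned one.
        v = v_cond + cfg_strength * (v_cond - v_uncond)
        x = x + dt * v
    return x


# Toy example: a straight-line flow from x0 to a fixed target.
x0 = np.zeros(2)
target = np.array([1.0, -2.0])


def toy_velocity(x, t, conditioned):
    # Conditioned branch points at the target; unconditioned branch is zero.
    return (target - x0) if conditioned else np.zeros_like(x0)


plain = sample_flow(toy_velocity, x0, steps=4, cfg_strength=0.0)
guided = sample_flow(toy_velocity, x0, steps=4, cfg_strength=0.5)
```

In this toy setting, more steps means a finer discretization of the same path, and a larger cfg_strength moves the result further in the conditioned direction. How strongly either setting affects real restorations depends on the trained model.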
Limitations and Future Work
- Current model is optimized for speech; may not perform optimally on music or other audio types.
- Ongoing research to improve performance on extreme degradations.
- Future updates may include real-time processing capabilities.
Citation
If you use VoiceRestore in your research, please cite our paper:
@misc{kirdey2025voicerestoreflowmatchingtransformersspeech,
title={VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration},
author={Stanislav Kirdey},
year={2025},
eprint={2501.00794},
archivePrefix={arXiv},
primaryClass={eess.AS},
url={https://arxiv.org/abs/2501.00794},
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Based on the E2-TTS implementation by Lucidrains
- Special thanks to the open-source community for their invaluable contributions.