
This package is written for the restoration of degraded speech

Project description


VoiceFixer

VoiceFixer aims to restore human speech regardless of how severely it is degraded. A single model handles noise, reverberation, low resolution (2 kHz to 44.1 kHz) and clipping (0.1-1.0 threshold).
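If you want synthetic test inputs for the degradations listed above, two of them (clipping and additive noise) can be simulated in a few lines of NumPy. This is a sketch, not VoiceFixer's own data pipeline: reading the clipping threshold as a fraction of the signal's peak amplitude, and using white Gaussian noise, are our assumptions.

```python
import numpy as np


def simulate_clipping(wav: np.ndarray, threshold: float) -> np.ndarray:
    """Hard-clip `wav` at `threshold` times its peak absolute amplitude.

    Assumption: the 0.1-1.0 threshold is a fraction of the peak;
    a threshold of 1.0 leaves the signal unchanged.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    limit = threshold * np.max(np.abs(wav))
    return np.clip(wav, -limit, limit)


def add_noise(wav: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise scaled so the result has the requested SNR in dB."""
    noise = rng.standard_normal(wav.shape)
    signal_power = np.mean(wav ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose the scale so that 10 * log10(signal_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return wav + scale * noise


# Example: a 1-second 440 Hz tone at 44.1 kHz, clipped at 30% of its peak, plus 10 dB noise.
sr = 44100
t = np.arange(sr) / sr
clean = 0.8 * np.sin(2 * np.pi * 440 * t)
degraded = add_noise(simulate_clipping(clean, 0.3), snr_db=10.0, rng=np.random.default_rng(0))
```

You could write `degraded` to a .wav file and feed it to VoiceFixer to compare the output with `clean`.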

This package provides:

  • A pretrained VoiceFixer, which is built on a neural vocoder.
  • A pretrained 44.1 kHz universal speaker-independent neural vocoder.


  • If you find this repo helpful, please consider citing it or "Buy Me A Coffee":
 @misc{liu2021voicefixer,   
     title={VoiceFixer: Toward General Speech Restoration With Neural Vocoder},   
     author={Haohe Liu and Qiuqiang Kong and Qiao Tian and Yan Zhao and DeLiang Wang and Chuanzeng Huang and Yuxuan Wang},  
     year={2021},  
     eprint={2109.13731},  
     archivePrefix={arXiv},  
     primaryClass={cs.SD}  
 }

Demo

Please visit the demo page to hear what VoiceFixer can do.

Usage

Run Modes

Mode    Description
0       Original model (suggested default)
1       Adds a preprocessing module (removes higher frequencies)
2       Train mode (sometimes works on severely degraded real speech)
all     Runs all modes and writes one .wav file per supported mode

Command line

First, install voicefixer via pip:

pip install voicefixer

Process a file:

# Specify the input .wav file. Output file is outfile.wav.
voicefixer --infile test/utterance/original/original.wav
# Or specify an output path
voicefixer --infile test/utterance/original/original.wav --outfile test/utterance/original/original_processed.wav

Process files in a folder:

voicefixer --infolder /path/to/input --outfolder /path/to/output

Change the mode (the default is 0):

voicefixer --infile /path/to/input.wav --outfile /path/to/output.wav --mode 1
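To drive the CLI from a Python script (for example, inside a larger pipeline), a small wrapper can assemble the documented flags and call the command through `subprocess`. This is a sketch: the helper names are hypothetical, and it assumes the `voicefixer` executable installed by pip is on your PATH.

```python
import subprocess
from typing import List, Optional, Union

VALID_MODES = {"0", "1", "2", "all"}  # modes listed in the Run Modes table


def build_voicefixer_cmd(infile: str, outfile: Optional[str] = None,
                         mode: Union[int, str] = 0) -> List[str]:
    """Build an argument list for the voicefixer CLI using only the documented flags."""
    mode = str(mode)
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode {mode!r}; expected one of {sorted(VALID_MODES)}")
    cmd = ["voicefixer", "--infile", infile]
    if outfile is not None:
        cmd += ["--outfile", outfile]
    cmd += ["--mode", mode]
    return cmd


def run_voicefixer(infile: str, outfile: Optional[str] = None,
                   mode: Union[int, str] = 0) -> None:
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    subprocess.run(build_voicefixer_cmd(infile, outfile, mode), check=True)
```

For example, `run_voicefixer("input.wav", "output.wav", mode="all")`. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.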

Run all modes:

# Output files are saved to `/path/to/output-modeX.wav`.
voicefixer --infile /path/to/input.wav --outfile /path/to/output.wav --mode all
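When post-processing the results of `--mode all`, it helps to know which files to expect. The sketch below derives one path per mode from the `--outfile` value. It is an assumption based only on the `/path/to/output-modeX.wav` comment (a `-mode<N>` suffix inserted before the extension); check the actual file names on disk. The helper name is hypothetical.

```python
from pathlib import PurePosixPath
from typing import List

SUPPORTED_MODES = (0, 1, 2)  # numeric modes from the Run Modes table


def mode_all_outputs(outfile: str) -> List[str]:
    """Predict the per-mode output paths for `--mode all` (naming pattern assumed)."""
    path = PurePosixPath(outfile)
    # Insert "-mode<N>" between the file stem and its extension.
    return [str(path.with_name(f"{path.stem}-mode{m}{path.suffix}")) for m in SUPPORTED_MODES]
```

For `/path/to/output.wav` this yields `/path/to/output-mode0.wav`, `/path/to/output-mode1.wav` and `/path/to/output-mode2.wav`.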

Pre-load the weights only, without processing any audio:

voicefixer --weight_prepare

For more help, run:

voicefixer -h

Desktop App

Demo on YouTube (thanks to @Justin John)

Install voicefixer via pip:

pip install voicefixer

You can test audio samples on your desktop by running a local website (powered by Streamlit):

  1. Clone the repo first.
git clone https://github.com/haoheliu/voicefixer.git
cd voicefixer

Warning: Windows users should make sure wget is installed and the wget command is added to the system PATH (thanks @justinjohn0306).

  2. Initialize and start the web page.
# Run streamlit 
streamlit run test/streamlit.py
  • On the first run, the web page may stay blank for several minutes while the models download. Check the terminal for download progress.

  • You can use the low-quality speech file we provide for a test run. After processing, the page will look like the following.

[Figure: the Streamlit page after processing]

  • Users in mainland China who have difficulty downloading the checkpoints can get them from Baidu Netdisk (百度网盘, extraction code: qis6). Download the two checkpoints and place them in the following folders (~ is your home directory):
    • Place vf.ckpt inside ~/.cache/voicefixer/analysis_module/checkpoints.
    • Place model.ckpt-1490000_trimed.pt inside ~/.cache/voicefixer/synthesis_module/44100.
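After placing the files manually, a quick check like the one below can confirm they sit at the paths listed above. It is a sketch with hypothetical helper names: it only tests that the files exist, not that they are valid checkpoints.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Folders (relative to ~/.cache/voicefixer) taken from the instructions above.
CHECKPOINTS = {
    "vf.ckpt": Path("analysis_module") / "checkpoints",
    "model.ckpt-1490000_trimed.pt": Path("synthesis_module") / "44100",
}


def checkpoint_paths(home: Optional[Path] = None) -> Dict[str, Path]:
    """Return the expected full path of each checkpoint file."""
    cache = (Path.home() if home is None else Path(home)) / ".cache" / "voicefixer"
    return {name: cache / folder / name for name, folder in CHECKPOINTS.items()}


def missing_checkpoints(home: Optional[Path] = None) -> List[str]:
    """List the checkpoint files that are not yet in place."""
    return [name for name, path in checkpoint_paths(home).items() if not path.is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("All checkpoints in place." if not missing else f"Missing: {', '.join(missing)}")
```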

Python Examples

First, install voicefixer via pip:

pip install voicefixer

Then run the following scripts for a test run:

git clone https://github.com/haoheliu/voicefixer.git; cd voicefixer
python3 test/test.py # test script

You should see the following output:

Initializing VoiceFixer...
Test voicefixer mode 0, Pass
Test voicefixer mode 1, Pass
Test voicefixer mode 2, Pass
Initializing 44.1kHz speech vocoder...
Test vocoder using groundtruth mel spectrogram...
Pass

test/test.py mainly tests the following two APIs:

  • voicefixer.restore
  • vocoder.oracle
...

# TEST VOICEFIXER
## Initialize a voicefixer
print("Initializing VoiceFixer...")
voicefixer = VoiceFixer()
# Mode 0: Original Model (suggested by default)
# Mode 1: Add preprocessing module (remove higher frequency)
# Mode 2: Train mode (might work sometimes on seriously degraded real speech)
for mode in [0,1,2]:
    print("Testing mode",mode)
    voicefixer.restore(input=os.path.join(git_root,"test/utterance/original/original.flac"), # low quality .wav/.flac file
                       output=os.path.join(git_root,"test/utterance/output/output_mode_"+str(mode)+".flac"), # save file path
                       cuda=False, # GPU acceleration
                       mode=mode)
    if mode != 2:
        check("output_mode_"+str(mode)+".flac")
    print("Pass")

# TEST VOCODER
## Initialize a vocoder
print("Initializing 44.1kHz speech vocoder...")
vocoder = Vocoder(sample_rate=44100)

### read wave (fpath) -> mel spectrogram -> vocoder -> wave -> save wave (out_path)
print("Test vocoder using groundtruth mel spectrogram...")
vocoder.oracle(fpath=os.path.join(git_root,"test/utterance/original/p360_001_mic1.flac"),
               out_path=os.path.join(git_root,"test/utterance/output/oracle.flac"),
               cuda=False) # GPU acceleration

...

You can clone this repo and try to run test.py inside the test folder.
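test.py restores one file per `restore` call. To restore every .wav/.flac file in a folder from Python, you can loop over `restore` with the same keyword arguments used above. This is a sketch: `restore_folder` is a hypothetical helper, and it takes the restorer as a parameter so the loop can be tested without downloading any models.

```python
from pathlib import Path
from typing import List, Protocol, Union

AUDIO_SUFFIXES = {".wav", ".flac"}  # formats mentioned in test.py


class Restorer(Protocol):
    """Anything with a restore() method shaped like VoiceFixer.restore as used above."""

    def restore(self, input: str, output: str, cuda: bool = False, mode: int = 0) -> None: ...


def restore_folder(restorer: Restorer, in_dir: Union[str, Path], out_dir: Union[str, Path],
                   mode: int = 0, cuda: bool = False) -> List[Path]:
    """Restore every .wav/.flac file in in_dir into out_dir, keeping the file names."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_dir.iterdir()):
        if not src.is_file() or src.suffix.lower() not in AUDIO_SUFFIXES:
            continue  # skip sub-folders and non-audio files
        dst = out_dir / src.name
        restorer.restore(input=str(src), output=str(dst), cuda=cuda, mode=mode)
        written.append(dst)
    return written
```

For example, `restore_folder(VoiceFixer(), "noisy", "restored", mode=0)`, assuming `VoiceFixer` is imported from the `voicefixer` package as in test.py. From the shell, the CLI's `--infolder`/`--outfolder` options cover the same need.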

Docker

Currently the Docker image is not published and must be built locally; this also ensures you run it with all the expected configuration. The resulting image is about 10 GB, mainly because the dependencies alone take up around 9.8 GB.

However, voicefixer is added in the last layer, so a rebuild after a source change stays relatively small (about 200 MB each time, since the weights are refreshed on every image build).

You can view the Dockerfile in the repository.

After cloning the repo:

OS Agnostic

# To build the image
cd voicefixer
docker build -t voicefixer:cpu .

# To run the image
docker run --rm -v "$(pwd)/data:/opt/voicefixer/data" voicefixer:cpu <all_other_cli_args_here>

## Example: docker run --rm -v "$(pwd)/data:/opt/voicefixer/data" voicefixer:cpu --infile data/my-input.wav --outfile data/my-output.mode-all.wav --mode all
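From a Python script, the same container invocation can be assembled as an argument list, which avoids shell quoting issues with the volume path. This is a sketch with hypothetical helper names; as in the example above, the input and output paths passed to the CLI refer to the mounted `data/` folder.

```python
import os
import subprocess
from typing import List, Optional, Sequence

IMAGE = "voicefixer:cpu"                 # tag used by the build command above
CONTAINER_DATA = "/opt/voicefixer/data"  # mount point used in the run command above


def docker_run_cmd(cli_args: Sequence[str], host_data: Optional[str] = None) -> List[str]:
    """Equivalent of: docker run --rm -v "$(pwd)/data:/opt/voicefixer/data" voicefixer:cpu <args>."""
    if host_data is None:
        host_data = os.path.join(os.getcwd(), "data")  # same as "$(pwd)/data"
    return ["docker", "run", "--rm", "-v", f"{host_data}:{CONTAINER_DATA}", IMAGE, *cli_args]


def run_in_docker(cli_args: Sequence[str], host_data: Optional[str] = None) -> None:
    # check=True raises CalledProcessError if the container exits with an error.
    subprocess.run(docker_run_cmd(cli_args, host_data), check=True)
```

For example, `run_in_docker(["--infile", "data/my-input.wav", "--outfile", "data/my-output.mode-all.wav", "--mode", "all"])`.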

Wrapper script: Linux and MacOS

# To build the image
cd voicefixer
./docker-build-local.sh

# To run the image
./run.sh <all_other_cli_args_here>

## Example: ./run.sh --infile data/my-input.wav --outfile data/my-output.mode-all.wav --mode all

Other Features

  • How can I use my own vocoder, such as a pretrained HiFi-GAN?

First, write a helper function like the following that wraps your model, similar to the helper function in this repo: https://github.com/haoheliu/voicefixer/blob/main/voicefixer/vocoder/base.py#L35

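def convert_mel_to_wav(mel):
    """
    :param mel: non-normalized mel spectrogram, shape [batchsize, 1, t-steps, n_mel]
    :return: waveform, shape [batchsize, 1, samples]
    """
    wav = my_vocoder(mel)  # hypothetical: replace with your own model's inference call
    return wav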

Then pass this function to voicefixer.restore, for example:

voicefixer.restore(input="", # input wav file path
                   output="", # output wav file path
                   cuda=False, # whether to use gpu acceleration
                   mode=0,
                   your_vocoder_func=convert_mel_to_wav)

Note:

  • For compatibility, your vocoder should work on 44.1 kHz audio with 128 mel frequency bins.
  • The input mel spectrogram to the helper function should not be normalized by the width of each mel filter.
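A thin wrapper can enforce the shape contract from the docstring above before and after calling your model, so a mismatch fails with a clear message instead of deep inside `restore`. This is a sketch: `model` stands for your own (hypothetical) inference function, and the checks use only `.shape`, so they work for NumPy arrays and PyTorch tensors alike.

```python
from typing import Any, Callable

N_MEL = 128  # mel bins required for compatibility (see the note above)


def make_vocoder_func(model: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Wrap a mel-to-waveform model so its inputs and outputs follow the documented shapes."""

    def convert_mel_to_wav(mel):
        # Expect a non-normalized mel spectrogram shaped [batchsize, 1, t-steps, n_mel].
        if len(mel.shape) != 4 or mel.shape[1] != 1 or mel.shape[3] != N_MEL:
            raise ValueError(f"expected mel of shape [B, 1, T, {N_MEL}], got {tuple(mel.shape)}")
        wav = model(mel)
        # Expect a waveform shaped [batchsize, 1, samples] with the same batch size.
        if len(wav.shape) != 3 or wav.shape[0] != mel.shape[0] or wav.shape[1] != 1:
            raise ValueError(f"expected wav of shape [B, 1, samples], got {tuple(wav.shape)}")
        return wav

    return convert_mel_to_wav
```

You would then pass `your_vocoder_func=make_vocoder_func(my_hifigan)` to `voicefixer.restore`, where `my_hifigan` is your own model call.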


Change log

See CHANGELOG.md.



Download files


Source Distribution

voicefixer-0.1.3.tar.gz (48.8 kB)

Uploaded Source

Built Distribution

voicefixer-0.1.3-py3-none-any.whl (53.2 kB)

Uploaded Python 3

File details

Details for the file voicefixer-0.1.3.tar.gz.

File metadata

  • Download URL: voicefixer-0.1.3.tar.gz
  • Upload date:
  • Size: 48.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for voicefixer-0.1.3.tar.gz
Algorithm Hash digest
SHA256 40902d1d8fce2a6f8d23ddee4e86038450e8f8fd41c7e71deeb6da2cd0dae218
MD5 ee889c36252cc8dcbf5cd59c80d47c78
BLAKE2b-256 7551f17056d957015813bd0d82c2d780a1cc0709d9cf26a5280a87fd35536415


File details

Details for the file voicefixer-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: voicefixer-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 53.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for voicefixer-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 7ee61b71c25fbdf1ef4e5407fe870ec0b351f8a82765806a8b09c37eb3a1162d
MD5 d27c49977b6597d35f46c107e2940836
BLAKE2b-256 eadc6a2a507531647e6fc05e10d8197b38733df112e4ae8a313c643a62889d22
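To check a downloaded file against the SHA256 digests above, you can hash it locally with Python's standard `hashlib` module. A minimal sketch; the helper names are hypothetical. pip can also enforce hashes itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path
from typing import Optional, Union

# SHA256 digests published above for voicefixer 0.1.3.
EXPECTED_SHA256 = {
    "voicefixer-0.1.3.tar.gz": "40902d1d8fce2a6f8d23ddee4e86038450e8f8fd41c7e71deeb6da2cd0dae218",
    "voicefixer-0.1.3-py3-none-any.whl": "7ee61b71c25fbdf1ef4e5407fe870ec0b351f8a82765806a8b09c37eb3a1162d",
}


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Union[str, Path], expected: Optional[str] = None) -> bool:
    """Compare the file's SHA256 with `expected`, or with the published digest for its name."""
    path = Path(path)
    if expected is None:
        expected = EXPECTED_SHA256[path.name]
    return sha256_of(path) == expected.lower()
```

For example, `verify_download("voicefixer-0.1.3.tar.gz")` returns `True` only if the local file matches the published digest.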

