
This package is written for the restoration of degraded speech



VoiceFixer

This package provides:

  • A pretrained 44.1 kHz universal speaker-independent neural vocoder.
  • A pretrained VoiceFixer, which is built on the neural vocoder.

VoiceFixer aims to restore human speech regardless of how severely it is degraded. It can handle noise, reverberation, low resolution (2 kHz to 44.1 kHz), and clipping (threshold 0.1 to 1.0) within one model.
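To make the clipping range concrete, here is a minimal sketch of hard clipping at a given threshold. It is not VoiceFixer code: the function name hard_clip is hypothetical, and it assumes the threshold is a fraction of full scale for samples in [-1, 1].

```python
import math


def hard_clip(samples, threshold):
    """Limit every sample to the range [-threshold, threshold]."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    return [max(-threshold, min(threshold, s)) for s in samples]


# One cycle of a full-scale sine wave sampled at 8 points.
sine = [math.sin(2 * math.pi * n / 8) for n in range(8)]

# Assumed interpretation: 0.25 means everything beyond 25% of full scale is flattened.
clipped = hard_clip(sine, threshold=0.25)
```

A lower threshold flattens more of the waveform, so under this reading 0.1 is the most severe end of the stated range and 1.0 leaves a full-scale signal untouched.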


Demo

Please visit the demo page to see what VoiceFixer can do.

Usage

First, install voicefixer via pip:

pip install voicefixer

Desktop App

You can test audio samples on your desktop by running the web app (powered by Streamlit).

  1. Clone the repo first.
git clone https://github.com/haoheliu/voicefixer.git
cd voicefixer
  2. Initialize and start the web page.
# Install additional web package
pip install streamlit
# Run streamlit 
streamlit run test/streamlit.py

Important: When you run the above command for the first time, the web page may stay blank for several minutes while the models download. You can check the terminal for download progress.

Python Examples

Run the following test script after cloning this repo.

pip install voicefixer
git clone https://github.com/haoheliu/voicefixer.git; cd voicefixer
python3 test/test.py # test script

It should produce the following output:

Initializing VoiceFixer...
Test voicefixer mode 0, Pass
Test voicefixer mode 1, Pass
Test voicefixer mode 2, Pass
Initializing 44.1kHz speech vocoder...
Test vocoder using groundtruth mel spectrogram...
Pass

test/test.py mainly tests the following two APIs:

  • voicefixer.restore
  • vocoder.oracle
...

# TEST VOICEFIXER
## Initialize a voicefixer
print("Initializing VoiceFixer...")
voicefixer = VoiceFixer()
# Mode 0: Original Model (suggested by default)
# Mode 1: Add preprocessing module (remove higher frequency)
# Mode 2: Train mode (might work sometimes on seriously degraded real speech)
for mode in [0,1,2]:
    print("Testing mode",mode)
    voicefixer.restore(input=os.path.join(git_root,"test/utterance/original/original.flac"), # low quality .wav/.flac file
                       output=os.path.join(git_root,"test/utterance/output/output_mode_"+str(mode)+".flac"), # save file path
                       cuda=False, # GPU acceleration
                       mode=mode)
    if(mode != 2):
        check("output_mode_"+str(mode)+".flac")
    print("Pass")

# TEST VOCODER
## Initialize a vocoder
print("Initializing 44.1kHz speech vocoder...")
vocoder = Vocoder(sample_rate=44100)

### read wave (fpath) -> mel spectrogram -> vocoder -> wave -> save wave (out_path)
print("Test vocoder using groundtruth mel spectrogram...")
vocoder.oracle(fpath=os.path.join(git_root,"test/utterance/original/p360_001_mic1.flac"),
               out_path=os.path.join(git_root,"test/utterance/output/oracle.flac"),
               cuda=False) # GPU acceleration

...

You can clone this repo and try to run test.py inside the test folder.
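The per-mode loop from the test script can be wrapped in a small reusable helper. This is a sketch, not part of the package: restore_all_modes and its parameters are hypothetical names, and it relies only on the restore(input, output, cuda, mode) keyword arguments shown above.

```python
import os


def restore_all_modes(restorer, input_path, output_dir, modes=(0, 1, 2), cuda=False):
    """Call restorer.restore once per mode and return the output paths."""
    outputs = []
    for mode in modes:
        # Same file naming as the test script: output_mode_<mode>.flac
        out_path = os.path.join(output_dir, "output_mode_" + str(mode) + ".flac")
        restorer.restore(input=input_path, output=out_path, cuda=cuda, mode=mode)
        outputs.append(out_path)
    return outputs
```

With the package installed, passing a VoiceFixer() instance as restorer should write one restored file per mode, which makes it easy to compare the modes by ear. The helper assumes output_dir already exists.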

Other Features

  • How do you use your own vocoder, such as a pretrained HiFi-GAN?

First, write a helper function like the one below for your model. It should be similar to the helper function in this repo: https://github.com/haoheliu/voicefixer/blob/main/voicefixer/vocoder/base.py#L35

    def convert_mel_to_wav(mel):
        """
        :param mel: non-normalized mel spectrogram, shape [batchsize, 1, t-steps, n_mel]
        :return: waveform, shape [batchsize, 1, samples]
        """
        wav = ...  # placeholder: run your own vocoder on mel here
        return wav

Then pass this function to voicefixer.restore, for example:

voicefixer.restore(input="", # input wav file path
                   output="", # output wav file path
                   cuda=False, # whether to use gpu acceleration
                   mode = 0,
                   your_vocoder_func = convert_mel_to_wav)

Note:

  • For compatibility, your vocoder should work on 44.1 kHz waveforms with 128 mel frequency bins.
  • The input mel spectrogram to the helper function should not be normalized by the width of each mel filter.
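The shape requirements above can be enforced before a custom vocoder is handed to voicefixer.restore. The following is a sketch under stated assumptions: make_vocoder_func is a hypothetical name, model stands for any callable mapping a mel spectrogram to a waveform, and both objects are assumed to expose a .shape attribute (as PyTorch tensors do).

```python
def make_vocoder_func(model, n_mel=128):
    """Wrap a mel-to-waveform callable with input and output shape checks."""

    def convert_mel_to_wav(mel):
        shape = tuple(mel.shape)
        # Expected input: [batchsize, 1, t-steps, n_mel]
        if len(shape) != 4 or shape[1] != 1 or shape[3] != n_mel:
            raise ValueError(
                "expected mel of shape [batchsize, 1, t-steps, %d], got %s" % (n_mel, shape)
            )
        wav = model(mel)
        out_shape = tuple(wav.shape)
        # Expected output: [batchsize, 1, samples]
        if len(out_shape) != 3 or out_shape[0] != shape[0] or out_shape[1] != 1:
            raise ValueError(
                "expected wav of shape [batchsize, 1, samples], got %s" % (out_shape,)
            )
        return wav

    return convert_mel_to_wav
```

The returned function has the same signature as convert_mel_to_wav, so it can be passed as your_vocoder_func. Failing early on a wrong bin count (for example a vocoder trained on 80 mel bins) is usually easier to debug than an error raised deep inside the model.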

Materials

 @misc{liu2021voicefixer,   
     title={VoiceFixer: Toward General Speech Restoration With Neural Vocoder},   
     author={Haohe Liu and Qiuqiang Kong and Qiao Tian and Yan Zhao and DeLiang Wang and Chuanzeng Huang and Yuxuan Wang},  
     year={2021},  
     eprint={2109.13731},  
     archivePrefix={arXiv},  
     primaryClass={cs.SD}  
 }


