SoundStream - PyTorch
Implementation of SoundStream, an end-to-end neural audio codec
- 🔊 Implements SoundStream model inference
- 🎛️ Works with a 27M-parameter model pretrained on 10k hours of English speech (Multilingual LibriSpeech dataset)
Install
pip install soundstream
Usage
Note: The pretrained model is configured as specified in NaturalSpeech 2, so its channels and strides differ from those of the original SoundStream.
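Channels and strides matter because, in a SoundStream-style encoder, each strided convolution downsamples the waveform by its stride. The latent frame rate is therefore the sample rate divided by the product of the strides, and the bitrate is that frame rate times the bits emitted per frame (one codebook index per quantizer). The following is a minimal sketch of this arithmetic. The helper names are hypothetical. The strides (2, 4, 5, 8) at 24 kHz are those reported for the original SoundStream, while the quantizer count and codebook size are illustrative values. The pretrained model in this package uses different strides, which this README does not list.

```python
from math import log2, prod


def frames_per_second(sample_rate: int, strides: tuple[int, ...]) -> float:
    """Latent frame rate of a convolutional encoder whose total
    downsampling factor is the product of its strides (hypothetical helper)."""
    hop_length = prod(strides)  # audio samples consumed per latent frame
    return sample_rate / hop_length


def bitrate_bps(frame_rate: float, num_quantizers: int, codebook_size: int) -> float:
    """Bits per second when every quantizer emits one index per frame,
    each index costing log2(codebook_size) bits (hypothetical helper)."""
    return frame_rate * num_quantizers * log2(codebook_size)


# Strides of the original SoundStream encoder at 24 kHz: 2*4*5*8 = 320.
rate = frames_per_second(24_000, (2, 4, 5, 8))
print(rate)  # 75.0

# Illustrative quantizer settings: 8 codebooks of 1024 entries (10 bits each).
print(bitrate_bps(rate, 8, 1024))  # 6000.0
```

Changing any stride changes the hop length, which shifts both the number of latent frames per second and the resulting bitrate.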
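import torchaudio
from soundstream import from_pretrained, load

waveform = load('in.wav')  # load the input audio with the package's loader
audio_codec = from_pretrained()  # downloads the pretrained model from Hugging Face

quantized = audio_codec(waveform, mode='encode')   # compress: waveform -> quantized codes
recovered = audio_codec(quantized, mode='decode')  # reconstruct: codes -> waveform

# recovered[0] takes the first element along the leading dimension; save it at 16 kHz
torchaudio.save('out.wav', recovered[0], 16000)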
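The `mode='encode'` call produces the compressed representation that the example stores in `quantized`. The README does not say whether this holds codebook indices or quantized embeddings. The SoundStream paper quantizes the encoder output with residual vector quantization (RVQ), a cascade of codebooks in which each stage quantizes the residual left over by the previous stages. The following pure-Python sketch illustrates that idea on toy 2-D vectors. The function names and codebook values are hypothetical, and this is not this package's implementation.

```python
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codebook entry closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )


def rvq_encode(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Residual VQ: each stage quantizes what the earlier stages left over."""
    residual = list(x)
    indices = []
    for codebook in codebooks:
        i = nearest(codebook, residual)
        indices.append(i)
        # subtract the chosen entry so the next stage sees only the remaining error
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def rvq_decode(indices: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Reconstruct by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for i, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


# Toy 2-D codebooks (hypothetical values, for illustration only).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],        # coarse stage
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],   # fine stage
]

codes = rvq_encode([1.2, 0.8], codebooks)
print(codes)                         # [1, 1]
print(rvq_decode(codes, codebooks))  # [1.25, 0.75]
```

Each extra stage refines the approximation. This is why the number of quantizers (together with the frame rate) sets the bitrate of an RVQ codec.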
Citations
Code
- https://github.com/descriptinc/melgan-neurips
- https://github.com/lucidrains/audiolm-pytorch
- https://github.com/wesbz/SoundStream
Papers
@misc{zeghidour2021soundstream,
title={SoundStream: An End-to-End Neural Audio Codec},
author={Neil Zeghidour and Alejandro Luebs and Ahmed Omran and Jan Skoglund and Marco Tagliasacchi},
year={2021},
eprint={2107.03312},
archivePrefix={arXiv},
primaryClass={cs.SD}
}
@misc{kumar2019melgan,
title={MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis},
author={Kundan Kumar and Rithesh Kumar and Thibault de Boissiere and Lucas Gestin and Wei Zhen Teoh and Jose Sotelo and Alexandre de Brebisson and Yoshua Bengio and Aaron Courville},
year={2019},
eprint={1910.06711},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
@misc{tagliasacchi2020seanet,
title={SEANet: A Multi-modal Speech Enhancement Network},
author={Marco Tagliasacchi and Yunpeng Li and Karolis Misiunas and Dominik Roblek},
year={2020},
eprint={2009.02095},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
@misc{shen2023naturalspeech,
title={NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers},
author={Kai Shen and Zeqian Ju and Xu Tan and Yanqing Liu and Yichong Leng and Lei He and Tao Qin and Sheng Zhao and Jiang Bian},
year={2023},
eprint={2304.09116},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
Source Distribution
soundstream-0.0.1.tar.gz (4.8 kB)
Built Distribution
soundstream-0.0.1-py3-none-any.whl (6.3 kB)
File details
Details for the file soundstream-0.0.1.tar.gz.
File metadata
- Download URL: soundstream-0.0.1.tar.gz
- Size: 4.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.1 CPython/3.11.2 Darwin/22.3.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | b7a9bae220030b68713c14ddeebe94f370ec5c4ae25211287991f0ed76fc76e7
MD5 | 3b17e6da25b61efe351ebe9fc0ce7373
BLAKE2b-256 | 78bbf69c4314da23673e1f1d6c6a8cbf7fba1b2a4fbb8a312b6fd7b05805bb2e
File details
Details for the file soundstream-0.0.1-py3-none-any.whl.
File metadata
- Download URL: soundstream-0.0.1-py3-none-any.whl
- Size: 6.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.1 CPython/3.11.2 Darwin/22.3.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 795ee1127b5c8d1a179a27358ea144f5924a3e42819cae83362a2b67e557f174
MD5 | faabf036a7ef8324562cfdfebb14ea7f
BLAKE2b-256 | a10418c9024dd9ef3c9ed955bbb730bdc6cbf8348436cf2d70b8908623668dd0
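The SHA256 digests in the file hash tables can be used to check a downloaded archive before installing it. The following is a minimal sketch that uses only the standard library's `hashlib`. The helper names and the script name in the usage comment are hypothetical. pip can also enforce digests itself through `--hash=sha256:...` entries in a requirements file installed with `--require-hashes`.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published for soundstream 0.0.1.
EXPECTED = {
    "soundstream-0.0.1.tar.gz":
        "b7a9bae220030b68713c14ddeebe94f370ec5c4ae25211287991f0ed76fc76e7",
    "soundstream-0.0.1-py3-none-any.whl":
        "795ee1127b5c8d1a179a27358ea144f5924a3e42819cae83362a2b67e557f174",
}


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """True if the file's digest matches (hexdigest() is always lowercase)."""
    return sha256_of(path) == expected_sha256.lower()


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_sha256.py soundstream-0.0.1.tar.gz
    for arg in sys.argv[1:]:
        expected = EXPECTED.get(Path(arg).name)
        if expected is None:
            print(f"{arg}: no published digest known")
        else:
            print(f"{arg}: {'OK' if verify(arg, expected) else 'MISMATCH'}")
```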