SoundStream - PyTorch

Implementation of SoundStream, an end-to-end neural audio codec

Figure 2 from the SoundStream paper

  • 🔊 Implements SoundStream model inference
  • 🎛️ Works with a 27M-parameter model pretrained on 10k hours of English speech (Multilingual LibriSpeech dataset)

Install

pip install soundstream

Usage

Note: The pretrained model is configured as specified in NaturalSpeech 2, so its channels and strides differ from those of the original SoundStream.
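To illustrate why the strides matter, here is a rough sketch of how they determine a codec's frame rate and bitrate. The numbers are the ones reported in the original SoundStream paper (24 kHz audio, strides 2, 4, 5, 8, eight quantizers with 1024-entry codebooks). They are not the settings of the pretrained model, which follows NaturalSpeech 2 and whose exact values are not listed here. The function name `codec_rates` is made up for this example and is not part of the package.

```python
import math


def codec_rates(sample_rate_hz, strides, num_quantizers, codebook_size):
    """Return (hop in samples, latent frames per second, bits per second)."""
    # The encoder downsamples by the product of its strides.
    hop = math.prod(strides)
    frames_per_second = sample_rate_hz / hop
    # Each quantizer emits one codebook index per frame.
    bits_per_frame = num_quantizers * math.log2(codebook_size)
    return hop, frames_per_second, frames_per_second * bits_per_frame


# Values from the original SoundStream paper, not from the pretrained model.
hop, fps, bps = codec_rates(24_000, (2, 4, 5, 8), 8, 1024)
print(hop, fps, bps)
```

Changing any stride changes the hop size, so a model configured differently produces a different number of latent frames per second of audio.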

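import torchaudio

from soundstream import from_pretrained, load


# read the input audio file
waveform = load('in.wav')
audio_codec = from_pretrained()  # downloads model from Hugging Face

# compress the waveform, then reconstruct it from the quantized output
quantized = audio_codec(waveform, mode='encode')
recovered = audio_codec(quantized, mode='decode')

# write the first item of the result as 16 kHz audio
torchaudio.save('out.wav', recovered[0], 16000)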
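The encode and decode calls above can be wrapped in a small helper, which makes it easy to try the round trip without downloading the model. This is only a sketch: `roundtrip` and `IdentityCodec` are hypothetical names introduced here, not part of the soundstream package, and the only assumption about the real codec is that it is called with `mode='encode'` and `mode='decode'` as in the usage snippet.

```python
from typing import Any, Callable


def roundtrip(codec: Callable[..., Any], waveform: Any) -> Any:
    """Encode, then decode, using the calling convention from the usage snippet."""
    quantized = codec(waveform, mode='encode')
    return codec(quantized, mode='decode')


class IdentityCodec:
    """Hypothetical stand-in (not a real codec): returns its input unchanged."""

    def __call__(self, x: Any, mode: str) -> Any:
        if mode not in ('encode', 'decode'):
            raise ValueError(f'unknown mode: {mode!r}')
        return x


if __name__ == '__main__':
    samples = [0.0, 0.5, -0.5]
    # With the stand-in, the round trip is lossless by construction.
    print(roundtrip(IdentityCodec(), samples) == samples)
```

With the real model, `roundtrip(audio_codec, waveform)` would take the place of the two separate calls. The reconstruction is lossy, so the output will not match the input exactly.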

Citations

Code

Papers

@misc{zeghidour2021soundstream,
      title={SoundStream: An End-to-End Neural Audio Codec}, 
      author={Neil Zeghidour and Alejandro Luebs and Ahmed Omran and Jan Skoglund and Marco Tagliasacchi},
      year={2021},
      eprint={2107.03312},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
@misc{kumar2019melgan,
      title={MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis}, 
      author={Kundan Kumar and Rithesh Kumar and Thibault de Boissiere and Lucas Gestin and Wei Zhen Teoh and Jose Sotelo and Alexandre de Brebisson and Yoshua Bengio and Aaron Courville},
      year={2019},
      eprint={1910.06711},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
@misc{tagliasacchi2020seanet,
      title={SEANet: A Multi-modal Speech Enhancement Network}, 
      author={Marco Tagliasacchi and Yunpeng Li and Karolis Misiunas and Dominik Roblek},
      year={2020},
      eprint={2009.02095},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
@misc{shen2023naturalspeech,
      title={NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers}, 
      author={Kai Shen and Zeqian Ju and Xu Tan and Yanqing Liu and Yichong Leng and Lei He and Tao Qin and Sheng Zhao and Jiang Bian},
      year={2023},
      eprint={2304.09116},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
