Multi-Scale Neural Audio Codec
SNAC 🍿
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate.
🎧 More audio samples available at https://hubertsiuzdak.github.io/snac/
Overview
SNAC encodes audio into hierarchical tokens similarly to SoundStream, EnCodec, and DAC (see the image on the left). However, SNAC introduces a simple change where coarse tokens are sampled less frequently, covering a broader time span (see the image on the right).
This not only saves bitrate but, more importantly, can be very useful for language modeling approaches to audio generation. For example, with coarse tokens at ~10 Hz and a context window of 2048 tokens, you can effectively model the consistent structure of an audio track for ~3 minutes (2048 tokens / 10 Hz ≈ 205 s).
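The ~3 minute figure follows from dividing the context length by the coarse token rate. A minimal sketch of that arithmetic, assuming the context window holds coarse tokens only (the helper name `context_seconds` is illustrative and not part of the snac package):

```python
def context_seconds(context_tokens: int, token_rate_hz: float) -> float:
    """Seconds of audio covered by `context_tokens` tokens emitted at `token_rate_hz`."""
    if token_rate_hz <= 0:
        raise ValueError("token_rate_hz must be positive")
    return context_tokens / token_rate_hz


# Numbers taken from the text above: 2048-token context, ~10 Hz coarse tokens.
seconds = context_seconds(2048, 10.0)
print(f"{seconds:.1f} s = {seconds / 60:.1f} min")  # 204.8 s = 3.4 min
```

If the finer token levels share the same context window, the covered duration shrinks accordingly.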
Pretrained models
Currently, all models support only a single audio channel (mono).
Model | Bitrate | Sample Rate | Params | Recommended use case |
---|---|---|---|---|
hubertsiuzdak/snac_24khz | 0.98 kbps | 24 kHz | 19.8 M | 🗣️ Speech |
hubertsiuzdak/snac_32khz | 1.9 kbps | 32 kHz | 54.5 M | 🎸 Music / Sound Effects |
hubertsiuzdak/snac_44khz | 2.6 kbps | 44 kHz | 54.5 M | 🎸 Music / Sound Effects |
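Because the pretrained models accept mono input only, a multi-channel recording has to be downmixed before encoding. A minimal sketch, assuming the waveform is already a torch tensor of shape (channels, T) at the model's sample rate. The helper name `to_mono_batch` is hypothetical and not part of the snac package, and averaging the channels is just one common way to downmix:

```python
import torch


def to_mono_batch(waveform: torch.Tensor) -> torch.Tensor:
    """Downmix a (channels, T) waveform to a (1, 1, T) mono batch."""
    if waveform.dim() != 2:
        raise ValueError("expected a tensor of shape (channels, T)")
    mono = waveform.mean(dim=0, keepdim=True)  # average channels -> (1, T)
    return mono.unsqueeze(0)                   # add batch dim -> (1, 1, T)


stereo = torch.randn(2, 32000)  # placeholder for one second of stereo audio at 32 kHz
audio = to_mono_batch(stereo)
print(audio.shape)  # torch.Size([1, 1, 32000])
```

Resampling to the rate the chosen model expects is a separate step that is not shown here.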
Usage
Install it using:
pip install snac
To encode (and decode) audio with SNAC in Python, use the following code:
import torch
from snac import SNAC

model = SNAC.from_pretrained("hubertsiuzdak/snac_32khz").eval().cuda()
audio = torch.randn(1, 1, 32000).cuda()  # placeholder for actual audio with shape (B, 1, T)

with torch.inference_mode():
    codes = model.encode(audio)
    audio_hat = model.decode(codes)
You can also encode and reconstruct in a single call:
with torch.inference_mode():
    audio_hat, codes = model(audio)
⚠️ Note that codes is a list of token sequences of variable lengths, each corresponding to a different temporal resolution.
>>> [code.shape[1] for code in codes]
[12, 24, 48, 96]
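For language modeling, the variable-length levels usually have to be merged into a single token stream. The following is a sketch of one possible ordering, not something the snac package provides: tokens are grouped by coarse frame, so each coarse token is followed by the finer tokens that fall in the same time span. It assumes every level's length is an integer multiple of the coarsest level's length, as in the example above, and works on plain Python lists. The function name `flatten_codes` is hypothetical.

```python
from typing import List, Sequence


def flatten_codes(levels: Sequence[Sequence[int]]) -> List[int]:
    """Interleave multi-scale token sequences, grouped by coarsest frame.

    `levels` is ordered coarse to fine; each length must be a multiple
    of the coarsest level's length.
    """
    n_coarse = len(levels[0])
    if n_coarse == 0:
        raise ValueError("coarsest level is empty")
    ratios = []
    for level in levels:
        ratio, remainder = divmod(len(level), n_coarse)
        if remainder:
            raise ValueError("level length is not a multiple of the coarsest length")
        ratios.append(ratio)

    flat: List[int] = []
    for i in range(n_coarse):
        # Emit, for coarse frame i, the matching slice of every level.
        for level, ratio in zip(levels, ratios):
            flat.extend(level[i * ratio:(i + 1) * ratio])
    return flat


print(flatten_codes([[0, 1], [10, 11, 12, 13]]))  # [0, 10, 11, 1, 12, 13]

# Dummy tokens with the lengths shown above: 12 frames x (1 + 2 + 4 + 8) tokens.
dummy = [list(range(n)) for n in (12, 24, 48, 96)]
print(len(flatten_codes(dummy)))  # 180
```

Assuming each entry of codes is a tensor of shape (B, T_i), as the `code.shape[1]` indexing suggests, a single batch item could be passed in as `[c[0].tolist() for c in codes]`.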
Acknowledgements
Module definitions are adapted from the Descript Audio Codec.
File details
Details for the file snac-1.2.1.tar.gz.
File metadata
- Download URL: snac-1.2.1.tar.gz
- Upload date:
- Size: 8.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 697f27fc5b98308eee8946739e5fd9c1b4ec629ef51b4f01c08dace1290685ee |
MD5 | fe9f116edbda97af3ae87aa93775a769 |
BLAKE2b-256 | 43385b64fb15c1cf02233252975c43c4b85ccacd9f77c55f7fee72b16b3bd2f6 |
File details
Details for the file snac-1.2.1-py3-none-any.whl.
File metadata
- Download URL: snac-1.2.1-py3-none-any.whl
- Upload date:
- Size: 8.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 96f90e221121ad03d6e3b060a787268b1efdbe424560a58f6f732df6d4914dc7 |
MD5 | 142366c2a36bac18efab8cb84bc81dbe |
BLAKE2b-256 | 794f6401dc74af3d9e9602209763eccbb7eac739c2501e499b51b560f71443c0 |