
Multi-Scale Neural Audio Codec


SNAC 🍿

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate.


🎧 More audio samples available at https://hubertsiuzdak.github.io/snac/

Overview

SNAC encodes audio into hierarchical tokens similarly to SoundStream, EnCodec, and DAC (see the image on the left). However, SNAC introduces a simple change where coarse tokens are sampled less frequently, covering a broader time span (see the image on the right).

This not only saves bitrate but, more importantly, might be very useful for language modeling approaches to audio generation. For example, with coarse tokens at ~10 Hz and a context window of 2048 tokens, you can effectively model the consistent structure of an audio track for ~3 minutes.
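
The ~3 minute figure comes from dividing the context window by the coarse token rate. Here is a minimal sketch of that arithmetic. `context_seconds` is a hypothetical helper (not part of the `snac` package), and it assumes the context holds coarse tokens only:

```python
def context_seconds(context_window: int, token_rate_hz: float) -> float:
    """Seconds of audio covered by `context_window` tokens emitted at `token_rate_hz`."""
    if token_rate_hz <= 0:
        raise ValueError("token_rate_hz must be positive")
    return context_window / token_rate_hz


# Values taken from the paragraph above: 2048 tokens at roughly 10 Hz.
seconds = context_seconds(context_window=2048, token_rate_hz=10.0)
print(f"{seconds:.1f} s = {seconds / 60:.1f} min")  # 204.8 s = 3.4 min
```

If finer token levels share the same context window, the covered time span shrinks accordingly.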

snac.png

Pretrained models

Currently, all models support only a single audio channel (mono).

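| Model                    | Bitrate   | Sample Rate | Params | Recommended use case     |
|--------------------------|-----------|-------------|--------|--------------------------|
| hubertsiuzdak/snac_24khz | 0.98 kbps | 24 kHz      | 19.8 M | 🗣️ Speech                |
| hubertsiuzdak/snac_32khz | 1.9 kbps  | 32 kHz      | 54.5 M | 🎸 Music / Sound Effects |
| hubertsiuzdak/snac_44khz | 2.6 kbps  | 44 kHz      | 54.5 M | 🎸 Music / Sound Effects |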
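
To put these bitrates in perspective, the sketch below estimates how large the code stream would be for a clip of a given duration. The `BITRATES_KBPS` dictionary copies the values from the table. `compressed_bytes` is a hypothetical helper, and the estimate assumes 1 kbps = 1000 bits per second with no container or entropy-coding overhead:

```python
# Bitrates (kbps) copied from the pretrained models table.
BITRATES_KBPS = {
    "hubertsiuzdak/snac_24khz": 0.98,
    "hubertsiuzdak/snac_32khz": 1.9,
    "hubertsiuzdak/snac_44khz": 2.6,
}


def compressed_bytes(model_name: str, duration_s: float) -> float:
    """Approximate size in bytes of the codes for `duration_s` seconds of audio."""
    bits = BITRATES_KBPS[model_name] * 1000 * duration_s
    return bits / 8


for name in BITRATES_KBPS:
    kilobytes = compressed_bytes(name, duration_s=60) / 1000
    print(f"{name}: about {kilobytes:.2f} kB per minute")
```

By this estimate, a minute of speech at 0.98 kbps takes roughly 7 kB of codes.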

Usage

Install it using:

pip install snac

To encode (and decode) audio with SNAC in Python, use the following code:

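import torch
from snac import SNAC

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained 32 kHz model and switch it to evaluation mode
model = SNAC.from_pretrained("hubertsiuzdak/snac_32khz").eval().to(device)
audio = torch.randn(1, 1, 32000).to(device)  # placeholder for actual audio with shape (B, 1, T)

with torch.inference_mode():
    codes = model.encode(audio)  # list of token sequences, one per temporal resolution
    audio_hat = model.decode(codes)  # audio reconstructed from the codes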

You can also encode and reconstruct in a single call:

with torch.inference_mode():
    audio_hat, codes = model(audio)

⚠️ Note that codes is a list of token sequences of variable lengths, each corresponding to a different temporal resolution.

>>> [code.shape[1] for code in codes]
[12, 24, 48, 96]
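
Because the levels have different lengths, they cannot be stacked into one rectangular tensor. For language modeling, one option is to interleave them into a single sequence, grouping each coarse token with the finer tokens that cover the same time span. The sketch below shows one such ordering on plain Python lists. `flatten_codes` is a hypothetical helper, not part of the `snac` package, and it assumes each level's length is an integer multiple of the coarsest level's length (as in the shapes above). It also leaves token IDs untouched, so a real model would still need some way to tell the levels apart.

```python
from typing import List, Sequence


def flatten_codes(levels: Sequence[Sequence[int]]) -> List[int]:
    """Interleave multi-scale token sequences, one coarse frame at a time.

    `levels` is ordered from coarsest to finest. For coarse frame i, the output
    contains levels[0][i] followed by the tokens of each finer level that fall
    within the same time span.
    """
    n_frames = len(levels[0])
    if n_frames == 0:
        raise ValueError("the coarsest level must not be empty")

    # Number of tokens each level contributes per coarse frame.
    ratios = []
    for level in levels:
        ratio, remainder = divmod(len(level), n_frames)
        if remainder != 0:
            raise ValueError("level lengths must be multiples of the coarsest length")
        ratios.append(ratio)

    flat: List[int] = []
    for i in range(n_frames):
        for level, ratio in zip(levels, ratios):
            flat.extend(level[i * ratio:(i + 1) * ratio])
    return flat


# Toy example with lengths [2, 4, 8] instead of real codes.
toy = [[0, 1], [10, 11, 12, 13], [20, 21, 22, 23, 24, 25, 26, 27]]
print(flatten_codes(toy))
# [0, 10, 11, 20, 21, 22, 23, 1, 12, 13, 24, 25, 26, 27]
```

With the lengths shown above, each coarse frame would contribute 1 + 2 + 4 + 8 = 15 tokens, giving 180 tokens in total. Assuming each entry of `codes` is a tensor of shape (B, T), the first batch item can be converted with `[code[0].tolist() for code in codes]` before calling the helper.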

Acknowledgements

Module definitions are adapted from the Descript Audio Codec.
