
A neural audio codec for expressive speech

Project description

spine



Architecture

Spine encodes 24 kHz mono audio into multi-scale FSQ tokens at 1.57 kbps (~88 tokens/s in total) across four temporal scales (~6 / 12 / 23 / 47 Hz), keeping sequences short for downstream language models. The convolutional decoder is hard-bandlimited at 6 kHz by a fixed crossover; above the crossover, a filtered-noise branch and a complex-STFT head synthesize the high band under purely adversarial supervision, eliminating the high-frequency static typical of GAN codecs.
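The stated token rate and bitrate can be cross-checked with a little arithmetic. This is a back-of-the-envelope sketch, not taken from the Spine code: it assumes a finest-scale hop of 512 samples (24000 / 512 = 46.875 Hz) and that each coarser scale halves the rate, which reproduces the rounded ~6 / 12 / 23 / 47 Hz figures. `HOP` and `POOL_FACTORS` are hypothetical names and values.

```python
# Back-of-the-envelope check of the stated token rate and bitrate.
# ASSUMPTIONS (not from the source): a finest-scale hop of 512 samples and
# pooling factors of 8 / 4 / 2 / 1 for the four scales.
SAMPLE_RATE = 24_000          # Hz, from the description
HOP = 512                     # hypothetical finest-scale hop, in samples
POOL_FACTORS = (8, 4, 2, 1)   # hypothetical pooling factor per scale, coarse to fine
BITRATE = 1_570               # bits/s, from the stated (rounded) 1.57 kbps


def scale_rates(sample_rate: int, hop: int, factors: tuple) -> list:
    """Token rate (tokens/s) of each scale: the finest frame rate divided by its pooling factor."""
    frame_rate = sample_rate / hop
    return [frame_rate / f for f in factors]


rates = scale_rates(SAMPLE_RATE, HOP, POOL_FACTORS)
tokens_per_second = sum(rates)
bits_per_token = BITRATE / tokens_per_second  # average over all four scales

print([round(r) for r in rates])  # [6, 12, 23, 47]
print(tokens_per_second)          # 87.890625
print(f"{bits_per_token:.1f} bits/token")  # 17.9 bits/token
```

Under these assumptions each token carries roughly 18 bits on average. Because 1.57 kbps is itself rounded, treat this as an estimate, not as the codec's actual per-scale FSQ level configuration.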

  • 115M-parameter generator: conv encoder/decoder with a 512-d transformer bottleneck (8 + 12 layers)
  • Multi-scale FSQ (pool → quantize → repeat) on a shared latent, with no codebook collapse
  • Reconstruction losses bandlimited below the crossover; the high band is owned by the DDSP split
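The pool → quantize → repeat loop can be illustrated with a dependency-free sketch. It is not Spine's implementation: it assumes a SNAC-style residual scheme (each finer scale quantizes what the coarser scales left over), the tanh-bounded rounding from the FSQ paper restricted to an odd number of levels, and pooling factors of 8 / 4 / 2 / 1. `fsq`, `pool`, `repeat`, `multiscale_fsq`, and the level count are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Latent = List[List[float]]  # indexed [time][channel]


def fsq(x: float, levels: int) -> float:
    """Quantize one value onto `levels` evenly spaced points in [-1, 1].

    tanh bounds the input, then rounding snaps it to the grid. Odd `levels` only,
    so the grid is symmetric and contains 0.
    """
    if levels % 2 == 0:
        raise ValueError("this sketch only supports odd level counts")
    half = (levels - 1) / 2
    return round(math.tanh(x) * half) / half


def pool(x: Latent, factor: int) -> Latent:
    """Average-pool along time by `factor` (assumes len(x) is divisible by factor)."""
    channels = len(x[0])
    return [
        [sum(x[t + k][c] for k in range(factor)) / factor for c in range(channels)]
        for t in range(0, len(x), factor)
    ]


def repeat(x: Latent, factor: int) -> Latent:
    """Upsample along time by repeating each frame `factor` times."""
    return [list(frame) for frame in x for _ in range(factor)]


def multiscale_fsq(
    z: Latent, factors: Sequence[int] = (8, 4, 2, 1), levels: int = 5
) -> Tuple[List[Latent], Latent]:
    """Residual multi-scale FSQ, coarse to fine; returns per-scale codes and the summed reconstruction."""
    residual = [list(frame) for frame in z]
    recon = [[0.0] * len(frame) for frame in z]
    codes = []
    for f in factors:
        q = [[fsq(v, levels) for v in frame] for frame in pool(residual, f)]  # pool, then quantize
        codes.append(q)
        up = repeat(q, f)  # repeat back to the full frame rate
        residual = [[r - u for r, u in zip(rf, uf)] for rf, uf in zip(residual, up)]
        recon = [[a + u for a, u in zip(af, uf)] for af, uf in zip(recon, up)]
    return codes, recon


random.seed(0)
z = [[random.gauss(0.0, 0.5) for _ in range(4)] for _ in range(16)]  # 16 frames, 4 channels
codes, recon = multiscale_fsq(z)
print([len(c) for c in codes])  # [2, 4, 8, 16]
```

Because FSQ rounds onto a fixed grid, it has no learned codebook whose entries could go unused. That is why FSQ avoids codebook collapse by construction.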

Installation

pip install spine-codec

Training pulls in extra dependencies (wandb):

pip install "spine-codec[train]"

For development, clone this repo and run uv sync.

Usage

The pretrained model is downloaded from twangodev/spine-codec on first use; pass --checkpoint to use a local training checkpoint instead.

spine encode --input speech.wav --output codes.pt
spine decode --input codes.pt --output speech.wav
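spine recon  --input speech.wav --output roundtrip.wav

From Python:

import torch
import torchaudio
from spine import Spine

model = Spine.from_pretrained("twangodev/spine-codec")
audio, sr = torchaudio.load("speech.wav")           # (channels, samples)
audio = audio.mean(dim=0, keepdim=True)             # downmix to mono
if sr != 24_000:                                    # the codec expects 24 kHz input
    audio = torchaudio.functional.resample(audio, sr, 24_000)

with torch.no_grad():                               # inference only, no autograd graph
    codes = model.encode(audio.unsqueeze(0))        # add batch dim: (1, 1, samples)
    reconstruction = model.decode(codes)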

Training

spine train --config configs/train.yaml

Training configs live in the repo (not the wheel), so train from a git checkout with the train extra installed.

YAML configs are sparse overrides on top of the defaults in spine/config.py.
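A sparse override scheme like this one usually deep-merges the parsed YAML mapping into the default config. The sketch below is hypothetical. It does not reproduce spine/config.py, whose defaults may well be dataclasses rather than dicts, and the keys are invented for illustration. Rejecting unknown keys is a design choice added here so that a typo in a YAML file fails loudly rather than being silently ignored.

```python
from copy import deepcopy
from typing import Any, Dict

Config = Dict[str, Any]


def apply_overrides(defaults: Config, overrides: Config) -> Config:
    """Return a copy of `defaults` with a sparse `overrides` mapping merged in recursively."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if key not in merged:
            raise KeyError(f"unknown config key: {key!r}")
        if isinstance(value, dict) and isinstance(merged[key], dict):
            merged[key] = apply_overrides(merged[key], value)  # recurse into sections
        else:
            merged[key] = deepcopy(value)  # leaf value: the override wins
    return merged


# Hypothetical defaults; only the 24 kHz rate and 512-d width come from the description.
DEFAULTS: Config = {
    "model": {"sample_rate": 24_000, "transformer_dim": 512},
    "train": {"lr": 3e-4, "batch_size": 16},
}

# What a sparse YAML file containing only "train: {batch_size: 32}" would parse to.
overrides: Config = {"train": {"batch_size": 32}}

config = apply_overrides(DEFAULTS, overrides)
print(config["train"])  # {'lr': 0.0003, 'batch_size': 32}
```

Only the keys that appear in the YAML file change. Everything else keeps its default, and the defaults themselves are left unmodified.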

Acknowledgements

The architecture builds on Mimi, SNAC, DAC, FSQ, and DDSP.



Download files


Source Distribution

spine_codec-0.1.0.tar.gz (221.0 kB)

Uploaded Source

Built Distribution


spine_codec-0.1.0-py3-none-any.whl (32.5 kB)

Uploaded Python 3

File details

Details for the file spine_codec-0.1.0.tar.gz.

File metadata

  • Download URL: spine_codec-0.1.0.tar.gz
  • Upload date:
  • Size: 221.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for spine_codec-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3894c7c59f27892be63538d6c8814c857ab47241a1816d8019624c8dc0d208f9
MD5 085f19a2608a1216bf91f1f7b3b2197c
BLAKE2b-256 4f6afaf7784044d7487b30b83c7671ce6fa99c6ce9ab7e060a4a3696371bdceb


Provenance

The following attestation bundles were made for spine_codec-0.1.0.tar.gz:

Publisher: python.yml on twangodev/spine-codec

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file spine_codec-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: spine_codec-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 32.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for spine_codec-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bde9afc87aac66a6f0ffb974c74669342f7426107917fb4ad30275885e052d8b
MD5 c57d420029737dfbf02ef5a53f5258e8
BLAKE2b-256 c3def9b1d0a8fb0013f72d97c58cbe0d620945467bc95dbf0ff93b8701b361ef


Provenance

The following attestation bundles were made for spine_codec-0.1.0-py3-none-any.whl:

Publisher: python.yml on twangodev/spine-codec

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
