spine
A neural audio codec for expressive speech.
Spine encodes 24 kHz mono audio into multi-scale FSQ tokens at 1.57 kbps (~88 tokens/s) across four temporal scales (~6 / 12 / 23 / 47 Hz), keeping sequences short for downstream language models. The convolutional decoder is hard-bandlimited at 6 kHz by a fixed crossover; a filtered-noise branch and a complex-STFT head synthesize the high band under purely adversarial supervision, eliminating the high-frequency static typical of GAN codecs.
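The quoted token rates can be sanity-checked with a little arithmetic. The sketch below assumes a 512-sample hop at the finest scale and a 2x slowdown per coarser scale; both are guesses picked because they reproduce the ~6 / 12 / 23 / 47 Hz figures, not values confirmed by the project. The bits-per-token number is simply the stated bitrate divided by the summed token rate, so it is an average across scales.

```python
# Assumption: 512-sample hop at the finest scale, each coarser scale 2x slower.
# These constants are illustrative and are not read from the spine package.
SAMPLE_RATE = 24_000   # Hz, from the description
BASE_HOP = 512         # hypothetical: 24000 / 512 = 46.875 Hz
NUM_SCALES = 4         # from the description
BITRATE_BPS = 1_570    # 1.57 kbps, from the description


def scale_rates(sample_rate=SAMPLE_RATE, base_hop=BASE_HOP, num_scales=NUM_SCALES):
    """Return the token rate in Hz for each scale, coarsest first."""
    return [sample_rate / (base_hop * 2 ** k) for k in reversed(range(num_scales))]


rates = scale_rates()
tokens_per_second = sum(rates)                      # total tokens emitted per second
bits_per_token = BITRATE_BPS / tokens_per_second    # average payload per token

print("per-scale rates (Hz):", [round(r, 2) for r in rates])
print("tokens per second:", round(tokens_per_second, 2))
print("average bits per token:", round(bits_per_token, 2))
```

Under these assumptions the four rates sum to just under 88 tokens/s, consistent with the figure quoted above.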
- 115M-parameter generator: conv encoder/decoder with a 512-d transformer bottleneck (8 + 12 layers)
- Multi-scale FSQ (pool → quantize → repeat) on a shared latent, with no codebook collapse
- Reconstruction losses bandlimited below the crossover; the high band is owned by the DDSP split
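One way to read "pool → quantize → repeat" is sketched below in plain Python: average-pool the shared latent by a per-scale factor, snap each channel to a fixed grid (finite scalar quantization), then repeat every quantized frame to return to the base frame rate. This is an interpretation, not the project's implementation; the function names, the pooling factors, and the per-channel level counts are all hypothetical. Because FSQ codes are points on a fixed grid rather than learned codebook entries, there is no codebook that could collapse.

```python
import math


def fsq_quantize(vector, levels):
    """Snap each channel to one of levels[i] evenly spaced values in [-1, 1]."""
    codes, values = [], []
    for x, n in zip(vector, levels):
        bounded = math.tanh(x)                        # squash into (-1, 1)
        index = round((bounded + 1) / 2 * (n - 1))    # nearest of n grid points
        codes.append(index)
        values.append(2 * index / (n - 1) - 1)        # grid point mapped back to [-1, 1]
    return codes, values


def avg_pool(frames, factor):
    """Average consecutive groups of `factor` frames, dropping any ragged tail."""
    pooled = []
    for start in range(0, len(frames) - factor + 1, factor):
        group = frames[start:start + factor]
        pooled.append([sum(channel) / factor for channel in zip(*group)])
    return pooled


def multiscale_fsq(frames, levels, factors=(8, 4, 2, 1)):
    """Pool, quantize, and repeat the same latent at each scale (hypothetical layout)."""
    scales = []
    for factor in factors:
        pooled = avg_pool(frames, factor)
        quantized = [fsq_quantize(frame, levels) for frame in pooled]
        codes = [c for c, _ in quantized]
        # Repeat each quantized frame `factor` times to get back to the base rate.
        upsampled = [v for _, v in quantized for _ in range(factor)]
        scales.append({"factor": factor, "codes": codes, "upsampled": upsampled})
    return scales


# Toy latent: 16 frames, 2 channels, quantized with 5 and 3 levels per channel.
latent = [[0.1 * t, -0.05 * t] for t in range(16)]
result = multiscale_fsq(latent, levels=(5, 3))
for scale in result:
    print(scale["factor"], len(scale["codes"]))
```

The coarsest scale emits the fewest tokens, which is what keeps the overall sequence short; how the real model combines the scales (for example, summation or residual refinement) is not specified here.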
Installation
pip install spine-codec
Training pulls in extra dependencies (wandb):
pip install "spine-codec[train]"
For development, clone this repo and run uv sync.
Usage
The pretrained model is downloaded from twangodev/spine-codec on first use; pass --checkpoint to use a local training checkpoint instead.
spine encode --input speech.wav --output codes.pt
spine decode --input codes.pt --output speech.wav
spine recon --input speech.wav --output roundtrip.wav
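import torchaudio
from spine import Spine

# Downloads the pretrained weights on first use.
model = Spine.from_pretrained("twangodev/spine-codec")

# torchaudio.load returns a (channels, samples) tensor and the file's sample rate;
# the codec expects 24 kHz mono input.
audio, sr = torchaudio.load("speech.wav")
codes = model.encode(audio.unsqueeze(0))  # add a batch dimension
reconstruction = model.decode(codes)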
Training
spine train --config configs/train.yaml
Training configs live in the repo (not the wheel), so train from a git checkout
with the train extra installed.
YAML configs are sparse overrides on top of the defaults in spine/config.py.
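To illustrate what "sparse overrides" means in practice, here is a minimal sketch of a recursive merge in which a config file lists only the keys it changes and everything else falls through to the defaults. The `DEFAULTS` dictionary, its keys, and `apply_overrides` are hypothetical; the actual structure of spine/config.py may differ (for example, it may use dataclasses rather than dictionaries).

```python
from copy import deepcopy

# Hypothetical defaults; the real ones live in spine/config.py.
DEFAULTS = {
    "optimizer": {"lr": 1e-4, "betas": [0.8, 0.99]},
    "data": {"sample_rate": 24000, "batch_size": 16},
}


def apply_overrides(defaults, overrides):
    """Return a copy of `defaults` with `overrides` merged in, key by key."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so sibling keys under the same section keep their defaults.
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Equivalent to a YAML file containing only:
#   data:
#     batch_size: 8
overrides = {"data": {"batch_size": 8}}
config = apply_overrides(DEFAULTS, overrides)
print(config)
```

Merging recursively rather than replacing whole sections is what lets an override file stay short: changing one value in a section does not require restating the rest of it.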
Acknowledgements