Python code for "Sylber: Syllabic Embedding Representation of Speech from Raw Audio"

License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3

SYLBER: Syllabic Embedding Representation of Speech from Raw Audio

Sylber is the first model of its kind to yield extremely short token sequences from raw audio (4.27 tokens/sec on average) through dynamic tokenization at syllable granularity.

The model was developed and trained by the Berkeley Speech Group.
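
For a sense of scale, the arithmetic below compares the average rate quoted above with a fixed-rate frame sequence. The 50 Hz frame rate is an assumption made only for this comparison (it is not stated in this README), and `expected_tokens` is a hypothetical helper, not part of the sylber package.

```python
SYLBER_TOKENS_PER_SEC = 4.27  # average rate quoted in the description above
ASSUMED_FRAME_RATE_HZ = 50.0  # assumption: one frame every 20 ms


def expected_tokens(duration_sec: float, rate_per_sec: float) -> float:
    """Expected sequence length for a clip at a given average rate."""
    return duration_sec * rate_per_sec


clip_sec = 10.0
syllabic = expected_tokens(clip_sec, SYLBER_TOKENS_PER_SEC)
frames = expected_tokens(clip_sec, ASSUMED_FRAME_RATE_HZ)

print(
    f"{clip_sec:.0f} s clip: ~{syllabic:.1f} syllabic tokens "
    f"vs {frames:.0f} frames ({frames / syllabic:.1f}x shorter)"
)
```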

Updates

03/02/2025

Distributed the inference package.

01/22/2025

Sylber was accepted at ICLR 2025!

12/25/2024

Initial code release with training and inference pipelines.
Checkpoint release

Installation

The inference package can be installed from PyPI:

pip install sylber

See the demo notebook for usage. For training, follow the instructions below.

Usage

from sylber import Segmenter

# Loading Sylber
segmenter = Segmenter(model_ckpt="sylber")


# Run Sylber
wav_file = "samples/sample.wav"

# Set in_second=False to get segment boundaries in frame numbers instead of seconds.
outputs = segmenter(wav_file, in_second=True)

# outputs = {"segments": numpy array of [start, end] for each segment,
#            "segment_features": numpy array of segment-averaged features,
#            "hidden_states": numpy array of raw features used for segmentation}
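
As a rough illustration of working with the returned segments, the sketch below computes per-segment durations and the resulting token rate. The `segments` value here is synthetic stand-in data shaped like the `[start, end]` output described above (a NumPy array of the same shape iterates the same way), and `segment_durations` and `token_rate` are hypothetical helpers, not part of the sylber package.

```python
from typing import Iterable, List, Sequence


def segment_durations(segments: Iterable[Sequence[float]]) -> List[float]:
    """Return end - start for each [start, end] pair."""
    return [float(end) - float(start) for start, end in segments]


def token_rate(num_segments: int, total_duration: float) -> float:
    """Return the number of segments (tokens) per second of audio."""
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    return num_segments / total_duration


# Synthetic stand-in for outputs["segments"] with in_second=True.
segments = [[0.10, 0.32], [0.32, 0.61], [0.70, 0.95], [1.20, 1.48]]

durations = segment_durations(segments)
mean_duration = sum(durations) / len(durations)
rate = token_rate(len(segments), total_duration=2.0)  # assumed 2-second clip

print(f"{len(segments)} segments, mean {mean_duration:.3f} s, {rate:.2f} tokens/sec")
```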

Environment

Install the dependencies from requirements.txt:

pip install -r requirements.txt

Training SYLBER

Datasets and Checkpoints

Noise Dataset for WavLM-based Augmentation: The noise data for the WavLM noise augmentation comes from the DNS Challenge. Use the following script to download it:
```
bash download-dns-challenge-3.sh
```
Then extract the archive datasets_fullband/datasets_fullband.noise_fullband.tar.bz2.
Generated Datasets: The remaining training data for SYLBER is generated with the SDHuBERT repository. Follow the instructions there for data preparation.
Checkpoints: Pretrained Sylber checkpoints are available on Google Drive: link
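
The extraction step for the noise archive can also be done from Python. This is a minimal sketch using the standard-library `tarfile` module. `extract_archive` is a hypothetical helper, and the destination directory is an assumption, so point it wherever your training configuration expects the noise data.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archive(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a .tar.bz2 archive into dest_dir and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        names = tar.getnames()
        tar.extractall(path=dest)
    return names


if __name__ == "__main__":
    # Archive path as named in the instructions above.
    archive = Path("datasets_fullband/datasets_fullband.noise_fullband.tar.bz2")
    if archive.exists():
        # Assumed destination: next to the archive itself.
        members = extract_archive(str(archive), "datasets_fullband")
        print(f"extracted {len(members)} entries")
    else:
        print(f"{archive} not found; run the download script first")
```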

Stage 1 Training

python train.py --config-name=sylber_base

Stage 2 Training

python train.py --config-name=sylber_base_stage2

Training is split into two stages. Review the configuration files in the configs/ directory for detailed settings.
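
The two commands above can be chained from a small driver script so that stage 2 only starts if stage 1 exits cleanly. This is a sketch, not part of the repository: `build_commands` and `run_stages` are hypothetical helpers, and how stage 2 locates the stage 1 checkpoint is governed by the files in configs/, which this sketch does not touch.

```python
import subprocess
import sys
from typing import List

# Config names taken from the two training commands above.
STAGE_CONFIGS = ["sylber_base", "sylber_base_stage2"]


def build_commands(configs: List[str], script: str = "train.py") -> List[List[str]]:
    """Build one `python train.py --config-name=...` command per stage."""
    return [[sys.executable, script, f"--config-name={name}"] for name in configs]


def run_stages(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, run the stages in order."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a stage fails,
            # which prevents later stages from starting.
            subprocess.run(cmd, check=True)


commands = build_commands(STAGE_CONFIGS)
run_stages(commands, dry_run=True)  # set dry_run=False to actually train
```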

Inference

Segmentation and Visualization

To obtain segmentations and visualize the results, see demo.ipynb.

SPARC (formerly known as Articulatory Encodec)

To use SPARC, refer to the Speech-Articulatory-Coding repository for installation and usage instructions.

Acknowledgements

Website adapted from: https://github.com/BytedanceSpeech/bytedancespeech.github.io

Citation

If you use this work, please cite our paper:

@article{cho2024sylber,
  title={Sylber: Syllabic Embedding Representation of Speech from Raw Audio},
  author={Cho, Cheol Jun and Lee, Nicholas and Gupta, Akshat and Agarwal, Dhruv and Chen, Ethan and Black, Alan W and Anumanchipalli, Gopala K},
  journal={arXiv preprint arXiv:2410.07168},
  year={2024}
}

Release history

0.1.4

Mar 11, 2025

This version

0.1.3

Mar 11, 2025

0.1.2

Mar 11, 2025

0.1.1

Mar 2, 2025

Download files

Source Distribution

sylber-0.1.3.tar.gz (25.5 kB)

Uploaded Mar 11, 2025 Source

Built Distribution

sylber-0.1.3-py3-none-any.whl (27.7 kB)

Uploaded Mar 11, 2025 Python 3

File details

Details for the file sylber-0.1.3.tar.gz.

File metadata

Download URL: sylber-0.1.3.tar.gz
Upload date: Mar 11, 2025
Size: 25.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for sylber-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`fd459d861538fc7f9f0da6d02c6b76707ddafdd5a560a6b638e885299de6c403`
MD5	`68fa976a49262a3656a47f98d1505ecf`
BLAKE2b-256	`3711c4722a7f7fef99ef6ef9f02bdce41372872b0745e7ae1c9eba50191a464e`

File details

Details for the file sylber-0.1.3-py3-none-any.whl.

File metadata

Download URL: sylber-0.1.3-py3-none-any.whl
Upload date: Mar 11, 2025
Size: 27.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for sylber-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`858e2f8460f69d906f1aeff87acde8f2176ea786d146e93a3e578460a2bfee85`
MD5	`8730dfc550c3a3f766ab6a752bce8530`
BLAKE2b-256	`6d1608ca393b29ea5f591e7715ef950a2d206d02e5d1a8fbce318d8a3de0ed53`
