Unsupervised syllable segmentation, evaluation, and embedding extraction toolkit for speech audio
findsylls
Language-agnostic toolkit for syllable-level speech tokenization and embedding extraction.
findsylls provides:
- Envelope computation from waveform (RMS, Hilbert, low-pass, SBS, gammatone, theta)
- Syllable segmentation (peak/valley and neural options)
- Evaluation against TextGrid annotations (nuclei, boundaries, spans)
- Per-syllable embedding extraction for downstream tasks
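To make the envelope and peak/valley steps concrete, here is a minimal pure-Python sketch of the general idea: compute a frame-wise RMS envelope, take local maxima as candidate nuclei, and place boundaries at the envelope minimum between adjacent peaks. It is not the findsylls implementation; the names `rms_envelope` and `peak_valley_segments`, the frame sizes, and the `min_height` threshold are assumptions chosen for illustration.

```python
import math

def rms_envelope(samples, frame_len, hop):
    """Frame-wise root-mean-square amplitude of a waveform."""
    env = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        env.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return env

def peak_valley_segments(env, min_height):
    """Return (start, peak, end) frame indices, one tuple per envelope peak."""
    # Local maxima above the threshold are treated as candidate nuclei.
    peaks = [i for i in range(1, len(env) - 1)
             if env[i] > env[i - 1] and env[i] >= env[i + 1] and env[i] >= min_height]
    # The boundary between two adjacent peaks is the lowest envelope frame between them.
    valleys = [min(range(a, b + 1), key=env.__getitem__)
               for a, b in zip(peaks, peaks[1:])]
    bounds = [0] + valleys + [len(env) - 1]
    return [(bounds[k], p, bounds[k + 1]) for k, p in enumerate(peaks)]

# Synthetic test signal (assumption): a 200 Hz tone with four amplitude bumps per second.
sr = 8000
samples = [0.5 * (1 - math.cos(2 * math.pi * 4 * n / sr))
           * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]

frame_len, hop = 160, 80  # 20 ms frames, 10 ms hop
env = rms_envelope(samples, frame_len, hop)
segments = peak_valley_segments(env, min_height=0.1)

def to_sec(frame_index):
    """Centre time of a frame, in seconds."""
    return (frame_index * hop + frame_len / 2) / sr

for start, peak, end in segments:
    print(f"{to_sec(start):.2f}s  {to_sec(peak):.2f}s  {to_sec(end):.2f}s")
```

Real speech envelopes are far noisier than this toy signal, which is why practical peak pickers usually add smoothing and prominence or minimum-distance constraints.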
Install
# Core package
pip install findsylls
# Optional extras
pip install 'findsylls[viz]' # plotting helpers
pip install 'findsylls[embedding]' # neural feature extraction
pip install 'findsylls[end2end]' # neural segmentation methods
pip install 'findsylls[storage]' # HDF5 storage support
pip install 'findsylls[all]' # all extras
Quick Start
1) Segment a file into syllables
from findsylls import segment_audio
sylls, envelope, times = segment_audio(
    "example.wav",
    envelope_fn="sbs",
    segment_fn="peakdetect",
)
print(f"Found {len(sylls)} syllables")
# sylls: [(start, peak, end), ...]
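As an illustration of working with the `(start, peak, end)` layout, the sketch below derives per-syllable durations, a syllable rate, and the gaps between consecutive segments. The values are made up, and it assumes the three fields are times in seconds, which the snippet above does not state explicitly.

```python
# Hypothetical segmentation output in the (start, peak, end) layout.
# Assumption: all three values are times in seconds.
sylls = [(0.10, 0.18, 0.31), (0.31, 0.42, 0.55), (0.62, 0.70, 0.88)]

# Duration of each syllable span.
durations = [end - start for start, _peak, end in sylls]
mean_duration = sum(durations) / len(durations)

# Syllables per second over the stretch from the first onset to the last offset.
span = sylls[-1][2] - sylls[0][0]
rate = len(sylls) / span

# Silence (or unsegmented audio) between consecutive syllables.
gaps = [nxt[0] - cur[2] for cur, nxt in zip(sylls, sylls[1:])]

print(f"mean duration: {mean_duration:.3f} s, rate: {rate:.2f} syll/s")
```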
2) Evaluate against TextGrid annotations
from findsylls import run_evaluation, aggregate_results
results = run_evaluation(
    textgrid_paths="data/**/*.TextGrid",
    wav_paths="data/**/*.wav",
    phone_tier=1,
    syllable_tier=2,
    word_tier=3,
    envelope_fn="hilbert",
)
summary = aggregate_results(results, dataset_name="MyCorpus")
print(summary)
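For intuition about what a boundary score measures, here is a small self-contained sketch of tolerance-based boundary matching with precision, recall, and F1. It is an illustration only: the function name `boundary_prf`, the greedy one-to-one matching, and the 50 ms tolerance are assumptions, and the metrics computed by `run_evaluation` may be defined differently.

```python
def boundary_prf(predicted, reference, tolerance=0.05):
    """Precision, recall and F1 for boundaries matched one-to-one within `tolerance` seconds."""
    unmatched = sorted(reference)
    hits = 0
    for b in sorted(predicted):
        # Pick the closest reference boundary that has not been claimed yet.
        best = min(unmatched, key=lambda r: abs(r - b), default=None)
        if best is not None and abs(best - b) <= tolerance:
            unmatched.remove(best)
            hits += 1
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Made-up boundary times in seconds.
reference = [0.10, 0.31, 0.55, 0.88]
predicted = [0.12, 0.30, 0.52, 0.70, 0.90]

precision, recall, f1 = boundary_prf(predicted, reference)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Here four of the five predicted boundaries fall within the tolerance of a reference boundary, so the spurious boundary at 0.70 s lowers precision but not recall.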
3) Extract syllable embeddings
from findsylls import embed_audio
embeddings, metadata = embed_audio(
    "example.wav",
    segmentation="peakdetect",
    features="mfcc",  # mfcc | melspec | sylber | vg_hubert
    pooling="mean",   # mean | onc | max | median
)
print(embeddings.shape)
print(metadata["num_syllables"])
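Conceptually, mean pooling turns a variable number of frame-level feature vectors into one fixed-size vector per syllable. The sketch below shows that idea in plain Python; `mean_pool`, the half-open `[start, end)` span convention, and the zero vector returned for empty spans are assumptions for illustration, not a description of what `embed_audio` does internally.

```python
def mean_pool(frames, frame_times, segments):
    """Average the frame vectors whose timestamps fall inside each [start, end) span."""
    dim = len(frames[0])
    pooled = []
    for start, _peak, end in segments:
        chosen = [f for f, t in zip(frames, frame_times) if start <= t < end]
        if not chosen:
            # Assumption: an empty span yields a zero vector rather than an error.
            pooled.append([0.0] * dim)
            continue
        pooled.append([sum(f[d] for f in chosen) / len(chosen) for d in range(dim)])
    return pooled

# Toy 2-dimensional frame features at a 100 ms frame rate.
frames = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [2.0, 2.0], [4.0, 6.0], [6.0, 1.0]]
frame_times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
segments = [(0.0, 0.1, 0.3), (0.3, 0.4, 0.6)]

pooled = mean_pool(frames, frame_times, segments)
print(pooled)  # one vector per syllable
```

The result has one row per syllable and one column per feature dimension, which matches the two-dimensional shape one would expect from a per-syllable embedding matrix.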
4) Batch embedding extraction
from findsylls import embed_corpus, save_embeddings
results = embed_corpus(
    audio_paths=["a.wav", "b.wav", "c.wav"],
    segmentation="peakdetect",
    features="mfcc",
    pooling="mean",
    n_jobs=4,
)
save_embeddings(results, "embeddings.npz")
CLI
# Segment audio
findsylls segment input.wav --envelope sbs --method peakdetect --out sylls.json
# Extract embeddings
findsylls embed input.wav --features mfcc --pooling mean --out embeddings.npz
# Evaluate against TextGrid annotations
findsylls evaluate "data/**/*.wav" "data/**/*.TextGrid" \
    --phone-tier 1 --syllable-tier 2 --word-tier 3 \
    --envelope hilbert --out results.csv
Methods Overview
Envelope Methods
- rms
- hilbert
- lowpass
- sbs
- gammatone
- theta
- Feature-based envelopes (e.g., SSM / GreedyCosine / CLS-attention where available)
Segmentation Methods
- peakdetect
- Neural/custom segmenters exposed through the segmentation module
Embedding Features
- mfcc (13/26/39 dims with deltas)
- melspec (mel-filterbank)
- sylber
- vg_hubert
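The 13/26/39 figures for MFCCs follow from stacking static coefficients with their first and second time derivatives (deltas and delta-deltas). The sketch below shows the arithmetic on toy data. It uses a simple central difference with repeated edge frames, which is an assumption: real toolkits typically use a regression window, and findsylls may compute deltas differently. The helper names `deltas` and `stack` are hypothetical.

```python
def deltas(frames):
    """Central-difference deltas along time, with the edge frames repeated."""
    padded = [frames[0]] + frames + [frames[-1]]
    return [[(nxt - prv) / 2 for prv, nxt in zip(padded[i], padded[i + 2])]
            for i in range(len(frames))]

def stack(*streams):
    """Concatenate several feature streams frame by frame."""
    return [sum((list(s[i]) for s in streams), []) for i in range(len(streams[0]))]

# Toy 13-dimensional "MFCCs" over 5 frames (values are arbitrary).
static = [[float(t * (c + 1)) for c in range(13)] for t in range(5)]

d1 = deltas(static)  # first-order deltas
d2 = deltas(d1)      # second-order deltas (delta-deltas)

features_26 = stack(static, d1)      # 13 static + 13 delta
features_39 = stack(static, d1, d2)  # 13 static + 13 delta + 13 delta-delta

print(len(static[0]), len(features_26[0]), len(features_39[0]))
```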
Examples and Notebook
- Interactive demo notebook: findsylls_demo.ipynb
- Example scripts: examples/
Citation
If you use findsylls in academic work, please cite:
Plain text:
Vázquez Martínez, Héctor Javier. (2026). findsylls: A Language-Agnostic Toolkit for Syllable-Level Speech Tokenization and Embedding. arXiv:2603.26292. https://arxiv.org/abs/2603.26292
BibTeX:
@misc{martinez2026findsyllslanguageagnostictoolkitsyllablelevel,
    title={findsylls: A Language-Agnostic Toolkit for Syllable-Level Speech Tokenization and Embedding},
    author={Héctor Javier Vázquez Martínez},
    year={2026},
    eprint={2603.26292},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2603.26292},
}
License
MIT. See LICENSE.