Unsupervised syllable segmentation, evaluation, and embedding extraction toolkit for speech audio

findsylls

Language-agnostic toolkit for syllable-level speech tokenization and embedding extraction.

findsylls provides:

  • Envelope computation from waveform (RMS, Hilbert, low-pass, SBS, gammatone, theta)
  • Syllable segmentation (peak/valley and neural options)
  • Evaluation against TextGrid annotations (nuclei, boundaries, spans)
  • Per-syllable embedding extraction for downstream tasks

Install

# Core package
pip install findsylls

# Optional extras
pip install 'findsylls[viz]'       # plotting helpers
pip install 'findsylls[embedding]' # neural feature extraction
pip install 'findsylls[end2end]'   # neural segmentation methods
pip install 'findsylls[storage]'   # HDF5 storage support
pip install 'findsylls[all]'       # all extras

Quick Start

1) Segment a file into syllables

from findsylls import segment_audio

sylls, envelope, times = segment_audio(
    "example.wav",
    envelope_fn="sbs",
    segment_fn="peakdetect",
)

print(f"Found {len(sylls)} syllables")
# sylls: [(start, peak, end), ...]
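
The (start, peak, end) tuples are easy to post-process. The sketch below derives per-syllable durations and an overall syllable rate from a hand-written list in that layout. It assumes the values are times in seconds, which this README does not state; `syllable_durations` and `syllable_rate` are illustrative helpers, not part of findsylls.

```python
# Hypothetical output in the documented (start, peak, end) layout.
# Assumption: values are times in seconds.
sylls = [(0.10, 0.18, 0.27), (0.27, 0.35, 0.49), (0.62, 0.70, 0.81)]


def syllable_durations(sylls):
    """Return end - start for each (start, peak, end) tuple."""
    return [end - start for start, _peak, end in sylls]


def syllable_rate(sylls):
    """Syllables per second over the span from the first start to the last end."""
    if not sylls:
        return 0.0
    total = sylls[-1][2] - sylls[0][0]
    return len(sylls) / total if total > 0 else 0.0


durations = syllable_durations(sylls)
rate = syllable_rate(sylls)
print([round(d, 2) for d in durations])
print(round(rate, 2))
```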

2) Evaluate against TextGrid annotations

from findsylls import run_evaluation, aggregate_results

results = run_evaluation(
    textgrid_paths="data/**/*.TextGrid",
    wav_paths="data/**/*.wav",
    phone_tier=1,
    syllable_tier=2,
    word_tier=3,
    envelope_fn="hilbert",
)

summary = aggregate_results(results, dataset_name="MyCorpus")
print(summary)
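
For intuition about what a boundary score measures, here is a minimal, self-contained sketch of tolerance-based boundary matching with precision, recall, and F1. It is not the findsylls implementation: the greedy one-to-one matching and the 50 ms tolerance are assumptions chosen for illustration, and both function names are hypothetical.

```python
def match_boundaries(predicted, reference, tolerance=0.05):
    """Count one-to-one matches between boundary times within a tolerance (seconds)."""
    predicted = sorted(predicted)
    reference = sorted(reference)
    i = j = hits = 0
    while i < len(predicted) and j < len(reference):
        diff = predicted[i] - reference[j]
        if abs(diff) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # predicted boundary is too early to match this reference
        else:
            j += 1  # reference boundary has no prediction close enough
    return hits


def precision_recall_f1(predicted, reference, tolerance=0.05):
    """Score predicted boundaries against reference boundaries."""
    hits = match_boundaries(predicted, reference, tolerance)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


reference = [0.10, 0.27, 0.49, 0.81]  # e.g. boundaries read from an annotation tier
predicted = [0.11, 0.30, 0.60, 0.80]  # e.g. boundaries from a segmenter
p, r, f1 = precision_recall_f1(predicted, reference)
print(p, r, f1)
```

In this toy example three of the four predicted boundaries fall within the tolerance of a reference boundary, so precision and recall are both 0.75.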

3) Extract syllable embeddings

from findsylls import embed_audio

embeddings, metadata = embed_audio(
    "example.wav",
    segmentation="peakdetect",
    features="mfcc",      # mfcc | melspec | sylber | vg_hubert
    pooling="mean",       # mean | onc | max | median
)

print(embeddings.shape)
print(metadata["num_syllables"])
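
To illustrate what mean pooling does conceptually, the sketch below averages frame-level features inside each syllable span to get one vector per syllable. It uses NumPy only and random stand-in features; the 10 ms hop, the frame-center convention, and the `mean_pool_segments` helper are assumptions for this example, not findsylls internals.

```python
import numpy as np


def mean_pool_segments(frames, frame_times, segments):
    """Average frame-level features within each (start, end) segment.

    frames: (num_frames, dim) array; frame_times: (num_frames,) times in seconds.
    Returns a (num_segments, dim) array; segments with no frames stay all-zero.
    """
    pooled = np.zeros((len(segments), frames.shape[1]), dtype=frames.dtype)
    for k, (start, end) in enumerate(segments):
        mask = (frame_times >= start) & (frame_times < end)
        if mask.any():
            pooled[k] = frames[mask].mean(axis=0)
    return pooled


rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 13))  # stand-in for 13-dim MFCC frames
frame_times = np.arange(100) * 0.01      # assumed 10 ms hop
segments = [(0.105, 0.265), (0.265, 0.495), (0.615, 0.805)]

embeddings = mean_pool_segments(frames, frame_times, segments)
print(embeddings.shape)  # (3, 13)
```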

4) Batch embedding extraction

from findsylls import embed_corpus, save_embeddings

results = embed_corpus(
    audio_paths=["a.wav", "b.wav", "c.wav"],
    segmentation="peakdetect",
    features="mfcc",
    pooling="mean",
    n_jobs=4,
)

save_embeddings(results, "embeddings.npz")

CLI

# Segment audio
findsylls segment input.wav --envelope sbs --method peakdetect --out sylls.json

# Extract embeddings
findsylls embed input.wav --features mfcc --pooling mean --out embeddings.npz

# Evaluate against TextGrid annotations
findsylls evaluate "data/**/*.wav" "data/**/*.TextGrid" \
  --phone-tier 1 --syllable-tier 2 --word-tier 3 \
  --envelope hilbert --out results.csv

Methods Overview

Envelope Methods

  • rms
  • hilbert
  • lowpass
  • sbs
  • gammatone
  • theta
  • Feature-based envelopes (e.g., SSM / GreedyCosine / CLS-attention where available)
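
As a rough idea of what an amplitude envelope is, the following NumPy-only sketch computes a frame-wise RMS envelope of a synthetic signal. The 25 ms frame, 10 ms hop, and the `rms_envelope` function are assumptions for illustration; the package's own rms method may use different parameters or smoothing.

```python
import numpy as np


def rms_envelope(signal, sample_rate, frame_length=0.025, hop_length=0.010):
    """Frame-wise root-mean-square amplitude envelope.

    Returns (envelope, times), where times are frame centers in seconds.
    """
    frame = int(round(frame_length * sample_rate))
    hop = int(round(hop_length * sample_rate))
    if len(signal) < frame:
        return np.empty(0), np.empty(0)
    num_frames = 1 + (len(signal) - frame) // hop
    envelope = np.empty(num_frames)
    for i in range(num_frames):
        chunk = signal[i * hop : i * hop + frame]
        envelope[i] = np.sqrt(np.mean(chunk ** 2))
    times = (np.arange(num_frames) * hop + frame / 2) / sample_rate
    return envelope, times


# Synthetic input: a 200 Hz tone, amplitude-modulated at a syllable-like 4 Hz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 200 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))

envelope, times = rms_envelope(signal, sample_rate)
print(envelope.shape, times[0], times[-1])
```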

Segmentation Methods

  • peakdetect
  • Neural/custom segmenters exposed through the segmentation module
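
The general idea behind peak/valley segmentation can be sketched in a few lines: treat envelope peaks as syllable nuclei and place boundaries at the lowest point between adjacent peaks. This is a simplified illustration, not the peakdetect algorithm itself; the `min_height` threshold and the `peak_valley_segments` name are hypothetical.

```python
import numpy as np


def peak_valley_segments(envelope, times, min_height=0.1):
    """Return (start, peak, end) time tuples from an amplitude envelope."""
    env = np.asarray(envelope, dtype=float)
    # Interior local maxima that reach the height threshold.
    interior = np.arange(1, len(env) - 1)
    is_peak = (env[interior] > env[interior - 1]) & (env[interior] >= env[interior + 1])
    peaks = interior[is_peak & (env[interior] >= min_height)]
    if len(peaks) == 0:
        return []
    # Boundaries: first frame, lowest point between adjacent peaks, last frame.
    bounds = [0]
    for left, right in zip(peaks[:-1], peaks[1:]):
        bounds.append(int(left + np.argmin(env[left : right + 1])))
    bounds.append(len(env) - 1)
    return [
        (float(times[bounds[k]]), float(times[p]), float(times[bounds[k + 1]]))
        for k, p in enumerate(peaks)
    ]


envelope = np.array([0.0, 0.5, 1.0, 0.4, 0.1, 0.6, 0.9, 0.3, 0.0])
times = np.arange(len(envelope)) * 0.05  # assumed 50 ms frame spacing
segments = peak_valley_segments(envelope, times)
print(segments)
```

Here the two peaks (frames 2 and 6) become nuclei and the valley at frame 4 becomes the shared boundary, giving two tuples in the same (start, peak, end) layout that the Quick Start shows.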

Embedding Features

  • mfcc (13/26/39 dims with deltas)
  • melspec (mel-filterbank)
  • sylber
  • vg_hubert
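
The 13/26/39 figures for mfcc most likely follow the usual convention of 13 base coefficients, plus first-order deltas, plus second-order deltas. The sketch below shows how those sizes arise by stacking time derivatives. It uses `np.gradient` as a simple finite-difference stand-in for the regression-based deltas most speech toolkits compute, and `add_deltas` is a hypothetical helper, not a findsylls function.

```python
import numpy as np


def add_deltas(features, order=2):
    """Stack time derivatives onto frame-level features.

    features: (num_frames, dim) array with at least two frames.
    Returns a (num_frames, dim * (order + 1)) array.
    """
    stacked = [features]
    current = features
    for _ in range(order):
        # Finite difference along the time axis (assumption: not a regression delta).
        current = np.gradient(current, axis=0)
        stacked.append(current)
    return np.concatenate(stacked, axis=1)


base = np.random.default_rng(0).standard_normal((50, 13))  # stand-in for 13 MFCCs
print(add_deltas(base, order=0).shape)  # (50, 13)
print(add_deltas(base, order=1).shape)  # (50, 26)
print(add_deltas(base, order=2).shape)  # (50, 39)
```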

Citation

If you use findsylls in academic work, please cite:

Plain text:

Vázquez Martínez, Héctor Javier. (2026). findsylls: A Language-Agnostic Toolkit for Syllable-Level Speech Tokenization and Embedding. arXiv:2603.26292. https://arxiv.org/abs/2603.26292

BibTeX:

@misc{martinez2026findsyllslanguageagnostictoolkitsyllablelevel,
  title={findsylls: A Language-Agnostic Toolkit for Syllable-Level Speech Tokenization and Embedding},
  author={Héctor Javier Vázquez Martínez},
  year={2026},
  eprint={2603.26292},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.26292},
}

License

MIT. See LICENSE.

Download files

Source Distribution

findsylls-2.0.0.tar.gz (455.7 kB)

Uploaded Source

Built Distribution

findsylls-2.0.0-py3-none-any.whl (92.4 kB)

Uploaded Python 3

File details

Details for the file findsylls-2.0.0.tar.gz.

File metadata

  • Download URL: findsylls-2.0.0.tar.gz
  • Upload date:
  • Size: 455.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for findsylls-2.0.0.tar.gz
Algorithm    Hash digest
SHA256       44efb17fe74a824312d24c000ae2837f5b62d4d05eaa70d62b34ed7b4633ba50
MD5          0a691fcb4bf7aa6d55ae19319036aa30
BLAKE2b-256  20ec32b671eaa428b941a32e21c5fb46f9207e0286db28678162513908fb6d3a

File details

Details for the file findsylls-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: findsylls-2.0.0-py3-none-any.whl
  • Upload date:
  • Size: 92.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for findsylls-2.0.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       32d717928d844df4bee4c5d40ca44e4910817469b4783a7b143e62d4da0361a2
MD5          c214c91f76eef4fe7eab39059d591c65
BLAKE2b-256  fee65010f870fa822be2d90b78a86698d59e34c579b69b698928cf0d14d15201
