Unsupervised syllable segmentation and evaluation toolkit for speech audio

findsylls

Unsupervised syllable(-like) segmentation & evaluation toolkit for speech audio. Extract amplitude / modulation envelopes, segment them into candidate syllables, and (optionally) evaluate the result against Praat TextGrid annotations at the nucleus, syllable boundary/span, and word boundary/span levels.

Features

  • Pluggable amplitude envelope front-ends: RMS, Hilbert, low-pass, spectral band subtraction (SBS), gammatone, theta oscillator.
  • Peak & valley segmentation (extensible: hooks for future algorithms like Mermelstein, oscillator-based).
  • Robust TextGrid parsing for phones, syllables, words with vowel / syllabic consonant filtering.
  • Multi-level evaluation metrics (TP / Ins / Del / Sub; precision/recall/F1/TER aggregation helpers).
  • Batch pipeline utilities + fuzzy filename matching (pairing .wav with .TextGrid files).
  • Optional plotting layer for qualitative inspection.
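
To make the envelope and peak & valley steps concrete, here is a minimal standard-library sketch of the general idea. It is not the package's implementation: the names rms_envelope and peaks_and_valleys, the window and hop sizes, and the min_peak threshold are all assumptions chosen for illustration.

```python
import math

def rms_envelope(samples, sr, win_s=0.025, hop_s=0.010):
    """Frame-wise RMS envelope; returns (env, times), times at frame centres.

    Illustrative only: window/hop defaults are assumptions, not findsylls defaults.
    """
    if not samples:
        return [], []
    win = max(1, int(round(win_s * sr)))
    hop = max(1, int(round(hop_s * sr)))
    env, times = [], []
    for start in range(0, max(1, len(samples) - win + 1), hop):
        frame = samples[start:start + win]
        env.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
        times.append((start + len(frame) / 2) / sr)
    return env, times

def peaks_and_valleys(env, times, min_peak=0.1):
    """Return (start, peak, end) time triples, one per envelope peak.

    A peak is a local maximum of at least min_peak * max(env); its start and
    end are the lowest envelope points between it and the neighbouring peaks
    (or the ends of the signal).
    """
    if not env:
        return []
    thresh = min_peak * max(env)
    peaks = [i for i in range(1, len(env) - 1)
             if env[i - 1] < env[i] >= env[i + 1] and env[i] >= thresh]
    segments = []
    for k, p in enumerate(peaks):
        lo = peaks[k - 1] if k > 0 else 0
        hi = peaks[k + 1] if k + 1 < len(peaks) else len(env) - 1
        left = min(range(lo, p + 1), key=env.__getitem__)
        right = min(range(p, hi + 1), key=env.__getitem__)
        segments.append((times[left], times[p], times[right]))
    return segments
```

Each returned triple treats the peak as a candidate nucleus and the surrounding valleys as candidate syllable boundaries. Adjacent segments share a boundary, since both search the same stretch of envelope for its minimum.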

Install

# Core (from PyPI)
pip install findsylls

# Local clone (editable development)
pip install -e '.[dev]'

# With plotting / visualization extras
pip install 'findsylls[viz]'

# Install from Git (exact tag)
pip install 'git+https://github.com/hjvm/findsylls.git@v0.1.1'

Quick Start

from findsylls import segment_audio
sylls, env, t = segment_audio("example.wav", envelope_fn="sbs", segment_fn="peaks_and_valleys")
print(sylls[:5])

Batch evaluation:

from findsylls import run_evaluation
results = run_evaluation(
    textgrid_paths="data/**/*.TextGrid",
    wav_paths="data/**/*.wav",
    phone_tier=1,
    syllable_tier=2,
    word_tier=3,
    envelope_fn="hilbert",
)
print(results.head())
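
run_evaluation pairs WAV files with TextGrid files by filename. The sketch below shows one way such pairing could work, using exact stem matches first and difflib as a fallback. The function name pair_by_stem and the 0.8 cutoff are assumptions for illustration; the package's own matching rules may differ.

```python
from difflib import get_close_matches
from pathlib import PurePath

def pair_by_stem(wav_paths, textgrid_paths, cutoff=0.8):
    """Pair each WAV with the TextGrid whose stem matches exactly or most closely.

    Returns (pairs, unmatched_wavs). Hypothetical helper, not part of findsylls.
    """
    tg_by_stem = {PurePath(p).stem.lower(): p for p in textgrid_paths}
    pairs, unmatched = [], []
    for wav in wav_paths:
        stem = PurePath(wav).stem.lower()
        if stem in tg_by_stem:
            # Exact (case-insensitive) stem match.
            pairs.append((wav, tg_by_stem[stem]))
            continue
        # Fall back to the single closest stem above the similarity cutoff.
        close = get_close_matches(stem, list(tg_by_stem), n=1, cutoff=cutoff)
        if close:
            pairs.append((wav, tg_by_stem[close[0]]))
        else:
            unmatched.append(wav)
    return pairs, unmatched
```

Inspecting the unmatched list before a long run is a cheap way to catch naming inconsistencies in a corpus.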

Aggregate:

from findsylls import aggregate_results
summary = aggregate_results(results, dataset_name="MyCorpus")
print(summary)

CLI

After install:

findsylls segment input.wav --envelope sbs --method peaks_and_valleys --out sylls.json
findsylls evaluate "data/**/*.wav" "data/**/*.TextGrid" --phone-tier 1 --syllable-tier 2 --word-tier 3 --envelope hilbert --out results.csv

Show help:

findsylls --help
findsylls segment --help
findsylls evaluate --help

API Surface

Function                              Purpose
segment_audio                         One-file end‑to‑end (load → envelope → segment).
run_evaluation                        Batch match WAV/TextGrid and compute metrics.
get_amplitude_envelope                Compute envelope via a registered method.
segment_envelope                      Dispatch segmentation algorithm.
flatten_results / aggregate_results   Reshape & aggregate evaluation outputs.
plot_segmentation_result              Multi-panel qualitative plot (optional).

Adding Methods

  1. Envelope: implement compute_* returning (env, times) in envelope/ and register in envelope/dispatch.py.
  2. Segmentation: implement segment_<name>(envelope, times, **kwargs) in segmentation/ and add branch in segmentation/dispatch.py.
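
As a rough illustration of step 1, the sketch below defines an envelope function that returns (env, times) and looks it up by name. The function compute_moving_average_envelope, the ENVELOPE_METHODS dict, and get_envelope_sketch are hypothetical; the actual layout of envelope/dispatch.py is not shown here and may use a different mechanism.

```python
def compute_moving_average_envelope(samples, sr, win_s=0.02):
    """Rectify the signal, then smooth it with a centred moving average.

    Returns (env, times) with one value per input sample. Hypothetical example.
    """
    half = max(1, int(win_s * sr) // 2)
    rect = [abs(x) for x in samples]
    env = []
    for i in range(len(rect)):
        lo, hi = max(0, i - half), min(len(rect), i + half + 1)
        env.append(sum(rect[lo:hi]) / (hi - lo))
    times = [i / sr for i in range(len(rect))]
    return env, times

# Hypothetical name-to-function registry standing in for the dispatch module.
ENVELOPE_METHODS = {"moving_average": compute_moving_average_envelope}

def get_envelope_sketch(samples, sr, method, **kwargs):
    """Look up an envelope function by name and call it."""
    try:
        fn = ENVELOPE_METHODS[method]
    except KeyError:
        raise ValueError(f"Unknown envelope method: {method!r}") from None
    return fn(samples, sr, **kwargs)
```

The same contract applies to segmentation functions: accept (envelope, times, **kwargs) and keep the name lookup in a single place so new methods are reachable from the public API.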

TextGrid Tier Indexing

Indices are 0-based (as provided by the textgrid library). Pass None to skip a tier, or -1 to request placeholder syllable generation (which currently returns an empty list).

Evaluation Conventions

  • Default tolerance = 0.05s.
  • EVAL_METHODS ordering drives flatten/aggregate loops; include new metric keys there if extending.
  • Substitutions are counted only for span metrics; for nuclei and boundary metrics they stay at zero, in keeping with F1 semantics.
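
To illustrate how a tolerance turns two lists of time points into TP / Ins / Del counts, here is a minimal greedy matcher. It is a sketch under assumptions, not the package's scoring code: the function name match_with_tolerance and the greedy left-to-right strategy are choices made for this example, and substitutions (relevant to span metrics) are not modelled.

```python
def match_with_tolerance(reference, predicted, tol=0.05):
    """Greedy one-to-one matching of time points within +/- tol seconds.

    Returns counts plus precision, recall, and F1. Illustrative only.
    """
    ref, hyp = sorted(reference), sorted(predicted)
    i = j = tp = 0
    while i < len(ref) and j < len(hyp):
        if abs(ref[i] - hyp[j]) <= tol:
            tp += 1          # prediction lands inside the tolerance window
            i += 1
            j += 1
        elif hyp[j] < ref[i]:
            j += 1           # prediction with no nearby reference: insertion
        else:
            i += 1           # reference with no nearby prediction: deletion
    ins = len(hyp) - tp
    dele = len(ref) - tp
    precision = tp / (tp + ins) if tp + ins else 0.0
    recall = tp / (tp + dele) if tp + dele else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"TP": tp, "Ins": ins, "Del": dele,
            "precision": precision, "recall": recall, "F1": f1}
```

Each reference point can be claimed by at most one prediction, so a cluster of predictions near a single boundary yields one true positive and the rest count as insertions.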

Performance Notes

  • Audio loading prefers torchaudio when it is present (install it separately); otherwise it falls back to soundfile / librosa.
  • Envelope computation is vectorized (NumPy). The theta method includes a gammatone filterbank, which can be slower; use SBS or Hilbert for faster prototyping.
  • For large corpora, pre‑sample or cap duration (see example notebook) to iterate quickly.
  • Parallelization: current API processes files sequentially; external parallel mapping (e.g., joblib or multiprocessing) around segment_audio is safe if you don’t mutate globals.
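
One way to wrap per-file work in a parallel map with only the standard library is sketched below. The helper map_files is hypothetical and not part of findsylls; the commented usage assumes the segment_audio call shown in Quick Start.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def map_files(fn, paths, max_workers=4, executor_cls=ProcessPoolExecutor):
    """Apply fn to every path in parallel; returns {path: result}.

    Hypothetical helper. With ProcessPoolExecutor, fn must be picklable
    (a module-level function, not a lambda or closure).
    """
    paths = list(paths)
    with executor_cls(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip lines results up with paths.
        return dict(zip(paths, pool.map(fn, paths)))

# Possible usage (assumes findsylls is installed and wav_paths is a list of files):
#
#   from findsylls import segment_audio
#
#   def segment_one(path):
#       return segment_audio(path, envelope_fn="sbs", segment_fn="peaks_and_valleys")
#
#   if __name__ == "__main__":
#       results = map_files(segment_one, wav_paths)
```

Passing executor_cls=ThreadPoolExecutor avoids pickling and process start-up cost, which may be enough when file loading dominates. For CPU-bound envelope computation, processes are usually the better fit.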

FAQ

Why are boundary/span columns missing? If a tier index is None or the tier produces no intervals, those metrics are skipped intentionally.

How do I add my own envelope? Implement a function returning (envelope, times) and register it in envelope/dispatch.py.

Can I stream long recordings? Not yet; current design assumes full in‑memory arrays. A streaming envelope interface is on the roadmap.

Why do I get 0 TP for nuclei? Likely vowel set mismatch; confirm phone tier labels are standard (ARPABET or simple vowels) and consider adjusting SYLLABIC.
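
A quick way to check for a vowel set mismatch is to tally the phone-tier labels against a known vowel inventory. The sketch below uses the standard ARPABET vowel symbols and strips trailing stress digits. The set and the function name vowel_label_report are assumptions for illustration and are not the package's SYLLABIC constant.

```python
import re
from collections import Counter

# Standard ARPABET vowel symbols (an assumption; not findsylls' SYLLABIC set).
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}

def vowel_label_report(labels, vowels=ARPABET_VOWELS):
    """Split label counts into (matched, unmatched) against a vowel set.

    Labels are upper-cased and trailing stress digits (e.g. AH0) are removed.
    Empty labels are ignored.
    """
    counts = Counter()
    for raw in labels:
        label = re.sub(r"\d+$", "", raw.strip()).upper()
        if label:
            counts[label] += 1
    matched = {lab: n for lab, n in counts.items() if lab in vowels}
    unmatched = {lab: n for lab, n in counts.items() if lab not in vowels}
    return matched, unmatched
```

If matched comes back empty for a corpus that clearly contains vowels, the labels probably use a different alphabet (for example IPA), and the syllabic set needs to be adjusted to match.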

Roadmap / TODO

  • Implement generate_syllable_intervals (placeholder now).
  • Additional segmentation algorithms (Mermelstein, oscillator-based).
  • More robust CLI progress reporting and a JSON schema for outputs.
  • Optional streaming / large-file handling.

Legacy Code

The previous exploratory/monolithic implementations are retained under a legacy/ folder (formerly old/ and findsylls_old/) for reference only. They are excluded from distribution and not supported; prefer the public API described above.

License

MIT. See LICENSE.

Citation

If you use this software in academic work, please cite the repository until a formal paper/preprint is available. A CITATION.cff file will appear in a future release.

Plain text:

Vázquez Martínez, Héctor Javier. (2025). findsylls: Unsupervised syllable segmentation & evaluation toolkit (Version 0.1.1) [Computer software]. https://github.com/hjvm/findsylls

BibTeX:

@software{findsylls,
    author = {Vázquez Martínez, Héctor Javier},
    title = {findsylls: Unsupervised syllable segmentation & evaluation toolkit},
    year = {2025},
    version = {0.1.1},
    url = {https://github.com/hjvm/findsylls},
    license = {MIT}
}

For development guidelines see .github/copilot-instructions.md.
