Python3 library for converting between various audio dataset formats.
Project description
The audio-dataset-converter library converts audio datasets between various formats. Filters can also be applied, e.g., for cleaning up the data.
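Conceptually, a conversion reads records in one format, optionally passes them through a chain of filters and writes them out in another format. The plain-Python sketch below only illustrates that flow. `AudioRecord`, `convert`, `drop_unlabeled` and the convention that a filter returns `None` to discard a record are hypothetical and are not the library's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class AudioRecord:
    """Hypothetical record: an audio file plus its annotation."""
    path: str
    label: str
    meta: dict = field(default_factory=dict)


# Hypothetical filter type: returns the (possibly modified) record, or None to discard it.
Filter = Callable[[AudioRecord], Optional[AudioRecord]]


def convert(records: Iterable[AudioRecord], filters: List[Filter],
            write: Callable[[AudioRecord], None]) -> int:
    """Run every record through the filters in order, write the ones that
    survive and return how many were written."""
    written = 0
    for rec in records:
        current: Optional[AudioRecord] = rec
        for flt in filters:
            current = flt(current)
            if current is None:
                break  # record was discarded by this filter
        if current is not None:
            write(current)
            written += 1
    return written


def drop_unlabeled(rec: AudioRecord) -> Optional[AudioRecord]:
    """Example cleanup filter: discard records without a label."""
    return rec if rec.label else None


out: List[AudioRecord] = []
n = convert([AudioRecord("a.wav", "dog"), AudioRecord("b.wav", "")],
            [drop_unlabeled], out.append)
print(n, [r.path for r in out])  # 1 ['a.wav']
```

Keeping readers, filters and writers as independent callables is what allows any input format to be paired with any output format.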
Dataset formats (r/w = read/write):
- classification: ADAMS (r/w), sub-dir (r/w), TXT (r/w)
- speech: ADAMS (r/w), CommonVoice (r/w), Festvox (r/w), Huggingface Audiofolder (r/w), TXT (r/w)
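For the sub-dir classification format, this sketch assumes the usual convention that the class label is the name of the sub-directory holding the audio files, e.g. `dataset/dog/001.wav`. The hypothetical helper `read_subdir_dataset` below (not part of the library) collects (file, label) pairs from such a layout with `pathlib`. The set of audio extensions is also an assumption.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed set of audio file extensions to pick up (case-insensitive).
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".ogg"}


def read_subdir_dataset(root: str) -> List[Tuple[str, str]]:
    """Return sorted (relative file path, label) pairs, where the label is
    the name of the immediate sub-directory of `root` containing the file.
    Files directly inside `root` are ignored."""
    pairs = []
    for label_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for f in sorted(label_dir.iterdir()):
            if f.is_file() and f.suffix.lower() in AUDIO_EXTS:
                pairs.append((f.relative_to(root).as_posix(), label_dir.name))
    return pairs
```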
Changelog
0.0.3 (2025-07-10)
- added set-placeholder filter for dynamically setting (temporary) placeholders at runtime
- added --resume_from option to relevant readers, which resumes the data processing from the first file that matches the given glob expression (e.g., */012345.wav)
- now requiring seppl>=0.2.17 for resume, split group and skippable plugin support, and to avoid the deprecated use of pkg_resources
- the to-adams-sp writer now uses the -t short flag for the transcript, like the from-adams-sp reader
- added the from-multi meta-reader that combines multiple base readers and returns their output
- added the to-multi meta-writer that forwards the data to multiple base writers
- now using wai_common instead of wai.common
- added split_group parameter to splittable writers (stream/batch)
- fixed the construction of the error messages in the pyfunc reader/filter/writer classes
- added metadata-to-placeholder filter to transfer metadata files into placeholders
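The resume behaviour described for `--resume_from` can be sketched as skipping input files until the first one that matches the glob expression. This sketch assumes that files are visited in a fixed (e.g. sorted) order and matched with shell-style wildcards via `fnmatch.fnmatchcase`, where `*` also matches `/`. The library's actual matching rules may differ, and `resume_from` is a hypothetical function name.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Iterator, Optional


def resume_from(files: Iterable[str], pattern: Optional[str]) -> Iterator[str]:
    """Yield files starting at the first one that matches the glob `pattern`.

    Every file is yielded when `pattern` is None; none are yielded if no
    file matches.
    """
    resumed = pattern is None
    for f in files:
        if not resumed and fnmatchcase(f, pattern):
            resumed = True  # first match found: process it and everything after
        if resumed:
            yield f


files = ["data/012343.wav", "data/012344.wav", "data/012345.wav", "data/012346.wav"]
print(list(resume_from(files, "*/012345.wav")))
# ['data/012345.wav', 'data/012346.wav']
```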
0.0.2 (2025-03-14)
- added setuptools as dependency
- switched to underscores in project name
- added discard-by-name filter
- requiring seppl>=0.2.13 now
- added support for aliases
- added placeholder support to tools: adc-convert, adc-exec
- added placeholder support to readers: from-adams-ac, from-subdir-ac, from-txt-ac, from-adams-sp, from-commonvoice-sp, from-festvox-sp, from-hf-audiofolder-sp, from-txt-sp, from-data, poll-dir, from-pyfunc
- added placeholder support to writers: to-adams-ac, to-subdir-ac, to-txt-ac, to-adams-sp, to-commonvoice-sp, to-festvox-sp, to-hf-audiofolder-sp, to-txt-sp, to-audioinfo, to-data
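Placeholder support means that option values, presumably paths, can contain named placeholders that are expanded at runtime. The sketch below shows a generic expansion step. The `{NAME}` syntax, the `expand_placeholders` function and the rule of leaving unknown placeholders unchanged are all assumptions for illustration, not the library's implementation.

```python
import re
from typing import Dict

# Assumed placeholder syntax: {NAME}, NAME made of ASCII letters, digits, underscores.
PLACEHOLDER = re.compile(r"\{([A-Za-z0-9_]+)\}")


def expand_placeholders(template: str, values: Dict[str, str]) -> str:
    """Replace every {NAME} with values[NAME]; unknown names are left as-is."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


print(expand_placeholders("{HOME}/data/{SPLIT}/list.txt", {"HOME": "/home/user"}))
# /home/user/data/{SPLIT}/list.txt
```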
0.0.1 (2024-07-05)
- initial release
Download files
Source Distribution
File details
Details for the file audio_dataset_converter-0.0.3.tar.gz.
File metadata
- Download URL: audio_dataset_converter-0.0.3.tar.gz
- Upload date:
- Size: 58.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 16d42bee8ba5004d3122f0f2247bee053bbcc08a64be9909108301a2478d6de5 |
| MD5 | acb8e8c6d336c1b63b472cad4823ecf2 |
| BLAKE2b-256 | 5f41ac2ccb054fb13d63b1f3d55e32ecdafb8afe9d9f4ea772bdd827edf5d296 |
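To check a downloaded archive against the digests listed for it, compute the same hashes locally. The minimal sketch below uses only Python's standard `hashlib` module. BLAKE2b-256 corresponds to `hashlib.blake2b` with a 32-byte digest. `file_digests` is a hypothetical helper name, and the usage lines assume the archive sits in the current directory.

```python
import hashlib
from typing import Dict

EXPECTED_SHA256 = "16d42bee8ba5004d3122f0f2247bee053bbcc08a64be9909108301a2478d6de5"


def file_digests(path: str, chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Compute the SHA256, MD5 and BLAKE2b-256 hex digests of a file,
    reading it in chunks so large files need not fit in memory."""
    hashes = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 256-bit BLAKE2b
    }
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashes.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashes.items()}


# Usage (assumes the archive was downloaded to the current directory):
# digests = file_digests("audio_dataset_converter-0.0.3.tar.gz")
# assert digests["SHA256"] == EXPECTED_SHA256
```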