audio-dataset-converter

Python3 library for converting between various audio dataset formats.

These details have not been verified by PyPI

Project links

Homepage

Project description

The audio-dataset-converter library converts audio datasets between various dataset formats. Filters can be supplied as well, e.g., to clean up the data.

Dataset formats:

classification: ADAMS (r/w), sub-dir (r/w), TXT (r/w)
speech: ADAMS (r/w), CommonVoice (r/w), Festvox (r/w), Huggingface Audiofolder (r/w), TXT (r/w)

Examples can be found here:

https://github.com/waikato-llm/audio-dataset-converter-examples
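
The reader, filter, writer idea described above can be pictured with a small, self-contained sketch. It does not use the audio-dataset-converter API: `AudioRecord`, `read_records`, `drop_empty_transcripts`, `write_tsv` and `run_pipeline` are hypothetical names invented for illustration, and the real plugins are documented in the examples repository linked above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


@dataclass
class AudioRecord:
    """Hypothetical record type; not a class from audio-dataset-converter."""
    path: str
    transcript: str
    metadata: Dict[str, str] = field(default_factory=dict)


def read_records(rows: Iterable[Tuple[str, str]]) -> Iterator[AudioRecord]:
    """Stand-in reader: turns (path, transcript) pairs into records."""
    for path, transcript in rows:
        yield AudioRecord(path=path, transcript=transcript)


def drop_empty_transcripts(records: Iterable[AudioRecord]) -> Iterator[AudioRecord]:
    """Stand-in clean-up filter: discards records without a transcript."""
    for record in records:
        if record.transcript.strip():
            yield record


def write_tsv(records: Iterable[AudioRecord]) -> List[str]:
    """Stand-in writer: renders each record as a tab-separated line."""
    return [f"{r.path}\t{r.transcript}" for r in records]


def run_pipeline(rows: Iterable[Tuple[str, str]], filters: List[Callable]) -> List[str]:
    """Chain reader -> filters -> writer, in the order the filters are given."""
    stream = read_records(rows)
    for apply_filter in filters:
        stream = apply_filter(stream)
    return write_tsv(stream)
```

Calling `run_pipeline(rows, [drop_empty_transcripts])` keeps only the rows that carry a non-blank transcript, which is the kind of clean-up step a filter is meant for.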

Changelog

0.1.0 (2025-10-31)

split-records filter now allows specifying the meta-data field in which to store the split name
the tee meta-filter can now forward or drop the incoming data based on a meta-data evaluation
the sub-process filter can be used for processing data with a sub-flow of filters; it can be made conditional based on a meta-data evaluation
the metadata-from-name filter can work on the path now as well (must be present)
switched to kasperl library for base API and generic pipeline plugins
added @abc.abstractmethod decorator where appropriate
the adc-exec tool now uses all remaining parameters as the pipeline components rather than having to specify them via the -p/--pipeline parameter, making it easy to simply prefix the adc-exec command to an existing adc-convert command-line
added the text-file and csv-file generators that work off files to populate the variable(s)
added support for class lister with ignored classes
adc-exec can load pipelines from file now as well, useful when dealing with large pipelines
added --load_pipeline option to adc-convert
added from-text-file reader and to-text-file writer
readers now locate files the first time the read() method gets called rather than in the initialized() method, to allow more dynamic placeholders
added block, stop filters for controlling the flow of data (via meta-data conditions)
added email support with get-email reader and send-email writer
added list-files reader for listing files in a directory
added list-to-sequence stream filter that forwards list items one by one
added console writer that outputs the data coming through on stdout
added watch-dir meta-reader that uses the watchdog library to react to file-system events rather than using fixed-interval polling like poll-dir
added delete-files writer
added copy-files filter
added support for caching plugins via ADC_CLASS_CACHE environment variable
added to-metadata writer that outputs the meta-data of an image
added attach-metadata filter that loads meta-data from a directory and attaches it to the data passing through
added annotation-to-storage and annotation-from-storage filters
annotation data is now being type-checked when setting it
requiring seppl>=0.3.0 now
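
The adc-exec entry above (all remaining parameters become the pipeline) can be illustrated with a minimal sketch of the general technique: consume the tool's own options first, then treat everything from the first other token onwards as the pipeline. This is not the adc-exec implementation; `split_exec_args` and the `--dry_run` flag are hypothetical and exist only to show the split.

```python
from typing import List, Sequence, Tuple


def split_exec_args(
    argv: Sequence[str],
    tool_flags: Sequence[str] = ("--dry_run",),  # hypothetical tool-level flag
) -> Tuple[List[str], List[str]]:
    """Split argv into leading tool flags and the trailing pipeline tokens.

    Everything from the first token that is not a known tool flag onwards
    is returned unchanged as the pipeline.
    """
    options: List[str] = []
    for index, token in enumerate(argv):
        if token in tool_flags:
            options.append(token)
        else:
            return options, list(argv[index:])
    return options, []


# An existing pipeline only needs to be prefixed with the tool's own options.
existing_pipeline = ["from-festvox-sp", "to-commonvoice-sp"]
options, pipeline = split_exec_args(["--dry_run"] + existing_pipeline)
```

Because the pipeline tokens are passed through untouched, no separate pipeline parameter or extra quoting is needed in this scheme.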

0.0.4 (2025-07-15)

requiring seppl>=0.2.20 now for improved help requests in adc-convert tool

0.0.3 (2025-07-10)

added set-placeholder filter for dynamically setting (temporary) placeholders at runtime
added --resume_from option to relevant readers that allows resuming the data processing from the first file that matches this glob expression (e.g., */012345.wav)
requiring seppl>=0.2.17 now for resume, split group, skippable plugin support and avoiding deprecated use of pkg_resources
to-adams-sp writer now uses -t short flag for the transcript like the from-adams-sp reader
added the from-multi meta-reader that combines multiple base readers and returns their output
added the to-multi meta-writer that forwards the data to multiple base writers
using wai_common instead of wai.common now
added split_group parameter to splittable writers (stream/batch)
fixed the construction of the error messages in the pyfunc reader/filter/writer classes
added metadata-to-placeholder filter to transfer meta-data files into placeholders
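
The resume_from behaviour listed above (continue from the first file matching a glob expression) can be sketched in plain Python. This is an assumption about the general approach, not the code used by the readers; `resume_from` here is a hypothetical helper built on the standard library's `fnmatch` module.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Iterator


def resume_from(files: Iterable[str], pattern: str) -> Iterator[str]:
    """Skip files until one matches the glob, then yield it and all later files."""
    resumed = False
    for path in files:
        # fnmatchcase compares case-sensitively on every platform;
        # "*" also matches path separators, so "*/012345.wav" matches any parent path.
        if not resumed and fnmatchcase(path, pattern):
            resumed = True
        if resumed:
            yield path


files = [
    "data/012343.wav",
    "data/012344.wav",
    "data/012345.wav",
    "data/012346.wav",
]
remaining = list(resume_from(files, "*/012345.wav"))
```

If no file matches the pattern, the sketch yields nothing, so a mistyped glob shows up as an empty run rather than a full reprocessing.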

0.0.2 (2025-03-14)

added setuptools as dependency
switched to underscores in project name
added discard-by-name filter
requiring seppl>=0.2.13 now
added support for aliases
added placeholder support to tools: adc-convert, adc-exec
added placeholder support to readers: from-adams-ac, from-subdir-ac, from-txt-ac, from-adams-sp, from-commonvoice-sp, from-festvox-sp, from-hf-audiofolder-sp, from-txt-sp, from-data, poll-dir, from-pyfunc
added placeholder support to writers: to-adams-ac, to-subdir-ac, to-txt-ac, to-adams-sp, to-commonvoice-sp, to-festvox-sp, to-hf-audiofolder-sp, to-txt-sp, to-audioinfo, to-data
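
Placeholder support, as listed above for the tools, readers and writers, amounts to substituting named tokens in option values at runtime. The sketch below assumes a `{NAME}` syntax and the names `OUT_DIR` and `SPLIT` purely for illustration; the syntax and the set of placeholders actually supported are defined by the library and are not stated here.

```python
import re
from typing import Dict

# Assumed token shape for this sketch: upper-case names wrapped in curly brackets.
_PLACEHOLDER = re.compile(r"\{([A-Z_][A-Z0-9_]*)\}")


def expand_placeholders(text: str, values: Dict[str, str]) -> str:
    """Replace {NAME} tokens with their values; unknown names are left untouched."""
    return _PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)


expanded = expand_placeholders("{OUT_DIR}/clips/{SPLIT}.tsv", {"OUT_DIR": "/tmp/out"})
```

Leaving unknown tokens in place keeps a partially expanded path visible for debugging instead of silently dropping part of it.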

0.0.1 (2024-07-05)

initial release


Download files


Source Distribution

audio_dataset_converter-0.1.0.tar.gz (41.6 kB)

Uploaded Oct 31, 2025 Source

File details

Details for the file audio_dataset_converter-0.1.0.tar.gz.

File metadata

Download URL: audio_dataset_converter-0.1.0.tar.gz
Upload date: Oct 31, 2025
Size: 41.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for audio_dataset_converter-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`f12c569164427ec5caaa14e2e2982473895ef76e8fca448eb9f1c1edc3d0eee4`
MD5	`d8c942fa4106965743ca63ec955fedaf`
BLAKE2b-256	`1801dbf7ba6f965d2fa0e2b70b39bf1ae747a644ef8477936d4b273659712d57`
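
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_of` and `matches_expected` are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest of audio_dataset_converter-0.1.0.tar.gz, copied from the table above.
EXPECTED_SHA256 = "f12c569164427ec5caaa14e2e2982473895ef76e8fca448eb9f1c1edc3d0eee4"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected one, ignoring letter case."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("audio_dataset_converter-0.1.0.tar.gz")` should return `True` only for an unmodified copy of the 0.1.0 source distribution.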
