Python3 library for converting between various audio dataset formats.

Project description

The audio-dataset-converter library converts between various audio dataset formats. Filters can be supplied as well, e.g., for cleaning up the data.

Dataset formats:

  • classification: ADAMS (r/w), sub-dir (r/w), TXT (r/w)

  • speech: ADAMS (r/w), CommonVoice (r/w), Festvox (r/w), Huggingface Audiofolder (r/w), TXT (r/w)

Changelog

0.0.3 (2025-07-10)

  • added set-placeholder filter for dynamically setting (temporary) placeholders at runtime

  • added --resume_from option to relevant readers that allows resuming the data processing from the first file that matches this glob expression (e.g., */012345.wav)

  • requiring seppl>=0.2.17 now for resume, split group, skippable plugin support and avoiding deprecated use of pkg_resources

  • to-adams-sp writer now uses -t short flag for the transcript like the from-adams-sp reader

  • added the from-multi meta-reader that combines multiple base readers and returns their output

  • added the to-multi meta-writer that forwards the data to multiple base writers

  • using wai_common instead of wai.common now

  • added split_group parameter to splittable writers (stream/batch)

  • fixed the construction of the error messages in the pyfunc reader/filter/writer classes

  • added metadata-to-placeholder filter to transfer meta-data files into placeholders
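
The resume behavior described for --resume_from above (skip files until the first one matches a glob expression, then process everything from there on) can be illustrated with a minimal Python sketch. This is not the library's implementation: the function name resume_from and the sample file list are hypothetical, and the matching here uses the standard library's fnmatch module, which may differ in detail from the glob handling used by the readers.

```python
from fnmatch import fnmatchcase
from itertools import dropwhile


def resume_from(paths, pattern):
    """Drop paths until the first one matching the glob pattern, keep the rest.

    Hypothetical helper for illustration only; returns an empty list
    when no path matches the pattern.
    """
    return list(dropwhile(lambda p: not fnmatchcase(p, pattern), paths))


# Hypothetical list of input files, in processing order.
files = [
    "data/012343.wav",
    "data/012344.wav",
    "data/012345.wav",
    "data/012346.wav",
]

# Processing resumes at the first file matching the glob expression.
remaining = resume_from(files, "*/012345.wav")
print(remaining)
```

In this sketch a pattern that matches no file yields an empty list; how the actual readers handle that case is not stated in the changelog.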

0.0.2 (2025-03-14)

  • added setuptools as dependency

  • switched to underscores in project name

  • added discard-by-name filter

  • requiring seppl>=0.2.13 now

  • added support for aliases

  • added placeholder support to tools: adc-convert, adc-exec

  • added placeholder support to readers: from-adams-ac, from-subdir-ac, from-txt-ac, from-adams-sp, from-commonvoice-sp, from-festvox-sp, from-hf-audiofolder-sp, from-txt-sp, from-data, poll-dir, from-pyfunc

  • added placeholder support to writers: to-adams-ac, to-subdir-ac, to-txt-ac, to-adams-sp, to-commonvoice-sp, to-festvox-sp, to-hf-audiofolder-sp, to-txt-sp, to-audioinfo, to-data

0.0.1 (2024-07-05)

  • initial release

Download files

Source Distribution

audio_dataset_converter-0.0.3.tar.gz (58.1 kB view details)

Uploaded Source

File details

Details for the file audio_dataset_converter-0.0.3.tar.gz.

File metadata

  • Download URL: audio_dataset_converter-0.0.3.tar.gz
  • Size: 58.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for audio_dataset_converter-0.0.3.tar.gz
Algorithm Hash digest
SHA256 16d42bee8ba5004d3122f0f2247bee053bbcc08a64be9909108301a2478d6de5
MD5 acb8e8c6d336c1b63b472cad4823ecf2
BLAKE2b-256 5f41ac2ccb054fb13d63b1f3d55e32ecdafb8afe9d9f4ea772bdd827edf5d296
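
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only Python's standard hashlib module; the helper names sha256_of_file and verify are chosen here for illustration, and the archive is assumed to sit in the current directory under its published file name.

```python
import hashlib

# SHA256 digest as listed above for audio_dataset_converter-0.0.3.tar.gz.
EXPECTED_SHA256 = "16d42bee8ba5004d3122f0f2247bee053bbcc08a64be9909108301a2478d6de5"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected.lower()
```

For example, verify("audio_dataset_converter-0.0.3.tar.gz") returns True only if the local file's digest equals the published one.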
