
A toolbox for audio dataset processing and augmentation.

Project description

Datasets Toolbox

A toolbox for creating, processing and inspecting audio/image datasets through a simple command-line interface (CLI).

Installation

pip install datasets-toolbox

Usage

The goal of datasets-toolbox is to build audio/image datasets from the command line.

All commands accept the --config [config-name] and --split [split-name] options to specify the target, where config-name is the configuration name (e.g. a language) and split-name is a split such as train, validation or test.

Add More Data

datasets import --config [data] --split [train] <sources>

Import data into datasets structure.

If --config or --split is not given, the import defaults to the default configuration and the train split.
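A minimal sketch of what an import step could look like, assuming (hypothetically) that a dataset is stored on disk as <root>/<config>/<split>/ and that the default configuration lives in a folder named "default". The function name import_sources and this layout are illustrative assumptions, not the package's documented internals.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def import_sources(root: str, sources: Iterable[str],
                   config: str = "default", split: str = "train") -> List[Path]:
    """Copy source files into <root>/<config>/<split>/ (hypothetical layout)."""
    target = Path(root) / config / split
    target.mkdir(parents=True, exist_ok=True)  # create the config/split folder if missing
    copied = []
    for src in map(Path, sources):
        dest = target / src.name
        shutil.copy2(src, dest)  # copy file contents and metadata
        copied.append(dest)
    return copied
```

Calling import_sources("my-dataset", ["clip.wav"]) with no config or split would place the file in my-dataset/default/train/, mirroring the defaults described above.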

Modify Dataset

datasets modify <action> --config [data] --split [train] --other-params

If --config is omitted, the action runs recursively on all configurations; if --split is omitted, it runs on all splits.
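The recursive behaviour can be pictured as expanding the options into a list of (config, split) targets. This sketch assumes the same hypothetical <root>/<config>/<split>/ folder layout; resolve_targets is an illustrative name, not part of the package.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def resolve_targets(root: str, config: Optional[str] = None,
                    split: Optional[str] = None) -> List[Tuple[str, str]]:
    """List (config, split) pairs to process under <root>/<config>/<split>/.

    An omitted config or split expands to every matching folder on disk.
    """
    root_path = Path(root)
    configs = [config] if config else sorted(
        p.name for p in root_path.iterdir() if p.is_dir())
    targets = []
    for c in configs:
        config_dir = root_path / c
        if not config_dir.is_dir():
            continue  # skip configurations that do not exist
        splits = [split] if split else sorted(
            p.name for p in config_dir.iterdir() if p.is_dir())
        # keep only splits that actually exist for this configuration
        targets.extend((c, s) for s in splits if (config_dir / s).is_dir())
    return targets
```

A modify or inspect action would then loop over the returned pairs, so passing only --split train would visit the train split of every configuration.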

Audio Slicer

datasets modify slice --config [data] --split [train] --min-length [ms] --hop-size [n]
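The slice options suggest a frame-based slicer, but the exact algorithm is not documented here. As a simplified stand-in, this sketch splits audio at silent frames using an RMS energy threshold, treating hop_size as the number of samples per analysis frame and dropping segments shorter than min_length milliseconds. The threshold parameter and the function name slice_on_silence are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def slice_on_silence(samples: Sequence[float], sr: int, min_length_ms: float,
                     hop_size: int, threshold: float = 0.01) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges of non-silent regions.

    Frames of hop_size samples whose RMS falls below threshold count as
    silence; regions shorter than min_length_ms are dropped.
    """
    min_len = int(sr * min_length_ms / 1000)  # convert ms to samples
    segments: List[Tuple[int, int]] = []
    start = None
    for i in range(0, len(samples), hop_size):
        frame = samples[i:i + hop_size]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold and start is None:
            start = i  # a loud frame opens a new segment
        elif rms < threshold and start is not None:
            segments.append((start, i))  # a silent frame closes the segment
            start = None
    if start is not None:
        segments.append((start, len(samples)))  # audio ended mid-segment
    return [(s, e) for s, e in segments if e - s >= min_len]
```

Each returned range could then be written out as its own clip; a production slicer would usually also merge short gaps and keep a little silence around each cut.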

Audio Resample

datasets modify resample --config [data] --split [train] --sr [16000] --mono
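To show what --sr and --mono mean for the samples themselves, here is a naive sketch: downmixing by averaging channels, and resampling by linear interpolation. This is a swapped-in illustration, not the package's method; linear interpolation applies no anti-aliasing filter, so real tools typically use band-limited resampling instead. The helper names are hypothetical.

```python
from typing import List, Sequence


def to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equally long channels sample by sample."""
    n = len(channels)
    return [sum(frame) / n for frame in zip(*channels)]


def resample_linear(samples: Sequence[float], orig_sr: int, target_sr: int) -> List[float]:
    """Resample by linear interpolation (no anti-aliasing filter)."""
    if orig_sr == target_sr or not samples:
        return list(samples)
    n_out = int(round(len(samples) * target_sr / orig_sr))
    out = []
    for j in range(n_out):
        pos = j * orig_sr / target_sr  # position of output sample j in the input
        i = int(pos)
        frac = pos - i
        if i + 1 < len(samples):
            out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        else:
            out.append(samples[-1])  # hold the last sample at the end
    return out
```

For example, one second of 48 kHz stereo audio would become 16000 mono samples via resample_linear(to_mono([left, right]), 48000, 16000).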

Audio Transcription

datasets modify transcribe --model [openai/whisper-large-v3-turbo]
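The model name looks like a Hugging Face Hub identifier. Where and how the toolbox stores transcripts is not documented here; this sketch assumes, for illustration only, a metadata.csv with file_name and transcription columns in the style of the Hugging Face AudioFolder convention. The speech model is abstracted as a callable so the sketch runs without downloading anything; write_transcripts is a hypothetical name.

```python
import csv
from pathlib import Path
from typing import Callable


def write_transcripts(audio_dir: str, transcribe: Callable[[str], str],
                      pattern: str = "*.wav") -> Path:
    """Transcribe every matching file in audio_dir and write metadata.csv.

    The file_name/transcription layout is an assumed convention, not
    necessarily the one datasets-toolbox uses.
    """
    folder = Path(audio_dir)
    out_path = folder / "metadata.csv"
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "transcription"])
        for audio in sorted(folder.glob(pattern)):
            writer.writerow([audio.name, transcribe(str(audio))])
    return out_path
```

With the transformers library installed, one option is asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3-turbo") and then passing lambda path: asr(path)["text"] as the transcribe argument.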

Inspect Dataset

datasets inspect <action> --config [data] --split [train] --other-params

If --config is omitted, the inspection runs recursively on all configurations; if --split is omitted, it runs on all splits.

Audio Hours

datasets inspect hours --config [data] --split [train]
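Counting audio hours amounts to summing clip durations (frames divided by sample rate) and dividing by 3600. This sketch uses Python's standard wave module, so it only handles WAV files; other formats would need a library such as soundfile. The function name audio_hours and the recursive *.wav scan are assumptions for illustration.

```python
import wave
from pathlib import Path


def audio_hours(split_dir: str) -> float:
    """Sum the duration of all .wav files under split_dir, in hours."""
    total_seconds = 0.0
    for path in Path(split_dir).rglob("*.wav"):
        with wave.open(str(path), "rb") as wav:
            # duration in seconds = number of frames / frames per second
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 3600
```

For instance, a split holding 7200 one-second clips would report 2.0 hours.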

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datasets_toolbox-0.1.0.tar.gz (9.9 kB)

Uploaded Source

Built Distribution

datasets_toolbox-0.1.0-py3-none-any.whl (13.3 kB)

Uploaded Python 3

File details

Details for the file datasets_toolbox-0.1.0.tar.gz.

File metadata

  • Download URL: datasets_toolbox-0.1.0.tar.gz
  • Upload date:
  • Size: 9.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.15

File hashes

Hashes for datasets_toolbox-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c059d470f2472631d329658b50d01012290e5ba9f98e6af0100f38c9b0593ab2
MD5 12fc22f999f334cabfe7d4096357e912
BLAKE2b-256 9705a302628710e8e302f89230aec50975ec6276facee1ba205af2b3e8c5f833

See more details on using hashes here.

File details

Details for the file datasets_toolbox-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for datasets_toolbox-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0f29df8f68962204096038a0183d3bb1005869ca0db0d5ecc7091be445531bff
MD5 2810ddfca41ab63aadb45e0e4913c617
BLAKE2b-256 779561288c2bda302d603260e410704042518437acf0a2dc5f0206b4ae0cace3

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page