SpectrogramDataset
- Converts an audio folder into a torch dataset (each item is returned as a dict and requires further preprocessing).
- Generates spectrograms for audio files and supports preloading them into memory for fast access.
- Generates enumerated labels (classes) from additional data provided in the metadata.json file.
- Generates masks (speech, gender, different noises, etc.) if intervals are provided in metadata.json.
- Supports multiple datasets via the SpectrogramDatasetRouter class.
Usage:
import torch
from spgdataset import SpectrogramDataset
dst = SpectrogramDataset(
audio_root: str,
# path to root audio folder
spectrograms_root: str,
# path to folder to store generated spectrograms
index_root: str,
# path to folder to store indexes
metadata_json_path: str | None = None,
# path to metadata.json (see below for structure)
sample_rate: int = 16000,
# sample rate in Hz (currently only 16 kHz is supported)
window_size_sec: float = 1.92,
# sliding window size, in seconds
window_offset_sec: float = 0.1,
# sliding window offset, in seconds
window_content_ratio: float = 0.5,
# ratio of content to slice length, used if intervals are provided
hop_length: int = 160,
# hop length, in samples, used to generate spectrograms
n_mels: int = 80,
# number of mel filters (64 or 80)
n_fft: int = 400,
# FFT size
normalize: bool = True,
# normalize spectrograms
dtype: torch.dtype = torch.float32,
# torch.dtype to use for spectrograms
split: tuple = (1,),
# divide samples into train, validation, test, etc. sets via a tuple
output_configuration: dict | None =
# expected output configuration:
{
"audio": False,
# bool, return audio
"spectrogram": True,
# bool, return spectrogram
"masks": ["speech"],
# list, names of masks to generate
# based on provided intervals
# in metadata.json
"meta": [],
# list, names of optional values
# to return, based on provided
# values in 'metadata' dict
# in metadata.json
"label": None,
# enumerate and return class labels
# based on metadata
},
max_memory_cache: int = 0,
# max memory space to retain spectrograms
num_workers: int = 0,
# number of dataset workers
)
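To make the relationship between the default window and spectrogram parameters concrete, here is a minimal arithmetic sketch. The helper names `window_starts` and `frames_per_window` are hypothetical and are not part of spgdataset. The sketch assumes that only full windows are kept, and the frame count ignores any padding the spectrogram transform may add.

```python
# Sketch only: hypothetical helpers, not part of the spgdataset API.

def window_starts(length, sample_rate=16000, window_size_sec=1.92,
                  window_offset_sec=0.1):
    """Return the start position (in samples) of every full sliding window."""
    window = round(window_size_sec * sample_rate)    # 30720 samples by default
    offset = round(window_offset_sec * sample_rate)  # 1600 samples by default
    if length < window:
        return []  # assumption: files shorter than one window yield nothing
    return list(range(0, length - window + 1, offset))


def frames_per_window(sample_rate=16000, window_size_sec=1.92, hop_length=160):
    """Approximate number of spectrogram frames per window (no padding)."""
    return round(window_size_sec * sample_rate) // hop_length


starts = window_starts(length=48000)  # a 3-second file at 16 kHz
print(len(starts), starts[-1])        # 11 16000
print(frames_per_window())            # 192
```

With the defaults, each 1.92-second window covers 30720 samples, which corresponds to roughly 192 frames at a hop length of 160.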
metadata.json expected structure:
[
path (str): {
'length': int,
'intervals': {
name (str): [[],...]
},
'metadata': {
key (str): value
}
},
]
path: file path (relative to the audio root)
length: audio file length, in samples
intervals: dict of named intervals, {name: [[start, stop], ...]}
metadata: dict of optional per-file data, such as speaker name, gender, or age; can be used as class labels
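The structure above can be illustrated with a small, made-up example. It also sketches how a frame-level mask and enumerated labels might be derived from it. Everything here is an assumption for illustration: the file paths, interval values, and the `gender` key are invented; the top level is assumed to map each path to its entry; interval bounds are assumed to be sample indices; and `interval_mask` and `enumerate_labels` are hypothetical helpers, not the library's implementation.

```python
import json

# Hypothetical metadata for two files (all values invented for illustration).
metadata = {
    "speaker1/utt1.wav": {
        "length": 48000,
        "intervals": {"speech": [[8000, 24000], [32000, 40000]]},
        "metadata": {"gender": "f"},
    },
    "speaker2/utt1.wav": {
        "length": 32000,
        "intervals": {"speech": [[0, 16000]]},
        "metadata": {"gender": "m"},
    },
}


def interval_mask(intervals, length, hop_length=160):
    """Build a 0/1 mask with one value per spectrogram frame.

    Assumes each [start, stop] pair is given in samples.
    """
    n_frames = length // hop_length
    mask = [0] * n_frames
    for start, stop in intervals:
        first = start // hop_length
        last = min(n_frames, -(-stop // hop_length))  # ceiling division
        for i in range(first, last):
            mask[i] = 1
    return mask


def enumerate_labels(metadata, key):
    """Map each distinct metadata value to an integer class id (sorted order)."""
    values = sorted({entry["metadata"][key] for entry in metadata.values()})
    return {value: idx for idx, value in enumerate(values)}


entry = metadata["speaker1/utt1.wav"]
mask = interval_mask(entry["intervals"]["speech"], entry["length"])
labels = enumerate_labels(metadata, "gender")
text = json.dumps(metadata, indent=2)  # content that could go into metadata.json

print(len(mask), sum(mask))  # 300 150
print(labels)                # {'f': 0, 'm': 1}
```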
SpectrogramDatasetRouter
An additional class that routes multiple SpectrogramDataset instances as a single dataset.
dstrt = SpectrogramDatasetRouter(
datasets: list = []
# list of datasets to route to
)
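One common way to present several datasets as one is to map a global index to a (dataset, local index) pair using cumulative lengths. The sketch below shows that idea with a hypothetical `RouterSketch` class operating on plain lists. It is not the SpectrogramDatasetRouter implementation, whose internal behaviour is not documented here.

```python
from bisect import bisect_right
from itertools import accumulate


class RouterSketch:
    """Hypothetical stand-in that concatenates indexable datasets."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # Cumulative lengths, e.g. sizes [2, 3] give offsets [2, 5].
        self.offsets = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the first dataset whose cumulative length exceeds the index.
        which = bisect_right(self.offsets, index)
        start = self.offsets[which - 1] if which else 0
        return self.datasets[which][index - start]


router = RouterSketch([["a0", "a1"], ["b0", "b1", "b2"]])
print(len(router))           # 5
print(router[1], router[2])  # a1 b0
```

PyTorch's `torch.utils.data.ConcatDataset` follows a similar index-mapping approach for map-style datasets.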