SpectrogramDataset

  • Converts an audio folder into a torch dataset (each item is returned as a dict and requires further preprocessing)
  • Generates spectrograms for audio files and supports preloading them into memory for fast access
  • Generates enumerated labels (classes) based on additional data provided in the metadata.json file
  • Generates masks (speech, gender, different noises, etc.) if intervals are provided in metadata.json
  • Supports multiple datasets via the SpectrogramDatasetRouter class

Usage:

import torch
from spgdataset import SpectrogramDataset

# The three paths are placeholders; every other value is the documented default.
dst = SpectrogramDataset(
    audio_root="path/to/audio",                # str: path to the root audio folder
    spectrograms_root="path/to/spectrograms",  # str: folder to store generated spectrograms
    index_root="path/to/index",                # str: folder to store indexes
    metadata_json_path=None,    # str | None: path to metadata.json (see below for structure)
    sample_rate=16000,          # int: sample rate (currently only 16 kHz)
    window_size_sec=1.92,       # float: sliding window size, in seconds
    window_offset_sec=0.1,      # float: sliding window offset, in seconds
    window_content_ratio=0.5,   # float: content-to-slice ratio, if intervals are provided
    hop_length=160,             # int: hop length used to generate spectrograms
    n_mels=80,                  # int: number of mel filters (64 or 80)
    n_fft=400,                  # int: FFT size (n_fft)
    normalize=True,             # bool: normalize spectrograms
    dtype=torch.float32,        # torch.dtype: dtype to use for spectrograms
    split=(1,),                 # tuple: divides samples into train, validation, test, etc. sets
    output_configuration={      # dict | None: expected output configuration
        "audio": False,         # bool: return audio
        "spectrogram": True,    # bool: return spectrogram
        "masks": ["speech"],    # list: names of masks to generate, based on
                                #   the intervals provided in metadata.json
        "meta": [],             # list: names of optional values to return, taken
                                #   from the 'metadata' dict in metadata.json
        "label": None,          # enumerate and return class labels based on metadata
    },
    max_memory_cache=0,         # int: max memory space to retain spectrograms
    num_workers=0,              # int: number of dataset workers
)
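
The window parameters above translate into sample and frame counts by simple arithmetic. The sketch below works through the default values; the helper window_starts is hypothetical and only illustrates how a sliding window could step through a file. How the package itself slices audio, and whether its spectrograms carry an extra padded frame, is not stated here.

```python
def window_starts(length, sample_rate=16000, window_size_sec=1.92, window_offset_sec=0.1):
    """Start positions (in samples) of every full window that fits in `length` samples.

    Hypothetical helper for illustration; not part of spgdataset.
    """
    window = round(window_size_sec * sample_rate)    # window size in samples
    offset = round(window_offset_sec * sample_rate)  # step between windows in samples
    if length < window:
        return []
    return list(range(0, length - window + 1, offset))


# Default values from the constructor above.
window_samples = round(1.92 * 16000)        # 30720 samples per window
frames_per_window = window_samples // 160   # 192 hops of 160 samples per window

# A 3-second file at 16 kHz (48000 samples) yields 11 full windows.
starts = window_starts(length=48000)
print(window_samples, frames_per_window, len(starts))
```

Some STFT implementations pad the signal and return one more frame than the hop count, so the exact spectrogram width may differ by one.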

metadata.json expected structure:

[
    path (str): {
            'length': int,
            'intervals': {
                    name (str): [[],...]
                },
            'metadata': {
                    key (str): value
                } 
        },
]


path:        file path, relative to the audio root
length:      audio file length, in samples
intervals:   dict of named intervals, {name: [[start, stop], ...]}
metadata:    dict of optional per-file data (speaker name, gender, age, etc.); its values can be used as class labels
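
As an illustration of the structure above, the following sketch writes a small metadata.json and derives a label mapping and a mask from it. Several things are assumptions rather than documented behaviour: the top level is written as a JSON object keyed by path (the schema above uses square brackets, but key: value pairs cannot be list elements directly), interval boundaries are treated as sample indices, and the file names, values, and the helpers enumerate_labels and intervals_to_mask are all hypothetical.

```python
import json
import os
import tempfile

# Hypothetical entries; paths are relative to the audio root.
# Assumption: interval boundaries are sample indices.
metadata = {
    "speaker_a/clip_001.wav": {
        "length": 48000,
        "intervals": {"speech": [[1600, 20000], [30000, 44000]]},
        "metadata": {"speaker": "a", "gender": "f"},
    },
    "speaker_b/clip_002.wav": {
        "length": 32000,
        "intervals": {"speech": [[0, 16000]]},
        "metadata": {"speaker": "b", "gender": "m"},
    },
}


def enumerate_labels(entries, key):
    """Map each distinct value of metadata[key] to an integer class index."""
    values = sorted({entry["metadata"][key] for entry in entries.values()})
    return {value: index for index, value in enumerate(values)}


def intervals_to_mask(intervals, length):
    """Per-sample 0/1 mask that is 1 inside any [start, stop) interval."""
    mask = [0] * length
    for start, stop in intervals:
        start, stop = max(0, start), min(stop, length)
        if stop > start:
            mask[start:stop] = [1] * (stop - start)
    return mask


# Write the file, then read it back the way a loader would.
path = os.path.join(tempfile.mkdtemp(), "metadata.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(metadata, f, indent=2)
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

labels = enumerate_labels(loaded, "speaker")
clip = loaded["speaker_a/clip_001.wav"]
mask = intervals_to_mask(clip["intervals"]["speech"], clip["length"])
print(labels, sum(mask), len(mask))
```

The mask here is per sample; a per-frame mask would be obtained by dividing the boundaries by the hop length first.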

SpectrogramDatasetRouter

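An additional class that routes multiple SpectrogramDataset instances as a single dataset.

# assumes the router is exported from the same package as SpectrogramDataset
from spgdataset import SpectrogramDatasetRouter
dstrt = SpectrogramDatasetRouter(
    datasets=[dst],  # list: datasets to route to
)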
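
To make the routing idea concrete, here is a minimal sketch of one common way to present several indexable datasets as one: keep cumulative lengths and use a binary search to find which dataset owns a global index. SimpleRouter is a hypothetical stand-in written for this example and is not the SpectrogramDatasetRouter implementation, whose internals are not described here.

```python
from bisect import bisect_right
from itertools import accumulate


class SimpleRouter:
    """Hypothetical stand-in that exposes several datasets as one indexable sequence."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # Cumulative end index of each dataset, e.g. lengths [3, 2] -> [3, 5].
        self.ends = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.ends[-1] if self.ends else 0

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        # First dataset whose cumulative end lies beyond the global index.
        which = bisect_right(self.ends, index)
        start = self.ends[which - 1] if which else 0
        return self.datasets[which][index - start]


# Plain lists stand in for datasets; anything with __len__ and __getitem__ works.
router = SimpleRouter([["a0", "a1", "a2"], ["b0", "b1"]])
print(len(router), router[2], router[3], router[-1])
```

torch.utils.data.ConcatDataset follows the same cumulative-length approach for generic torch datasets.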
