
SpectrogramDataset

pip install spgdataset
  • Converts an audio folder into a torch dataset. Each item is returned as a dict and requires further preprocessing, for example a custom collate function (see below).
  • Generates spectrograms for the audio files and supports preloading them into memory for fast access.
  • Generates enumerated labels (classes) based on additional data provided in a metadata.json file.
  • Generates masks (speech, gender, different noises, etc.) if intervals are provided in metadata.json.
  • Supports multiple datasets via the SpectrogramDatasetRouter class.

Usage:

from spgdataset import SpectrogramDataset

dst = SpectrogramDataset(
        audio_root: str,                        # path to the root audio folder
        spectrograms_root: str,                 # folder to store generated spectrograms
        index_root: str,                        # folder to store index files
        metadata_json_path: str | None = None,  # path to metadata.json (see below for structure)
        sample_rate: int = 16000,               # sample rate (currently only 16 kHz is supported)
        window_size_sec: float = 1.92,          # sliding window size, in seconds
        window_offset_sec: float = 0.1,         # sliding window offset, in seconds
        window_content_ratio: float = 0.5,      # content-to-slice ratio, if intervals are provided
        hop_length: int = 160,                  # hop length used to generate spectrograms
        n_mels: int = 80,                       # number of mel filters (64 or 80)
        n_fft: int = 400,                       # FFT size
        normalize: bool = True,                 # normalize spectrograms
        dtype: torch.dtype = torch.float32,     # torch.dtype to use for spectrograms
        split: tuple = (1,),                    # divide samples into train/validate/test/etc. sets
        output_configuration: dict | None = {   # expected output configuration:
                "audio": False,       # bool, return audio
                "spectrogram": True,  # bool, return spectrogram
                "masks": ["speech"],  # list of mask names to generate from the intervals in metadata.json
                "meta": [],           # list of optional values to return from the 'metadata' dict in metadata.json
                "label": None,        # enumerate and return class labels based on metadata
            },
        max_memory_cache: int = 0,  # max memory (in MB) to retain spectrograms
        num_workers: int = 0,       # number of dataset workers
        )

Example

import spgdataset

a = spgdataset.SpectrogramDataset(
    audio_root="/path_to_dataset/audio",
    spectrograms_root="/path_to_dataset/cache",
    index_root="/path_to_dataset/index",
    metadata_json_path="/path_to_dataset/metadata/metadata_new.json",
    output_configuration={
        "audio": False,
        "spectrogram": True,
        "masks": ["speech"],
        "meta": [],
        "label": None,
    },
    max_memory_cache=16000,
)

This creates a SpectrogramDataset instance, converts the audio to spectrograms and stores them in /path_to_dataset/cache, builds index files in /path_to_dataset/index, loads the 'speech' intervals from metadata.json, and preloads spectrograms into memory (up to 16000 MB). On getitem it returns the dict {'spectrogram': torch.Tensor, 'masks': {'speech': torch.Tensor}}. No labels are returned.

metadata.json expected structure:

{
    path (str): {
        "length": int,
        "intervals": {
            name (str): [[start, stop], ...]
        },
        "metadata": {
            key (str): value
        }
    },
    ...
}


path:        file path (relative to audio_root)
length:      audio file length, in samples
intervals:   dict of named intervals {name: [[start, stop], ...]}
metadata:    dict of optional per-file data (e.g. speaker name, gender, age); can be used as class labels
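To make the structure concrete, here is a sketch that builds one metadata.json entry with Python's standard json module. The file name, interval values, and metadata keys are purely illustrative; the exact interval semantics (sample offsets, inclusivity) are an assumption and should be checked against your data.

```python
# Hypothetical example: build a metadata.json matching the structure above.
# All concrete values (paths, sample ranges, keys) are illustrative.
import json

metadata = {
    "speaker1/utt001.wav": {     # path, relative to audio_root
        "length": 48000,         # 3 s of audio at 16 kHz, in samples
        "intervals": {
            # assumed: sample ranges [start, stop] where speech is present
            "speech": [[800, 20000], [24000, 46000]],
        },
        "metadata": {
            "speaker": "speaker1",  # could be used as a class label
            "gender": "f",
        },
    },
}

with open("metadata.json", "w") as f:
    json.dump(metadata, f, indent=4)
```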

SpectrogramDatasetRouter

An additional class that presents multiple SpectrogramDataset instances as a single dataset.

dstrt = SpectrogramDatasetRouter(
    datasets: list = []
        # list of datasets to route to
)

Example of custom collate function

Here's an example of a custom collate function that batches the spectrogram and masks['speech'] entries from the dataset output:

    import torch

    def custom_collate_fn(data):
        # stack spectrograms and add a channel dimension: (batch, 1, n_mels, frames)
        inputs = torch.stack([d["spectrogram"] for d in data]).unsqueeze(1)
        # stack the 'speech' masks: (batch, frames)
        masks = torch.stack([d["masks"]["speech"] for d in data])
        return inputs, masks
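To show how this collate function wires into a torch DataLoader, here is a self-contained sketch that substitutes a fake dataset returning the same dict layout as SpectrogramDataset. The shapes are illustrative assumptions (n_mels=80 and 192 frames, roughly a 1.92 s window at hop_length=160); the real dataset determines the actual sizes.

```python
# Hypothetical stand-in for SpectrogramDataset, used only to demonstrate
# how custom_collate_fn plugs into a DataLoader. Shapes are illustrative.
import torch
from torch.utils.data import DataLoader, Dataset

class FakeSpectrogramDataset(Dataset):
    """Returns the same dict layout as SpectrogramDataset's getitem."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return {
            "spectrogram": torch.zeros(80, 192),     # (n_mels, frames)
            "masks": {"speech": torch.zeros(192)},   # per-frame mask
        }

def custom_collate_fn(data):
    inputs = torch.stack([d["spectrogram"] for d in data]).unsqueeze(1)
    masks = torch.stack([d["masks"]["speech"] for d in data])
    return inputs, masks

loader = DataLoader(FakeSpectrogramDataset(), batch_size=4,
                    collate_fn=custom_collate_fn)
inputs, masks = next(iter(loader))
print(inputs.shape)  # torch.Size([4, 1, 80, 192])
print(masks.shape)   # torch.Size([4, 192])
```

The same wiring applies to a real SpectrogramDataset instance: pass it to DataLoader in place of the fake dataset.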
