
SpectrogramDataset

pip install spgdataset
  • Converts an audio folder into a torch dataset. Each item is returned as a dict and requires further preprocessing, for example a custom collate function (see below).
  • Generates spectrograms for the audio files and supports preloading them into memory for fast access.
  • Generates enumerated labels (classes) based on additional data provided in a metadata.json file.
  • Generates masks (speech, gender, different noises, etc.) if intervals are provided in metadata.json.
  • Supports multiple datasets via the SpectrogramDatasetRouter class.

Usage:

from spgdataset import SpectrogramDataset

dst = SpectrogramDataset(
        audio_root: str,                        # path to the root audio folder
        spectrograms_root: str,                 # folder to store generated spectrograms
        index_root: str,                        # folder to store index files
        metadata_json_path: str | None = None,  # path to metadata.json (see below for structure)
        sample_rate: int = 16000,               # sample rate (currently only 16 kHz is supported)
        window_size_sec: float = 1.92,          # sliding window size, in seconds
        window_offset_sec: float = 0.1,         # sliding window offset, in seconds
        window_content_ratio: float = 0.5,      # content-to-slice ratio, if intervals are provided
        hop_length: int = 160,                  # hop length used to generate spectrograms
        n_mels: int = 80,                       # number of mel filters (64 or 80)
        n_fft: int = 400,                       # FFT size
        normalize: bool = True,                 # normalize spectrograms
        dtype: torch.dtype = torch.float32,     # torch.dtype to use for spectrograms
        split: tuple = (1,),                    # divide samples into train/validate/test/etc. sets
        output_configuration: dict | None = {   # expected output configuration:
                "audio": False,       # bool, return audio
                "spectrogram": True,  # bool, return spectrogram
                "masks": ["speech"],  # list of mask names to generate from the intervals in metadata.json
                "meta": [],           # list of optional values to return from the 'metadata' dict in metadata.json
                "label": None,        # enumerate and return class labels based on metadata
            },
        max_memory_cache: int = 0,  # max memory (in MB) to retain spectrograms
        num_workers: int = 0,       # number of dataset workers
        )

Example

import spgdataset

a = spgdataset.SpectrogramDataset(
    audio_root="/path_to_dataset/audio",
    spectrograms_root="/path_to_dataset/cache",
    index_root="/path_to_dataset/index",
    metadata_json_path="/path_to_dataset/metadata/metadata_new.json",
    output_configuration={
        "audio": False,
        "spectrogram": True,
        "masks": ["speech"],
        "meta": [],
        "label": None,
    },
    max_memory_cache=16000,
)

This creates a SpectrogramDataset instance, converts the audio to spectrograms and stores them in /path_to_dataset/cache, builds index files in /path_to_dataset/index, loads the 'speech' intervals from metadata.json, and preloads spectrograms into memory (up to 16000 MB). On getitem it returns the dict {'spectrogram': torch.Tensor, 'masks': {'speech': torch.Tensor}}. No labels are returned.

metadata.json expected structure:

{
    path (str): {
        "length": int,
        "intervals": {
            name (str): [[start, stop], ...]
        },
        "metadata": {
            key (str): value
        }
    },
    ...
}


path:        file path (relative to audio_root)
length:      audio file length, in samples
intervals:   dict of named intervals {name: [[start, stop], ...]}
metadata:    dict of optional per-file data (e.g. speaker name, gender, age); can be used as class labels
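To make the structure concrete, here is a sketch that builds one metadata.json entry with Python's standard json module. The file name, interval values, and metadata keys are purely illustrative; the exact interval semantics (sample offsets, inclusivity) are an assumption and should be checked against your data.

```python
# Hypothetical example: build a metadata.json matching the structure above.
# All concrete values (paths, sample ranges, keys) are illustrative.
import json

metadata = {
    "speaker1/utt001.wav": {     # path, relative to audio_root
        "length": 48000,         # 3 s of audio at 16 kHz, in samples
        "intervals": {
            # assumed: sample ranges [start, stop] where speech is present
            "speech": [[800, 20000], [24000, 46000]],
        },
        "metadata": {
            "speaker": "speaker1",  # could be used as a class label
            "gender": "f",
        },
    },
}

with open("metadata.json", "w") as f:
    json.dump(metadata, f, indent=4)
```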

SpectrogramDatasetRouter

An additional class that presents multiple SpectrogramDataset instances as a single dataset.

dstrt = SpectrogramDatasetRouter(
    datasets: list = []
        # list of datasets to route to
)

Example of custom collate function

Here's an example of a custom collate function that batches the spectrogram and masks['speech'] entries from the dataset output:

    import torch

    def custom_collate_fn(data):
        # stack spectrograms and add a channel dimension: (batch, 1, n_mels, frames)
        inputs = torch.stack([d["spectrogram"] for d in data]).unsqueeze(1)
        # stack the 'speech' masks: (batch, frames)
        masks = torch.stack([d["masks"]["speech"] for d in data])
        return inputs, masks
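To show how this collate function wires into a torch DataLoader, here is a self-contained sketch that substitutes a fake dataset returning the same dict layout as SpectrogramDataset. The shapes are illustrative assumptions (n_mels=80 and 192 frames, roughly a 1.92 s window at hop_length=160); the real dataset determines the actual sizes.

```python
# Hypothetical stand-in for SpectrogramDataset, used only to demonstrate
# how custom_collate_fn plugs into a DataLoader. Shapes are illustrative.
import torch
from torch.utils.data import DataLoader, Dataset

class FakeSpectrogramDataset(Dataset):
    """Returns the same dict layout as SpectrogramDataset's getitem."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return {
            "spectrogram": torch.zeros(80, 192),     # (n_mels, frames)
            "masks": {"speech": torch.zeros(192)},   # per-frame mask
        }

def custom_collate_fn(data):
    inputs = torch.stack([d["spectrogram"] for d in data]).unsqueeze(1)
    masks = torch.stack([d["masks"]["speech"] for d in data])
    return inputs, masks

loader = DataLoader(FakeSpectrogramDataset(), batch_size=4,
                    collate_fn=custom_collate_fn)
inputs, masks = next(iter(loader))
print(inputs.shape)  # torch.Size([4, 1, 80, 192])
print(masks.shape)   # torch.Size([4, 192])
```

The same wiring applies to a real SpectrogramDataset instance: pass it to DataLoader in place of the fake dataset.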
