Audio Captioning datasets for PyTorch
Unofficial source code for the audio captioning datasets AudioCaps [1], Clotho [2], and MACS [3], designed for PyTorch.
Installation
pip install aac-datasets
Examples
Create Clotho dataset
from aac_datasets import Clotho
dataset = Clotho(root=".", download=True)
item = dataset[0]
audio, captions = item["audio"], item["captions"]
# audio: Tensor of shape (n_channels=1, audio_max_size)
# captions: list of str
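Because an item can carry several captions, a training loop often keeps only one of them per step. The following is a minimal sketch of that idea, not part of aac-datasets: the helper `select_caption`, the output key `"caption"`, and the stand-in item (which uses a nested list where a real item holds an audio Tensor) are all hypothetical.

```python
import random


def select_caption(item: dict, rng: random.Random) -> dict:
    """Return a shallow copy of `item` with one randomly chosen caption.

    The "caption" key is a hypothetical name used for this sketch only.
    """
    result = dict(item)  # shallow copy, the original item is left untouched
    result["caption"] = rng.choice(item["captions"])
    return result


# Stand-in for a dataset item; a real item would hold an audio Tensor.
fake_item = {
    "audio": [[0.0, 0.1, 0.2]],
    "captions": ["a dog barks", "a dog is barking loudly"],
}

rng = random.Random(0)  # seeded so that runs are reproducible
sample = select_caption(fake_item, rng)
print(sample["caption"])
```

Passing an explicit `random.Random` instance keeps the selection reproducible without touching the global random state.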
Build PyTorch dataloader with Clotho
from torch.utils.data.dataloader import DataLoader
from aac_datasets import Clotho
from aac_datasets.utils import BasicCollate
dataset = Clotho(root=".", download=True)
dataloader = DataLoader(dataset, batch_size=4, collate_fn=BasicCollate())
for batch in dataloader:
    # batch["audio"]: list of 4 tensors of shape (n_channels, audio_size)
    # batch["captions"]: list of 4 lists of str
    ...
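The comments above describe a batch as a dict of lists, with no padding applied to the audio. As an illustration of that layout only, here is a sketch of a collate function that produces it. This is not the source of `BasicCollate`; the name `collate_to_lists` is hypothetical, it assumes every item has the same keys, and nested lists stand in for tensors.

```python
from typing import Any, Dict, List


def collate_to_lists(items: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn a list of item dicts into one dict of lists, without padding.

    Assumes all items share the same keys, so every list ends up
    with one entry per item, in the original order.
    """
    batch: Dict[str, List[Any]] = {}
    for item in items:
        for key, value in item.items():
            batch.setdefault(key, []).append(value)
    return batch


# Stand-in items with different audio lengths (3 and 5 samples).
items = [
    {"audio": [[0.0] * 3], "captions": ["rain falls"]},
    {"audio": [[0.0] * 5], "captions": ["a car passes", "traffic noise"]},
]

batch = collate_to_lists(items)
print(len(batch["audio"]), len(batch["captions"]))
```

Keeping the audio as a list avoids choosing a padding strategy inside the collate step; a model that needs a single tensor would still have to pad or crop the clips afterwards.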
Datasets stats
Here are the statistics for each dataset:
| | AudioCaps | Clotho | MACS |
|---|---|---|---|
| Subset(s) | train, val, test | dev, val, eval, test, analysis | full |
| Sample rate (Hz) | 32000 | 44100 | 48000 |
| Estimated size | 43GB | 27GB | 13GB |
| Audio source | AudioSet (YouTube) | Freesound | TAU Urban Acoustic Scenes 2019 |
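Given these estimated sizes, it can be worth checking the free disk space before starting a download. The sketch below uses only the standard library; the helper `has_enough_space` and the dictionary keys are hypothetical, and it assumes that "GB" means 10^9 bytes, which the table does not specify.

```python
import shutil

# Estimated sizes in GB, copied from the table above.
ESTIMATED_SIZE_GB = {"audiocaps": 43, "clotho": 27, "macs": 13}


def has_enough_space(root: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `root` has enough free space.

    Assumes 1 GB = 1e9 bytes.
    """
    free_gb = shutil.disk_usage(root).free / 1e9
    return free_gb >= required_gb


total_gb = sum(ESTIMATED_SIZE_GB.values())
print(f"All three datasets: about {total_gb} GB")
print("Enough space for Clotho:", has_enough_space(".", ESTIMATED_SIZE_GB["clotho"]))
```

The sizes are estimates, so leaving some margin above the reported value is a reasonable precaution.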
Here are the statistics for the training subset of each dataset:
| | AudioCaps/train | Clotho/dev | MACS/full |
|---|---|---|---|
| Number of audio files | 49838 | 3840 | 3930 |
| Total audio duration | 136.6h¹ | 24.0h | 10.9h |
| Audio duration range | 0.5-10s | 15-30s | 10s |
| Captions per audio | 1 | 5 | 2-5 |
| Number of captions | 49838 | 19195 | 17275 |
| Total number of words² | 402482 | 217362 | 160006 |
| Sentence size² | 2-52 | 8-20 | 5-40 |

¹ This duration is estimated from the total duration (126.7h) of 46230 of the 49838 files.
² The sentences are cleaned (lowercased, punctuation removed) and tokenized with the spaCy tokenizer to count the words.
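A few per-file averages follow directly from the table. The sketch below derives them and also reproduces the AudioCaps estimate from footnote 1, assuming the 126.7h measured on 46230 files was scaled proportionally to all 49838 files; the footnote does not state the exact method, so that scaling is an assumption.

```python
# Values copied from the training-subset table above.
TRAIN_STATS = {
    "AudioCaps/train": {"audios": 49838, "hours": 136.6, "captions": 49838, "words": 402482},
    "Clotho/dev": {"audios": 3840, "hours": 24.0, "captions": 19195, "words": 217362},
    "MACS/full": {"audios": 3930, "hours": 10.9, "captions": 17275, "words": 160006},
}


def summarize(stats: dict) -> dict:
    """Derive per-file and per-caption averages from the raw totals."""
    return {
        "mean_duration_s": stats["hours"] * 3600 / stats["audios"],
        "captions_per_audio": stats["captions"] / stats["audios"],
        "words_per_caption": stats["words"] / stats["captions"],
    }


summaries = {name: summarize(stats) for name, stats in TRAIN_STATS.items()}
for name, summary in summaries.items():
    print(name, {key: round(value, 2) for key, value in summary.items()})

# Footnote 1, assuming a proportional extrapolation to the full file count.
measured_hours, measured_files, total_files = 126.7, 46230, 49838
estimated_hours = measured_hours * total_files / measured_files
print(f"Estimated AudioCaps/train duration: {estimated_hours:.1f}h")
```

Under that assumption the extrapolated value rounds to the 136.6h reported in the table.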
Requirements
Python packages
These requirements are installed automatically when the package is installed with pip.
torch >= 1.10.1
torchaudio >= 0.10.1
py7zr >= 0.17.2
pyyaml >= 6.0
tqdm >= 4.64.0
External requirements (AudioCaps only)
Downloading AudioCaps requires two external programs: ffmpeg and youtube-dl.
On Ubuntu, both can be installed with sudo apt install ffmpeg youtube-dl.
You can also override their paths for AudioCaps:
from aac_datasets import AudioCaps
AudioCaps.FFMPEG_PATH = "/my/path/to/ffmpeg"
AudioCaps.YOUTUBE_DL_PATH = "/my/path/to/youtube_dl"
dataset = AudioCaps(root=".", download=True)
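Before overriding the paths, it may help to check whether the two programs are already visible on the PATH. This is a small standard-library sketch; the helper `locate_programs` is hypothetical and is not provided by aac-datasets.

```python
import shutil
from typing import Dict, Iterable, Optional


def locate_programs(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each program name to its resolved path, or None if not found."""
    return {name: shutil.which(name) for name in names}


found = locate_programs(["ffmpeg", "youtube-dl"])
missing = [name for name, path in found.items() if path is None]

if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("Resolved paths:", found)
```

A resolved path returned by `shutil.which` could then be assigned to `AudioCaps.FFMPEG_PATH` or `AudioCaps.YOUTUBE_DL_PATH` as shown above.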
Download datasets
To download a dataset, pass the download=True argument when constructing the dataset.
To download a dataset separately, you can instead use the following command:
aac-datasets-download --root "." clotho --subsets "dev"
Or call the corresponding function in your code:
from aac_datasets.download import download_clotho
download_clotho(root=".", subsets=["dev"])
References
AudioCaps
[1] C. D. Kim, B. Kim, H. Lee, and G. Kim, “Audiocaps: Generating captions for audios in the wild,” in NAACL-HLT, 2019. Available: https://aclanthology.org/N19-1011/
Clotho
[2] K. Drossos, S. Lipping, and T. Virtanen, “Clotho: An Audio Captioning Dataset,” arXiv:1910.09387 [cs, eess], Oct. 2019. Available: http://arxiv.org/abs/1910.09387
MACS
[3] F. Font, A. Mesaros, D. P. W. Ellis, E. Fonseca, M. Fuentes, and B. Elizalde, Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2021). Barcelona, Spain: Music Technology Group - Universitat Pompeu Fabra, Nov. 2021. Available: https://doi.org/10.5281/zenodo.5770113
Cite the aac-datasets package
If you use this software, please consider citing it as below:
@software{
Labbe_aac-datasets_2022,
author = {Labbé, Etienne},
license = {MIT},
month = {01},
title = {{aac-datasets}},
url = {https://github.com/Labbeti/aac-datasets/},
version = {0.3.2},
year = {2023}
}
Contact
Maintainer:
- Etienne Labbé "Labbeti": labbeti.pub@gmail.com