Generate soundscapes based on images.
Soundscape Generation
Installation
Scaper Installation
The sound generation module was developed using Scaper. Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single probabilistically defined specification.
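Here is a minimal sketch of what a "probabilistically defined specification" means. It is plain Python and does not need Scaper. It mimics the distribution-tuple convention Scaper uses for event parameters, such as ('const', value), ('choose', options) and ('uniform', min, max). The spec, the event labels and the helpers `sample` and `instantiate` are hypothetical, and this is not Scaper's implementation. Each seed turns the same specification into a different concrete list of events, which is how one specification can yield many soundscapes.

```python
import random

# Hypothetical spec in the style of Scaper's distribution tuples:
# ('const', v), ('choose', options), ('uniform', lo, hi),
# ('normal', mean, std), ('truncnorm', mean, std, lo, hi).
# The labels below are made up for illustration.
EVENT_SPEC = {
    "label": ("choose", ["car_horn", "siren", "footsteps"]),
    "event_time": ("uniform", 0.0, 9.0),
    "event_duration": ("truncnorm", 3.0, 1.0, 0.5, 5.0),
    "snr": ("normal", 10.0, 3.0),
}


def sample(dist, rng):
    """Draw one concrete value from a distribution tuple."""
    kind = dist[0]
    if kind == "const":
        return dist[1]
    if kind == "choose":
        return rng.choice(dist[1])
    if kind == "uniform":
        return rng.uniform(dist[1], dist[2])
    if kind == "normal":
        return rng.gauss(dist[1], dist[2])
    if kind == "truncnorm":
        mean, std, lo, hi = dist[1:]
        # Simple rejection sampling; not how Scaper itself implements it.
        while True:
            x = rng.gauss(mean, std)
            if lo <= x <= hi:
                return x
    raise ValueError(f"unknown distribution: {kind}")


def instantiate(spec, n_events, seed=None):
    """Turn one probabilistic spec into a list of concrete event annotations."""
    rng = random.Random(seed)
    return [{key: sample(dist, rng) for key, dist in spec.items()}
            for _ in range(n_events)]


if __name__ == "__main__":
    # Same spec, two seeds: two different soundscape annotations.
    for seed in (0, 1):
        print(instantiate(EVENT_SPEC, n_events=3, seed=seed))
```

In Scaper itself, the concrete events are also mixed into audio from the isolated sound files. The sketch stops at the sampled annotations.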
Follow the instructions given in the following link:
Download Dependencies
pip install -r requirements.txt
Download Cityscapes Dataset
Downloading the dataset requires a Cityscapes account for authentication. Such an account can be created on www.cityscapes-dataset.com. After registering, run the download_data.sh script. During the download, it will ask for your email and password for authentication.
./download_data.sh
Usage
For the object detection module, a pre-trained ERFNet is used and fine-tuned on the Cityscapes dataset.
Train Object Segmentation Network
To train the network, run the following command:
python train.py --num_epochs 70 --batch_size 8 --evaluate_every 1 --save_weights_every 1
By default, training resumes from the latest saved checkpoint. If the checkpoints/ directory is missing, training starts from scratch.
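The resume-or-start-from-scratch behaviour can be sketched as follows. This sketch assumes a hypothetical naming scheme, `checkpoints/epoch_<N>.pth`, with epochs numbered from 1. The real file names and the logic inside train.py may differ, and `latest_checkpoint` and `epochs_to_run` are illustrative names.

```python
import os
import re

# Hypothetical naming scheme: checkpoints/epoch_<N>.pth (epochs numbered from 1).
# This is an assumption, not confirmed by train.py.
CKPT_PATTERN = re.compile(r"^epoch_(\d+)\.pth$")


def latest_checkpoint(ckpt_dir="checkpoints"):
    """Return (path, epoch) of the newest checkpoint, or (None, 0) if there is none."""
    if not os.path.isdir(ckpt_dir):
        return None, 0  # no checkpoints/ directory: train from scratch
    best_path, best_epoch = None, 0
    for name in os.listdir(ckpt_dir):
        match = CKPT_PATTERN.match(name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch = int(match.group(1))
            best_path = os.path.join(ckpt_dir, name)
    return best_path, best_epoch


def epochs_to_run(num_epochs, ckpt_dir="checkpoints"):
    """Epoch numbers still to train, continuing after the latest checkpoint."""
    _, last_epoch = latest_checkpoint(ckpt_dir)
    return list(range(last_epoch + 1, num_epochs + 1))
```

The sketch parses the epoch number and compares it as an integer. A plain lexicographic sort of file names would wrongly rank epoch_9 above epoch_10.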
Test the Segmentation Network
Run the following command to predict the semantic segmentation of every image in the test_images/ directory (results are saved in the test_segmentations/ directory):
python predict.py
Ensure that you specify the image file type in the image path variable in predict.py.
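If the image path is a glob pattern, images with any other extension are silently skipped. That is why the file type matters. The sketch below assumes the variable is a glob pattern such as test_images/*.png. It also assumes each segmentation is saved under the same file name in test_segmentations/. `IMAGE_PATH` and `plan_predictions` are hypothetical names, not taken from predict.py.

```python
import glob
import os

# Hypothetical stand-in for the image path variable in predict.py: the pattern
# must include the file type of your test images (e.g. *.png or *.jpg).
IMAGE_PATH = os.path.join("test_images", "*.png")
OUTPUT_DIR = "test_segmentations"


def plan_predictions(image_path=IMAGE_PATH, output_dir=OUTPUT_DIR):
    """Pair every matching input image with the path its segmentation is saved to."""
    pairs = []
    for src in sorted(glob.glob(image_path)):
        # Assumption: the output keeps the input file name.
        pairs.append((src, os.path.join(output_dir, os.path.basename(src))))
    return pairs
```

With the pattern set to `*.png`, any `.jpg` images in the same folder are simply not processed.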
Generate soundscapes
Run soundGeneration.py to generate a soundscape for every image in the test_images/ directory (results are saved in the soundscapes/ directory). Ensure that you specify the image file type in the image path variable of predict.py.
Results
Object Detection
The above predictions are produced by the network after 67 epochs of training, which achieves a mean class IoU score of 0.7084 on the validation set. The model was trained for 70 epochs on a single Tesla P100 GPU. After training, the checkpoint that yielded the highest validation IoU score was selected. Inference on a Tesla P100 GPU takes around 0.2 seconds per image. The progression of the IoU metric is shown below.
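For reference, the per-class IoU is TP / (TP + FP + FN), counted over pixels. The mean class IoU is the average of that value across classes. The sketch below computes it from a confusion matrix. It then picks the epoch with the highest validation score, mirroring the checkpoint selection described above. This is a plain-Python illustration, not the project's evaluation code. Skipping classes absent from both ground truth and prediction is an assumption.

```python
def class_iou(confusion):
    """Per-class IoU from a square confusion matrix (rows: ground truth, cols: prediction)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # pixels of class c predicted as another class
        fp = sum(confusion[r][c] for r in range(n)) - tp  # other pixels predicted as c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(confusion):
    """Mean over classes; NaN entries (class absent everywhere) are skipped."""
    values = [v for v in class_iou(confusion) if v == v]  # v == v is False for NaN
    return sum(values) / len(values) if values else float("nan")


def best_epoch(val_iou_per_epoch):
    """1-based epoch with the highest validation mean IoU (first one on ties)."""
    best_index = max(range(len(val_iou_per_epoch)), key=val_iou_per_epoch.__getitem__)
    return best_index + 1
```

Given a list of 70 per-epoch validation scores, `best_epoch` returns the epoch whose checkpoint would be kept.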
References
- J. Salamon, D. MacConnell, M. Cartwright, P. Li and J. P. Bello, "Scaper: A library for soundscape synthesis and augmentation," 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2017, pp. 344-348, DOI: 10.1109/WASPAA.2017.8170052.
- E. Romera et al., "ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation", 2017
- Official PyTorch implementation of ERFNet
Download files
File details
Details for the file soundscape-generation-0.1.0.tar.gz.
File metadata
- Download URL: soundscape-generation-0.1.0.tar.gz
- Upload date:
- Size: 14.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 68b11d5dc311edc0f0a5a17f02daebe0a5dd3d52624c3f15ab23012ca629a41b
MD5 | 1a6cf49580314145bf173258a087e8a8
BLAKE2b-256 | ae7e96ace4a9c3fa455dd93c17a599e9ae678a50a57a8dca71aa1df5fb0aec44
File details
Details for the file soundscape_generation-0.1.0-py3-none-any.whl.
File metadata
- Download URL: soundscape_generation-0.1.0-py3-none-any.whl
- Upload date:
- Size: 17.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1ad6518ab766cdc116779b3df200797c27dae47439f6626804be1c75629f321e
MD5 | 9b8062993e165beee0704857f91adfa0
BLAKE2b-256 | b3327bdd67ee1ed20fe1564ed88e73f22543d610d811743f51234f14631a7b24
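A downloaded file can be checked against its listed SHA256 digest using Python's standard hashlib module. The helper names below are illustrative. The default expected digest is the SHA256 listed for soundscape-generation-0.1.0.tar.gz.

```python
import hashlib

# SHA256 digest listed for soundscape-generation-0.1.0.tar.gz.
EXPECTED_SHA256 = "68b11d5dc311edc0f0a5a17f02daebe0a5dd3d52624c3f15ab23012ca629a41b"


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

pip can also enforce digests itself in hash-checking mode. To use it, add `--hash=sha256:<digest>` to a requirement line in a requirements file.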