Soundscape Generation

Generate soundscapes from images.

Table of Contents

  1. Installation
  2. Usage
  3. References

Installation

Scaper Installation

The sound generation module was developed using Scaper. Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single probabilistically defined specification.
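To illustrate what a "probabilistically defined specification" means, here is a minimal sketch in plain Python. It is not Scaper's API and not this project's code: the `SPEC` layout, the labels, and the helper names are all hypothetical. The only idea taken from the description above is that a single specification with distributions in place of fixed values can be sampled repeatedly to yield several different soundscapes.

```python
import random

# Hypothetical specification: every event field is a distribution tuple.
# This is NOT Scaper's API; all names and values are illustrative.
SPEC = {
    "duration": 10.0,  # soundscape length in seconds
    "events": [
        {"label": ("choose", ["car", "bus"]),
         "event_time": ("uniform", 0.0, 8.0),
         "snr": ("uniform", 5.0, 15.0)},
        {"label": ("const", "person"),
         "event_time": ("uniform", 0.0, 8.0),
         "snr": ("const", 10.0)},
    ],
}


def sample(dist, rng):
    """Draw one concrete value from a distribution tuple."""
    kind = dist[0]
    if kind == "const":
        return dist[1]
    if kind == "choose":
        return rng.choice(dist[1])
    if kind == "uniform":
        return rng.uniform(dist[1], dist[2])
    raise ValueError(f"unknown distribution: {kind}")


def instantiate(spec, rng):
    """Turn the probabilistic spec into one concrete, time-ordered event list."""
    events = [{key: sample(dist, rng) for key, dist in event.items()}
              for event in spec["events"]]
    return sorted(events, key=lambda e: e["event_time"])


# One specification, three different concrete soundscapes.
rng = random.Random(0)
soundscapes = [instantiate(SPEC, rng) for _ in range(3)]
```

Each call to `instantiate` draws fresh values, so the three results share the same structure but differ in labels, onsets, and levels.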

Follow the instructions given in the following link:

Install Dependencies

pip install -r requirements.txt

Download Cityscapes Dataset

To download the dataset, a Cityscapes account is required for authentication. An account can be created on www.cityscapes-dataset.com. After registering, run the download_data.sh script. During the download, it will ask for your email and password.

./scripts/download_data.sh

Usage

For the object detection module, a pre-trained ERFNet is used, which is then fine-tuned on the Cityscapes dataset.

Train Object Segmentation Network

To train the network, run the following command. The hyperparameters epoch and batch size can be configured in the docker-compose.yml file. To load a pre-trained model, specify its path in the MODEL_TO_LOAD variable; if the variable is None, the model is trained from scratch.

docker-compose up train_object_detection
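The MODEL_TO_LOAD behaviour described above can be sketched as follows. How the training script actually reads the variable is not shown here, so this assumes it arrives as an environment variable holding a string. The function name and the example path are hypothetical.

```python
import os


def resolve_model_path(env=None):
    """Return the checkpoint path to load, or None to train from scratch."""
    env = os.environ if env is None else env
    value = env.get("MODEL_TO_LOAD", "None").strip()
    # Assumption: an unset, empty, or literal "None" value means no checkpoint.
    if value in ("", "None"):
        return None
    return value


# Hypothetical values, for illustration only.
from_scratch = resolve_model_path({"MODEL_TO_LOAD": "None"})
resume_from = resolve_model_path({"MODEL_TO_LOAD": "models/checkpoint"})
```

Treating the string "None" the same as an unset variable keeps the configuration file readable while still giving the training code a proper `None` to test against.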

Test the Segmentation Network

Run the following command to predict the semantic segmentation of every image in the --test_images directory. Predictions are saved with the same name and a _pred.jpg suffix. Ensure that you specify the correct image file type in --test_images_type.

docker-compose up predict_object_detection
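The naming convention for predictions can be sketched as below. This is an illustration, not the project's code: it assumes that the original extension is dropped before `_pred.jpg` is appended, that the prediction is written next to the input image, and that `--test_images_type` is matched case-insensitively. Both function names are hypothetical.

```python
from pathlib import Path


def filter_by_type(paths, test_images_type):
    """Keep only the paths whose extension matches the given file type."""
    suffix = "." + test_images_type.lstrip(".").lower()
    return [Path(p) for p in paths if Path(p).suffix.lower() == suffix]


def prediction_path(image_path):
    """Map e.g. 'scene.png' to 'scene_pred.jpg' in the same directory."""
    image_path = Path(image_path)
    return image_path.with_name(image_path.stem + "_pred.jpg")


# Hypothetical file names, for illustration only.
images = filter_by_type(["a.png", "b.jpg", "c.PNG"], "png")
outputs = [prediction_path(p) for p in images]
```

If the file type does not match the images in the directory, the filtered list is empty and nothing is predicted, which is why the type has to be set correctly.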

Evaluate the Segmentation Network

To evaluate the segmentation network, run the command below.

docker-compose up evaluation

Generate soundscapes

To generate a soundscape for every image in the --test_images directory, run the following command. The generated audio files are saved in data/soundscapes. Ensure that you specify the correct image file type in --test_images_type.

docker-compose up sound_generation

Results

Object Detection

The above predictions are produced by a network trained for 67 epochs that achieves a mean class IoU score of 0.7084 on the validation set. Inference takes around 0.2 seconds per image on a Tesla P100 GPU. Training ran for 70 epochs in total on a single Tesla P100; afterwards, the checkpoint that yielded the highest validation IoU score (epoch 67) was selected. The progression of the IoU metric is shown below.
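For readers unfamiliar with the metric, here is a minimal sketch of mean class IoU computed from a confusion matrix, followed by picking the checkpoint with the highest validation score. It is not the project's evaluation code: the function names and all numbers are illustrative, and whether the real evaluation skips absent classes or ignores particular Cityscapes labels is an assumption.

```python
def mean_class_iou(confusion):
    """Mean IoU over classes from a square confusion matrix.

    confusion[i][j] counts pixels of true class i predicted as class j.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        union = tp + fp + fn
        # Assumption: classes absent from labels and predictions are skipped.
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


def best_epoch(val_iou_by_epoch):
    """Return the epoch whose checkpoint has the highest validation IoU."""
    return max(val_iou_by_epoch, key=val_iou_by_epoch.get)


# Toy two-class confusion matrix: IoUs are 8/11 and 9/12.
confusion = [[8, 2],
             [1, 9]]
score = mean_class_iou(confusion)

# Illustrative values only, not the project's training history.
history = {1: 0.41, 2: 0.55, 3: 0.52}
selected = best_epoch(history)
```

Selecting by validation IoU rather than by the last epoch is why the reported checkpoint can come from before the end of training.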

References
