Skip to main content

Script for extracting audio and saving it in .WAV files and computing their Mel spectrogram and saving it in .JPEG file

Project description

video2spectrogram

supported versions Tweet

About

This package is meant to automate the process of extracting audio files from videos and saving the plots computed from these audio frequencies in the Mel scale (Sectrogram). Videos are processed in parallel with the audio extracted by ffmpeg stored in .wav files which are then used to create spectrograms stored as .JPEG and can be used by any audio-based method.

Currently supported video formats include .mp4,mpeg-4,.avi,.wmv. If you have a different extension, you can simply change the script to include them (in the video2spectrogram/get_spectrogram.py)


Package requirements

  • librosa
  • numpy
  • matplotlib

Make sure that the above packages are installed before running any functions.

ffmpeg: You will need to have installed ffmpeg in order to perform the audio extraction from the video files.

Multiprocessing: The code uses multiprocessing for improving speeds, thus the total time required for the conversion varies across different processors. The code has been tested on an AMD Ryzen 3950X with an average conversion time of 4 minutes for ~1K videos (with an average resolution of 480p and length of 5s.)


Dataset structure

The package assumes a fixed video dataset structure:

<dataset>    
  │
  └──<class 1>
  │     │
  │     │─── <video_1.mp4>
  │     │─── <video_2.mp4>
  │     │─── ...
  │    ...      
  │
  └───<class 2>
  │      │
  │      │─── <video_1.mp4>
  │      │─── <video_2.mp4>
  │      │─── ...
 ...    ...


Usage

The main code is at the get_spectrograme.py file. To run the convertor simply call the convert function with the base directory of the dataset and the destination directory for where to save the audio. Additional arguments that can be used:

  • verbose_lvl: Integer for verbosity.
  • save_wav: Boolean to determine if the created wav files are to be stored and not deleted.
  • ar: Integer for the ffmpeg option for specifying the audio sampling frequency.
  • res_h: Integer for the height of the spectrogram image to be saved.
  • res_w: Integer for the width of the spectrogram image to be saved.
  • dpi: Integer for the display's dot's per inch. Needs to be set to avoid inconsistencies to the res argument.
from video2spectrogram import convert
#or
from get_spectrogram import convert

convert(my_dataset_dir, my_target_dir)

Installation through git

Please make sure, Git is installed in your machine:

$ sudo apt-get update
$ sudo apt-get install git
$ git clone https://github.com/alexandrosstergiou/video2spectrogram.git
$ cd dataset2database
$ pip install .

You can then use it as any other package installed through pip.


Installation through pip

The latest stable release is also available for download through pip

$ pip install video2spectrogram

Project details


Release history Release notifications | RSS feed

This version

0.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

video2spectrogram-0.1-py3-none-any.whl (17.5 kB view details)

Uploaded Python 3

File details

Details for the file video2spectrogram-0.1-py3-none-any.whl.

File metadata

  • Download URL: video2spectrogram-0.1-py3-none-any.whl
  • Upload date:
  • Size: 17.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.5

File hashes

Hashes for video2spectrogram-0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e085cc59be1f24ab0ede42d97bf48a3de5cadb2f71944c3bf7cf4b4f26a5584b
MD5 d919266811fb4c476bccbb1890ca9cae
BLAKE2b-256 eedc5ae7331a574b88d3d49579247c1c977d6139b8b10280c50cc9a5f479615b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page