Script for extracting audio from videos into .WAV files, computing their Mel spectrograms, and saving them as .JPEG files

video2spectrogram

About

This package automates extracting audio from videos and saving plots of the audio frequencies on the Mel scale (spectrograms). Videos are processed in parallel: ffmpeg extracts the audio into .wav files, which are then used to create spectrograms stored as .JPEG images that can be used by any audio-based method.
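For readers unfamiliar with the Mel scale, the sketch below shows one common Hz-to-mel mapping (the HTK-style formula). It is only an illustration: the function names are hypothetical, and the package presumably relies on librosa for the actual computation, whose default mel variant may differ from this formula.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# The scale is compressive: equal steps in Hz cover fewer mels at
# higher frequencies, which mimics how pitch differences are perceived.
low_step = hz_to_mel(1000.0) - hz_to_mel(0.0)
high_step = hz_to_mel(2000.0) - hz_to_mel(1000.0)
```

Here high_step is smaller than low_step, which is why a Mel spectrogram devotes more of its vertical axis to low frequencies.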

Currently supported video formats include .mp4, mpeg-4, .avi, and .wmv. If your videos use a different extension, you can add it in the script (video2spectrogram/get_spectrogram.py).
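An extension check of this kind could look roughly like the sketch below. The names SUPPORTED_EXTENSIONS and is_supported are hypothetical and not taken from get_spectrogram.py; the mpeg-4 entry is left out because its exact spelling in the script is not known.

```python
import os

# Illustrative set only; the real list lives in get_spectrogram.py.
SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".wmv"}


def is_supported(path, extensions=SUPPORTED_EXTENSIONS):
    """Return True if the file extension (case-insensitive) is in the set."""
    return os.path.splitext(path)[1].lower() in extensions


# Supporting another container is then a matter of extending the set.
extended = SUPPORTED_EXTENSIONS | {".mkv"}
```

With this layout, is_supported("clip.mkv", extended) accepts a file that the default set rejects.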

Package requirements

librosa
numpy
matplotlib

Make sure that the above packages are installed before running any functions.

ffmpeg: ffmpeg must be installed in order to extract the audio from the video files.
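A quick way to check for ffmpeg and to see what an extraction call might look like is sketched below. The helper names and the default sampling rate are assumptions made for illustration, and the exact command the package issues may differ; the flags used (-y, -i, -vn, -ar) are standard ffmpeg options.

```python
import shutil


def ffmpeg_available():
    """Return True if an ffmpeg executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


def build_extract_command(video_path, wav_path, ar=16000):
    """Build an ffmpeg argument list that writes the audio track to a wav file.

    -y   overwrite the output file without asking
    -i   input file
    -vn  drop the video stream
    -ar  audio sampling frequency in Hz (16000 is an assumed default)
    """
    return ["ffmpeg", "-y", "-i", video_path, "-vn", "-ar", str(ar), wav_path]
```

The resulting list can be handed to subprocess.run(cmd, check=True) once ffmpeg_available() returns True.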

Multiprocessing: The code uses multiprocessing to improve speed, so the total conversion time varies across processors. The code has been tested on an AMD Ryzen 3950X, with an average conversion time of 4 minutes for ~1K videos (with an average resolution of 480p and a length of 5s).
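The general pattern of distributing videos over worker processes can be sketched as follows. This is not the package's implementation: worker_count, process_video, and convert_all are hypothetical names, and process_video is a placeholder that only derives the output path instead of running ffmpeg and plotting.

```python
import multiprocessing as mp
import os


def worker_count(n_tasks, cpu_count=None):
    """Use at most one process per CPU and never more than there are tasks."""
    cpus = cpu_count or os.cpu_count() or 1
    return max(1, min(n_tasks, cpus))


def process_video(path):
    """Placeholder for the per-video work (extract audio, plot, save)."""
    return path, os.path.splitext(path)[0] + ".jpg"


def convert_all(paths):
    """Map process_video over all paths using a pool of worker processes."""
    with mp.Pool(processes=worker_count(len(paths))) as pool:
        return pool.map(process_video, paths)
```

When calling convert_all from a script, place the call under an `if __name__ == "__main__":` guard so that worker processes can be started safely on every platform.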

Dataset structure

The package assumes a fixed video dataset structure:

<dataset>    
  │
  └──<class 1>
  │     │
  │     │─── <video_1.mp4>
  │     │─── <video_2.mp4>
  │     │─── ...
  │    ...      
  │
  └───<class 2>
  │      │
  │      │─── <video_1.mp4>
  │      │─── <video_2.mp4>
  │      │─── ...
 ...    ...
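Given the layout above, the videos of each class can be collected with a short directory walk. The sketch below is an assumption about how such a traversal might be written; list_videos_by_class and VIDEO_EXTENSIONS are hypothetical names, not part of the package.

```python
import os

# Illustrative extensions; the package keeps its own list.
VIDEO_EXTENSIONS = (".mp4", ".avi", ".wmv")


def list_videos_by_class(dataset_dir):
    """Return {class_name: [video paths]} for a <dataset>/<class>/<video> tree."""
    videos = {}
    for class_name in sorted(os.listdir(dataset_dir)):
        class_dir = os.path.join(dataset_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        videos[class_name] = sorted(
            os.path.join(class_dir, name)
            for name in os.listdir(class_dir)
            if name.lower().endswith(VIDEO_EXTENSIONS)
        )
    return videos
```

Files that are not videos, such as annotation or text files inside a class folder, are simply ignored by the extension filter.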

Usage

The main code is in the get_spectrogram.py file. To run the converter, simply call the convert function with the base directory of the dataset and the destination directory where the audio is to be saved. Additional arguments that can be used:

verbose_lvl: Integer for verbosity.
save_wav: Boolean that determines whether the created wav files are kept rather than deleted.
ar: Integer for the ffmpeg option for specifying the audio sampling frequency.
res_h: Integer for the height of the spectrogram image to be saved.
res_w: Integer for the width of the spectrogram image to be saved.
dpi: Integer for the display's dots per inch. Needs to be set to avoid inconsistencies with the res_h and res_w arguments.

from video2spectrogram import convert
# or
from get_spectrogram import convert

convert(my_dataset_dir, my_target_dir)
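The reason dpi matters for res_h and res_w is that plotting libraries such as matplotlib size figures in inches, so the pixel resolution is the size in inches multiplied by the dpi. The arithmetic is sketched below; figure_size_inches is a hypothetical helper and the values are examples, not the package defaults.

```python
def figure_size_inches(res_w, res_h, dpi):
    """Convert a target resolution in pixels to a figure size in inches."""
    return res_w / dpi, res_h / dpi


# A 1000x500 pixel image at 100 dpi needs a 10x5 inch figure.
width_in, height_in = figure_size_inches(1000, 500, 100)
```

In matplotlib, such values would typically be passed as plt.figure(figsize=(width_in, height_in), dpi=100); using a different dpi with the same figure size changes the number of pixels in the saved image.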

Installation through git

Please make sure Git is installed on your machine:

$ sudo apt-get update
$ sudo apt-get install git
$ git clone https://github.com/alexandrosstergiou/video2spectrogram.git
$ cd video2spectrogram
$ pip install .

You can then use it as any other package installed through pip.

Installation through pip

The latest stable release is also available for download through pip:

$ pip install video2spectrogram

Version 0.1, released Oct 28, 2021
