
A simple python package for managing the audio data from Google Research's ontology of 632 audio event classes.

AudioSet Data Manager

A simple python package for managing the audio data from Google Research's ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos.

Description

Google Research's AudioSet is a repository of audio events that span a wide range of labels. This Python package helps you navigate, download, and edit the entire repository so that you can easily extract the files you want. Each line in the AudioSet csv file has the columns defined by the third header line: # YTID, start_seconds, end_seconds, positive_labels. The package is based on this loose temporal .csv file format, which looks like this:

# Segments csv created Sun Mar 5 10:54:31 2017 positive_labels
# num_ytids=22160 num_segs=22160 num_unique_labels=527 num_positive_labels=52882
# YTID start_seconds end_seconds positive_labels
--PJHxphWEs 30.000 40.000 "/m/09x0r,/t/dd00088"
... ... ... ...

Do not alter the csv file. The Python package automatically reformats it into the following:

YTID start_seconds end_seconds positive_labels
-0RWZT-miFs 420.000 430.000 "/m/03v3yw,/m/0k4j"
... ... ... ...
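The reformatting described above can be approximated with the standard library alone. This is a minimal sketch, not the package's own implementation: `parse_segments` and `SAMPLE` are illustrative names, and the sketch assumes the raw file is comma-separated with a space after each comma and with the label list wrapped in double quotes.

```python
import csv
import io

COLUMNS = ["YTID", "start_seconds", "end_seconds", "positive_labels"]

# Illustrative sample in the assumed raw layout (three '#' header lines, then data).
SAMPLE = '''# Segments csv created Sun Mar  5 10:54:31 2017
# num_ytids=22160, num_segs=22160, num_unique_labels=527, num_positive_labels=52882
# YTID, start_seconds, end_seconds, positive_labels
--PJHxphWEs, 30.000, 40.000, "/m/09x0r,/t/dd00088"
'''


def parse_segments(text):
    """Parse AudioSet-style segment csv text into a list of row dicts."""
    # Drop blank lines and the '#'-prefixed header lines; column names are fixed.
    data_lines = [
        line for line in io.StringIO(text)
        if line.strip() and not line.startswith("#")
    ]
    # skipinitialspace lets the quoted label list follow ", " without breaking.
    reader = csv.reader(data_lines, skipinitialspace=True)
    rows = []
    for ytid, start, end, labels in reader:
        rows.append(dict(zip(COLUMNS, [ytid, float(start), float(end), labels])))
    return rows


if __name__ == "__main__":
    for row in parse_segments(SAMPLE):
        print(row)
```

The label list is kept as one comma-joined string here so that each row matches the four-column layout shown above.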

Getting Started

Dependencies

  • Python v3.x
  • FFmpeg
  • pydub
  • youtube-dl
  • pandas

Installing

  1. To install the Python packages, run the following command:
  • pip install -r requirements.txt
  2. Download the correct FFmpeg packages and executable files for your OS
  3. Add FFmpeg to PATH
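To confirm that the last step worked, you can check whether the FFmpeg executables are visible from Python. This is a small sketch using only the standard library; `tool_on_path` is an illustrative helper, not part of this package, and checking `ffprobe` as well is an assumption based on it shipping alongside `ffmpeg`.

```python
import shutil


def tool_on_path(name):
    """Return True if an executable called `name` can be found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    for tool in ("ffmpeg", "ffprobe"):
        status = "found" if tool_on_path(tool) else "MISSING"
        print(f"{tool}: {status}")
```

If a tool is reported as missing, open a new terminal after editing PATH, since running shells usually keep the old value.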

Executing program

Creating Manager

  • Instantiate the AudioSet manager by passing in the following arguments
    • csv is the file path to the segments csv downloaded from the AudioSet download page
    • dir is the file path to the directory you want files to be saved to
    • ydl_opts is the youtube-dl options dictionary that configures the format of the downloaded files. See the youtube-dl documentation for the available options and output template fields
from AudioSet import AudioSet
aud = AudioSet(csv=CSV, dir=DIR, ydl_opts = YDL_OPTS)
print(aud.df.head()) # See the top 5 rows
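For reference, a YDL_OPTS value might look like the sketch below. The keys used (`format`, `outtmpl`, `postprocessors`, `quiet`) are standard youtube-dl options, but which of them this package expects or overrides is an assumption, and `make_ydl_opts` is a hypothetical helper rather than part of the package.

```python
import os


def make_ydl_opts(out_dir, codec="wav"):
    """Build a youtube-dl options dict that extracts audio into out_dir."""
    return {
        # Prefer an audio-only stream, fall back to the best combined one.
        "format": "bestaudio/best",
        # Name each file after its YouTube ID, e.g. <out_dir>/--PJHxphWEs.wav
        "outtmpl": os.path.join(out_dir, "%(id)s.%(ext)s"),
        # Convert the download to the requested codec with FFmpeg.
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": codec}
        ],
        "quiet": True,
    }


YDL_OPTS = make_ydl_opts("downloads")

if __name__ == "__main__":
    print(YDL_OPTS)
```

Naming files by video ID keeps them easy to match against the YTID column of the csv.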

Filtering by mid

  • To narrow the dataset down to a desired audio event, you can filter the entire dataframe by the audio event's mid (its identifier in the ontology). Refer to ontology.json for the mid dictionary
aud.filter("/m/0dgw9r") # Keep only audio clips that contain "Human Sounds"
print(aud.df.head()) # Will only contain rows with "Human Sounds"
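The lookup and the filter can be illustrated without the package. The sketch below assumes ontology.json is a list of objects with `id` and `name` fields; `ONTOLOGY_SAMPLE` is a hand-trimmed, illustrative excerpt, and `mid_for_name` and `rows_with_mid` are hypothetical helpers, not the package's API. How `aud.filter` matches labels internally is not documented here, so treat the exact-match behaviour as an assumption.

```python
import json

# Hand-trimmed, illustrative excerpt; the real file has more fields and entries.
ONTOLOGY_SAMPLE = json.loads("""
[
  {"id": "/m/0dgw9r", "name": "Human sounds"},
  {"id": "/m/09l8g", "name": "Human voice"},
  {"id": "/m/09x0r", "name": "Speech"}
]
""")

# Rows in the four-column layout, reduced to the fields used below.
ROWS = [
    {"YTID": "--PJHxphWEs", "positive_labels": "/m/09x0r,/t/dd00088"},
    {"YTID": "-0RWZT-miFs", "positive_labels": "/m/03v3yw,/m/0k4j"},
]


def mid_for_name(ontology, name):
    """Return the mid whose display name equals `name`, or None."""
    for entry in ontology:
        if entry["name"] == name:
            return entry["id"]
    return None


def rows_with_mid(rows, mid):
    """Keep rows whose label list contains `mid` as a whole label."""
    # Split before comparing so "/m/0k4" does not match "/m/0k4j".
    return [r for r in rows if mid in r["positive_labels"].split(",")]


if __name__ == "__main__":
    speech = mid_for_name(ONTOLOGY_SAMPLE, "Speech")
    print(speech, [r["YTID"] for r in rows_with_mid(ROWS, speech)])
```

Comparing whole labels rather than substrings avoids false matches between mids that share a prefix.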

Downloading Videos and Audio Cutting

  • You can download all the audio in the manager's dataframe
    • Note: this saves to the project home directory. Specify the desired save directory with the ydl_opts argument in the constructor.
aud.download()

There are several options for cutting the audio. The wav argument is the path to the desired wav file to cut. These all save the clips under the DIR folder.

  1. Cutting based on start_seconds and end_seconds from the AudioSet csv files
  • Export the audio from start_seconds to end_seconds
  • aud.split(wav=WAV_PATH)
  2. Cutting as in method 1 and then further cutting on silence
  • Export segments of non-silent audio from start_seconds to end_seconds
  • aud.split_by_silence(wav=WAV_PATH, theta=-35)
  • theta is the silence threshold (default is -35 dB)
  3. Cutting based on chunks of time
  • Export files as x-second clips
  • aud.chunkify(wav=WAV_PATH, seconds=x)
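The arithmetic behind the first and third options can be sketched as follows. These helpers are hypothetical and only compute the cut points in milliseconds, the unit pydub uses when slicing an AudioSegment; they are not the package's implementation, and how the package treats a shorter final chunk is an assumption.

```python
def clip_bounds_ms(start_seconds, end_seconds):
    """Convert a csv row's start/end in seconds to millisecond bounds."""
    return int(round(start_seconds * 1000)), int(round(end_seconds * 1000))


def chunk_bounds_ms(total_ms, seconds):
    """Split [0, total_ms) into consecutive chunks of `seconds` each.

    The last chunk is shorter when total_ms is not a multiple of the step.
    """
    step = int(round(seconds * 1000))
    if step < 1:
        raise ValueError("seconds must be at least one millisecond")
    return [(s, min(s + step, total_ms)) for s in range(0, total_ms, step)]


if __name__ == "__main__":
    print(clip_bounds_ms(30.0, 40.0))
    print(chunk_bounds_ms(10_000, 4))
```

With pydub, a pair of bounds `(start, end)` would typically be applied as `audio[start:end]` on a loaded AudioSegment.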

Future Developments

  • Support for strong temporal stamp files
    • In progress
  • More robust file reading
  • More audio editing features

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

