voxceleb-luigi

Luigi pipeline to download VoxCeleb audio from YouTube and extract speaker segments.

This pipeline can download both the original VoxCeleb and VoxCeleb2.

Installation

pip install voxceleb_luigi

ffmpeg (including ffprobe) and youtube-dl must be installed. On systems with apt, you can run:

sudo apt install ffmpeg youtube-dl
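To confirm that the external tools are reachable before starting a long download, a small check like the following may help. It is a minimal sketch, not part of voxceleb_luigi: the helper names `find_tools` and `missing_tools` are hypothetical, and it only looks the executables up on your PATH (it does not read luigi.cfg).

```python
import shutil

# External executables this README says the pipeline relies on.
REQUIRED_TOOLS = ("ffmpeg", "ffprobe", "youtube-dl")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=REQUIRED_TOOLS):
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

If a tool is reported missing but is installed elsewhere, its location can be given explicitly in the configuration described under Usage.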

Usage

Some configuration is required. The easiest way to provide it is in your luigi.cfg, which Luigi looks for in the current working directory by default; set the LUIGI_CONFIG_PATH environment variable to use a different location.

[voxceleb.Config]
# Required; otherwise the pipeline will try to save data to /
data_out_dir=/path/to/voxceleb

# Only necessary if youtube-dl, ffmpeg, and ffprobe are not in your PATH:
ffmpeg_bin=/ffmpeg-dir/ffmpeg
ffmpeg_directory=/ffmpeg-dir
youtube_dl_bin=/path/to/youtube-dl

# 1 for VoxCeleb, 2 for VoxCeleb2 (default)
dataset=2
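If you prefer to generate the configuration from a script (for example, to keep one file per dataset), the following sketch writes the same section with Python's standard configparser. The function name `write_voxceleb_config` and the file locations are assumptions for illustration; only the section name and keys come from the example above.

```python
import configparser
import os
import tempfile


def write_voxceleb_config(path, data_out_dir, dataset=2):
    """Write a luigi.cfg containing the [voxceleb.Config] section shown above."""
    if dataset not in (1, 2):
        raise ValueError("dataset must be 1 (VoxCeleb) or 2 (VoxCeleb2)")
    parser = configparser.ConfigParser()
    parser["voxceleb.Config"] = {
        "data_out_dir": data_out_dir,
        "dataset": str(dataset),
    }
    with open(path, "w") as fh:
        parser.write(fh)
    return path


if __name__ == "__main__":
    # Hypothetical locations; adjust to your setup.
    cfg_path = os.path.join(tempfile.gettempdir(), "voxceleb1-luigi.cfg")
    write_voxceleb_config(cfg_path, "/path/to/voxceleb", dataset=1)
    # Point Luigi at this file; set this before launching luigid or luigi.
    os.environ["LUIGI_CONFIG_PATH"] = cfg_path
    print("Wrote", cfg_path)
```

Setting LUIGI_CONFIG_PATH inside a Python process only affects that process and its children, so when launching luigi from a shell, export the variable there instead.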

To run the pipeline, first start luigid:

luigid --background

and then start the workers:

luigi --module voxceleb_luigi --workers 5 voxceleb.ProcessPeople
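When the worker count needs to vary between machines, the command above can be assembled in a script. This is a minimal sketch; `worker_command` is a hypothetical helper, and it only builds the same command line shown above rather than calling any voxceleb_luigi API.

```python
import shlex


def worker_command(workers=5, module="voxceleb_luigi", task="voxceleb.ProcessPeople"):
    """Build the luigi command line from this README as an argument list."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return ["luigi", "--module", module, "--workers", str(workers), task]


if __name__ == "__main__":
    # Print the command; shlex.join requires Python 3.8 or newer.
    print(shlex.join(worker_command(workers=5)))
```

The returned list can be passed to `subprocess.run(..., check=True)` once luigid is running.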
