
A sweet little collection of handy functions for video file downloading. 📼


(a Multi-processing Audiovisual CRAWLer collectiON)



About

A package for crawling and downloading YouTube videos. Many datasets provide only the IDs of their videos, without a download script, which can make obtaining the video files difficult. This project aims to provide a general solution in such cases by downloading either the video or the audio for the IDs specified by a dataset. It also aims to speed up processing by running multiple sub-processes in parallel. The video resolution is set by the user, both to speed up downloading and to limit the on-disk size of the dataset.
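The parallel part can be pictured with Python's standard multiprocessing module. The sketch below is an illustration under stated assumptions, not the package's own code: fetch_clip and fetch_all are hypothetical names, and fetch_clip only returns the file name a real job would write, so it runs without network access.

```python
from multiprocessing import Pool
from typing import List


def fetch_clip(youtube_id: str) -> str:
    """Hypothetical per-video job.

    A real job would download and trim one clip; this stand-in only returns
    the output file name it would write, so the sketch runs offline.
    """
    return f"{youtube_id}.mp4"


def fetch_all(youtube_ids: List[str], workers: int = 5) -> List[str]:
    # Every ID is an independent job, so a pool of worker processes can
    # handle several at once; Pool.map returns results in input order.
    with Pool(processes=workers) as pool:
        return pool.map(fetch_clip, youtube_ids)
```

Call fetch_all(ids, workers=2) from inside an `if __name__ == "__main__":` block; the guard is needed wherever worker processes are started by re-importing the main module (the "spawn" start method). Because Pool.map keeps the input order, the returned names stay aligned with the dataset rows.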

Currently, only video-only or audio-only files are downloaded (the next update will also allow downloading videos with audio).


Package Requirements

This is the list of the required packages:

  • pandas
  • pafy
  • ffmpeg
  • youtube-dl
  • tqdm

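The Python packages can be installed with:

$ pip install pandas pafy youtube-dl tqdm

ffmpeg is a standalone program rather than a Python package, so it must be installed separately, e.g. on Debian/Ubuntu:

$ sudo apt-get install ffmpeg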
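Before running a download, it can help to confirm that every requirement is visible. A minimal sketch using only the standard library: importlib.util.find_spec looks for each Python package without importing it, and shutil.which looks for the ffmpeg executable on PATH. The function name missing_requirements is hypothetical; youtube_dl is the import name of the youtube-dl package.

```python
import importlib.util
import shutil
from typing import List

# Import names of the Python requirements listed above.
PYTHON_MODULES = ["pandas", "pafy", "youtube_dl", "tqdm"]


def missing_requirements() -> List[str]:
    """Return the names of requirements that cannot be found."""
    missing = [name for name in PYTHON_MODULES
               if importlib.util.find_spec(name) is None]
    # ffmpeg is a command-line program, so look for its executable on PATH.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    return missing


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All requirements found.")
```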

CSV Dataset file

The package expects the .csv file containing the YouTube IDs to have the following columns:

youtube_id, start, end (or duration)

The header names do not need to match exactly, but the data must include the ID, the start time, and either the end time or the duration.
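A minimal sketch that writes a dataset file in this layout with Python's csv module. The IDs are placeholders to be replaced with real YouTube video IDs, and the helper name write_dataset_csv is hypothetical. Assuming the index arguments of download() count columns from 0 (as the id_idx description suggests), this layout corresponds to id_idx=0, start_idx=1 and end_idx=2.

```python
import csv
import io

# Placeholder rows (youtube_id, start secs., end secs.); replace the IDs
# with real YouTube video IDs.
ROWS = [
    ("VIDEO_ID_01", 30, 40),
    ("VIDEO_ID_02", 0, 10),
]


def write_dataset_csv(rows, stream) -> None:
    """Write a header line followed by one youtube_id,start,end row per clip."""
    writer = csv.writer(stream)
    writer.writerow(["youtube_id", "start", "end"])
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_dataset_csv(ROWS, buffer)
    print(buffer.getvalue())
```

To write an actual file, pass a handle opened with open("dataset.csv", "w", newline="") instead of the StringIO buffer; newline="" is what the csv module documentation recommends for files used with csv.writer.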


Usage

The main function for downloading files is download(), located in youtube_audio_and_video_downloader.py. Import it first, then call it:

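from macrawlon import download
# or
from youtube_audio_and_video_downloader import download

# my_csv_dir, my_down_dir and my_res_list are placeholders for your own values.
download(
    csv_dir=my_csv_dir,
    download_dir=my_down_dir,
    modality='video',
    resolutions=my_res_list,   # preferred resolution first
    id_idx=0,                  # column 0 holds the YouTube IDs
    start_idx=1,               # column 1 holds the start times (secs.)
    end_idx=None,              # no end-time column, so...
    duration=10,               # ...each clip is 10 secs. long
    workers=5,                 # number of sub-processes
)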

The function takes the following arguments:

  • csv_dir: directory of the dataset .csv file.
  • download_dir: directory to download the files into.
  • modality: what to download; one of audio, video, audio-video (separate audio and video files) or audio+video (video files with audio).
  • resolutions (optional): list of resolution qualities, ordered by preference (the first element is the most preferred).
  • id_idx (optional): column index in the .csv file that contains the YouTube video IDs, e.g. 0 means the first column holds the IDs.
  • start_idx (optional): column index that contains the start time (in secs.) of each clip.
  • end_idx (optional): column index that contains the end time (in secs.) of each clip.
  • duration (optional): clip duration (in secs.), used when end_idx is not specified.
  • workers (optional): number of sub-processes to run in parallel.
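How start_idx, end_idx and duration combine can be summarised in a few lines. The helper below is hypothetical and only mirrors the argument descriptions above; it is not taken from the package source, and its defaults (id_idx=0, start_idx=1) are copied from the usage example.

```python
from typing import Optional, Sequence, Tuple


def clip_bounds(row: Sequence[str], id_idx: int = 0, start_idx: int = 1,
                end_idx: Optional[int] = None,
                duration: Optional[float] = None) -> Tuple[str, float, float]:
    """Return (youtube_id, start, end) in seconds for one CSV row.

    Hypothetical helper: the end time is read from column end_idx when it is
    given, otherwise it is computed as start + duration.
    """
    youtube_id = row[id_idx]
    start = float(row[start_idx])
    if end_idx is not None:
        end = float(row[end_idx])
    elif duration is not None:
        end = start + duration
    else:
        raise ValueError("either end_idx or duration must be given")
    return youtube_id, start, end


if __name__ == "__main__":
    print(clip_bounds(["VIDEO_ID_01", "30"], duration=10))
```

For example, clip_bounds(["VIDEO_ID_01", "30"], duration=10) gives ("VIDEO_ID_01", 30.0, 40.0).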

Installation through git

Make sure Git is installed on your machine, then clone the repository and install the package:

$ sudo apt-get update
$ sudo apt-get install git
$ git clone https://github.com/macrawlon/macrawlon.git
$ cd macrawlon
$ pip install .

You can then use it like any other package installed through pip.


Installation through pip

The latest stable release is also available through pip:

$ pip install macrawlon

Licence

MIT


This version: 0.1

Download files


Source Distribution

macrawlon-0.1.tar.gz (7.0 kB)

Built Distribution

macrawlon-0.1-py3-none-any.whl (7.4 kB)

File details

Details for the file macrawlon-0.1.tar.gz.

File metadata

  • Download URL: macrawlon-0.1.tar.gz
  • Upload date:
  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.7

File hashes

Hashes for macrawlon-0.1.tar.gz
  • SHA256: 2c9a2cea60ddeb5bed4a0cba4e288b904d0bcf1632ea792530dc22683c4d669f
  • MD5: 2587e8ca324a82ca62486b9ae73d819a
  • BLAKE2b-256: 0b9fdd645e4b72eafff98952c6da8cdfdc6b220afeb0ad8d3201cf2417099249
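A downloaded archive can be checked against the SHA256 digest above with Python's standard hashlib module. A minimal sketch, assuming the archive was saved as macrawlon-0.1.tar.gz in the working directory; sha256_of and verify are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for macrawlon-0.1.tar.gz.
EXPECTED_SHA256 = "2c9a2cea60ddeb5bed4a0cba4e288b904d0bcf1632ea792530dc22683c4d669f"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("macrawlon-0.1.tar.gz")  # assumed download location
    if archive.exists():
        print("digest OK" if verify(archive) else "digest mismatch")
    else:
        print(f"{archive} not found")
```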


File details

Details for the file macrawlon-0.1-py3-none-any.whl.

File metadata

  • Download URL: macrawlon-0.1-py3-none-any.whl
  • Upload date:
  • Size: 7.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.7

File hashes

Hashes for macrawlon-0.1-py3-none-any.whl
  • SHA256: 8bd9909df3bad5ee5dff9b2a9fe8abf8a9c3d2e1f82c01d490898b300517c25d
  • MD5: f50f484a549698d2701495a70a951d55
  • BLAKE2b-256: 3b4bbf10c5f2911f382e810ac01352aa2f8c6e1d846bcce94aad3b785736cf3c

