macrawlon

A sweet little collection of handy functions for video file downloading. 📼

Project description

(a Multi-processing Audiovisual CRAWLer collectiON)

About

A package for crawling and downloading YouTube videos. Many published datasets provide only the ids of their videos, without a download script, so obtaining the video files can be difficult. This project aims to provide a general solution in such cases by downloading either the video or the audio for the ids specified by a dataset. It also aims to speed up processing by running multiple downloads in parallel. The video resolution is set by the user, in order to speed up downloading and to limit the on-disk dataset size.

Currently, only video-only or audio-only files are downloaded; the next version will also allow downloading videos with audio.
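
The idea of running several downloads in parallel can be sketched with the standard library alone. This is only an illustration under assumptions: `fetch_one` and `fetch_all` are hypothetical names that are not part of macrawlon, the "download" is a stand-in that touches no network, and the package itself may use sub-processes rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_one(video_id):
    """Stand-in for a real download (assumption): returns the file name it would write."""
    return f"{video_id}.mp4"


def fetch_all(video_ids, workers=5):
    """Run fetch_one over all ids using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the same order as the input ids.
        return list(pool.map(fetch_one, video_ids))
```

Since downloading is mostly network-bound, a thread pool is often enough for a sketch like this; a process pool would be the closer match if per-file work such as trimming is CPU-heavy.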

Package Requirements

This is the list of the required packages:

pandas
pafy
ffmpeg
youtube-dl
tqdm

The command below installs pandas, pafy and tqdm; ffmpeg and youtube-dl are not covered by it and need to be installed separately:

$ pip install pandas pafy tqdm
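
A quick way to see which requirements are already present is to probe for them from Python. The sketch below is not part of macrawlon; `check_requirements` is a hypothetical helper, and it assumes that `ffmpeg` refers to the command-line executable and that youtube-dl is imported under the module name `youtube_dl`.

```python
import importlib.util
import shutil

# Import names of the Python requirements (youtube-dl is imported as youtube_dl).
PYTHON_MODULES = ["pandas", "pafy", "youtube_dl", "tqdm"]


def check_requirements(modules=PYTHON_MODULES, executables=("ffmpeg",)):
    """Return a dict mapping each requirement name to True if it was found."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH.
        status[exe] = shutil.which(exe) is not None
    return status


if __name__ == "__main__":
    for requirement, found in check_requirements().items():
        print(f"{requirement}: {'found' if found else 'missing'}")
```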

CSV Dataset file

The package assumes that the following headers are included in the .csv file that includes the YouTube ids:

youtube_id	start	end (or) duration

The names of the headers do not need to match exactly, but the data needs to include the id, the start time, and either the end time or the duration.
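
As an illustration of the layout described above, a small dataset file could be produced with Python's `csv` module. The helper names and the video ids below are placeholders chosen for this sketch, not values taken from the package or from a real dataset.

```python
import csv


def write_dataset_csv(path, rows, header=("youtube_id", "start", "end")):
    """Write (id, start, end) rows to a .csv file, preceded by a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_dataset_csv(path):
    """Read the file back as a list of dicts keyed by the header names."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


# Placeholder ids for illustration only; replace them with real YouTube ids.
example_rows = [
    ("VIDEO_ID_001", 0, 10),
    ("VIDEO_ID_002", 35, 45),
]
```

With this column order, the id is in column `0`, the start time in column `1` and the end time in column `2`. Note that values read back from a `.csv` file are strings and need converting before any arithmetic.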

Usage

The main function used to download files is called download() and is located in youtube_audio_and_video_downloader.py. You can call it by first importing it:

from macrawlon import download
# or
from youtube_audio_and_video_downloader import download

download(
    csv_dir=my_csv_dir,
    download_dir=my_down_dir,
    modality='video',
    resolutions=my_res_list,
    id_idx=0,
    start_idx=1,
    end_idx=None,
    duration=10,
    workers=5,
)

The function takes the following arguments:

| Argument | About |
| --- | --- |
| `csv_dir` | Directory of the dataset `.csv` file. |
| `download_dir` | Directory to download the files to. |
| `modality` | Modality to download. One of `audio`, `video`, `audio-video` (separate audio and video files) or `audio+video` (video files with audio). |
| `resolutions` | (optional) List of resolution qualities, with the first elements of the list being the preferred options. |
| `id_idx` | (optional) The column index in the csv file that contains the YouTube video ids. E.g. if `0`, the first column of the csv should contain the YouTube video ids. |
| `start_idx` | (optional) The column index of the starting location (in secs.) in the video. |
| `end_idx` | (optional) The column index of the ending location (in secs.) in the video. |
| `duration` | (optional) The duration (in secs.) of the video. To be used if `end_idx` is not specified. |
| `workers` | (optional) The number of sub-processes to run. |
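
To make the interplay of `id_idx`, `start_idx`, `end_idx` and `duration` concrete, here is a minimal sketch of how one csv row could be turned into a clip. It is an assumption about the intended semantics, not the package's actual code: `resolve_clip` is a hypothetical helper, and it treats `duration` as a length in seconds added to the start time when no end column is given.

```python
def resolve_clip(row, id_idx=0, start_idx=1, end_idx=None, duration=None):
    """Return (video_id, start, end) in seconds for one csv row (a list of strings)."""
    video_id = row[id_idx]
    start = float(row[start_idx])
    if end_idx is not None:
        # An explicit end column takes precedence over a fixed duration.
        end = float(row[end_idx])
    elif duration is not None:
        # Assumption: duration is the clip length, measured from the start time.
        end = start + float(duration)
    else:
        raise ValueError("either end_idx or duration must be given")
    if end <= start:
        raise ValueError(f"clip for {video_id} has non-positive length")
    return video_id, start, end
```

For example, `resolve_clip(["VIDEO_ID_001", "5"], duration=10)` yields a clip from 5 to 15 seconds, while passing `end_idx=2` would read the end time from the third column instead.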

Installation through git

Please make sure Git is installed on your machine:

$ sudo apt-get update
$ sudo apt-get install git
$ git clone https://github.com/macrawlon/macrawlon.git
$ cd macrawlon
$ pip install .

You can then use it as any other package installed through pip.

Installation through pip

The latest stable release is also available through pip:

$ pip install macrawlon

Licence

MIT

Project details

Version 0.1 (May 12, 2022)

