Download YouTube metadata for videos relating to a search query
This is a Python script that downloads metadata (including comments and likes) for YouTube videos relating to a search query. It uses the YouTube Data API v3 and saves the metadata in an SQLAlchemy-compatible database, for instance PostgreSQL or SQLite.
Metatube pauses retrieval once your daily quota is used up (the default as of this writing is 10,000 quota units per day) and waits until the quota refills. If interrupted, metatube will, upon restart, first fill gaps in the download history, then continue downloading ‘into the future’. Once caught up to within ten minutes of the current time, metatube exits.
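The restart behaviour described above can be pictured with a small sketch. It is not metatube's actual implementation: the function name `plan_downloads` and the representation of the download history as a list of `(start, end)` pairs are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone

# "within ten minutes of the current time"
CAUGHT_UP_MARGIN = timedelta(minutes=10)


def plan_downloads(covered, now):
    """Return the time spans still to download, oldest first.

    `covered` is a list of (start, end) datetime pairs that were already
    downloaded (a hypothetical representation of the download history).
    """
    spans = []
    previous_end = None
    for start, end in sorted(covered):
        if previous_end is not None and start > previous_end:
            # A gap left behind by an interruption: fill it first.
            spans.append((previous_end, start))
        previous_end = end if previous_end is None else max(previous_end, end)
    # After the gaps, continue "into the future" unless already caught up.
    if previous_end is not None and now - previous_end > CAUGHT_UP_MARGIN:
        spans.append((previous_end, now))
    return spans
```

An empty result would correspond to the situation in which metatube exits: no gaps remain and the newest downloaded item is less than ten minutes old.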
If you use metatube for scientific research, please cite it in your publication:
Fink, C. (2020): metatube: Python script to download YouTube metadata. doi:10.5281/zenodo.3773302.
Installation
pip install metatube
Configuration
Copy the example configuration file metatube.yml.example to a suitable location, depending on your operating system:
- on Linux systems:
  - system-wide configuration: /etc/metatube.yml
  - per-user configuration: ~/.config/metatube.yml OR ${XDG_CONFIG_HOME}/metatube.yml
- on macOS systems:
  - per-user configuration: ${XDG_CONFIG_HOME}/metatube.yml
- on Microsoft Windows systems:
  - per-user configuration: %APPDATA%\metatube.yml
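The locations listed above can be summarised in a few lines of Python. This is only a sketch: the helper `candidate_config_paths` is hypothetical, and the order in which metatube itself looks up these files is not stated here, so the order of the returned list carries no meaning.

```python
def candidate_config_paths(platform, environ):
    """List the documented config file locations for a platform.

    `platform` follows the values of `sys.platform`; `environ` is a
    mapping such as `os.environ`. Both are passed in explicitly so the
    function can be tried out for any operating system.
    """
    xdg = environ.get("XDG_CONFIG_HOME")
    if platform.startswith("linux"):
        # "~" is left unexpanded; os.path.expanduser() would resolve it.
        paths = ["/etc/metatube.yml", "~/.config/metatube.yml"]
        if xdg:
            paths.append(f"{xdg}/metatube.yml")
    elif platform == "darwin":
        paths = [f"{xdg}/metatube.yml"] if xdg else []
    elif platform == "win32":
        appdata = environ.get("APPDATA")
        paths = [f"{appdata}\\metatube.yml"] if appdata else []
    else:
        paths = []
    return paths
```

Calling `candidate_config_paths(sys.platform, os.environ)` (after `import os, sys`) lists the candidates for the current machine.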
Adapt the configuration:
- Configure a database connection string (connection_string), pointing to an existing database (the format is described in the sqlalchemy documentation).
- Configure an API access key to the YouTube Data API v3 (youtube_api_key).
- Define search terms (search_terms).
All of these configuration options can alternatively be supplied as command line arguments to metatube (see Usage) or as a config dict directly to the constructor of YouTubeVideoMetadataDownloader. Command line options (see metatube --help) and the config dict both override the config file.
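The override rule can be illustrated with a plain dictionary merge. This is a minimal sketch under assumptions, not metatube's own code: `merge_config` is a hypothetical helper, and treating `None` as "option not supplied" is an assumption made for the example.

```python
def merge_config(file_config, overrides):
    """Combine config file values with command line or dict overrides.

    Values from `overrides` win; entries set to None are treated as
    "not supplied" and leave the config file value in place.
    """
    merged = dict(file_config)  # copy, so the input is not modified
    merged.update(
        {key: value for key, value in overrides.items() if value is not None}
    )
    return merged
```

For instance, a config file that defines `youtube_api_key` and `search_terms` combined with an override for `search_terms` alone would keep the API key from the file and take the search terms from the override.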
Usage
Command line executable
metatube \
--postgresql-connection-string "postgresql:///metatube" \
--youtube-api-key "abcdefghijklmn" \
"how to build a tallbike"
Python
Import the metatube module. Instantiate a YouTubeVideoMetadataDownloader, optionally supplying a config dictionary. Then run the instance’s download() method.
from metatube import YouTubeVideoMetadataDownloader

# config from config file
downloader = YouTubeVideoMetadataDownloader()
downloader.download()

# config from config file,
# overriding `search_terms`
downloader = YouTubeVideoMetadataDownloader({
    "search_terms": "Critical Mass Vladivostok"
})
downloader.download()

# entire config from dictionary
# (connection string format: dialect://user:password@host/database)
downloader = YouTubeVideoMetadataDownloader({
    "youtube_api_key": "opqrstuvwxyz",
    "connection_string": "postgresql://bicyclelover123:supersecretpassword@server1/metatube",
    "search_terms": "dashcam bicycle commute albuquerque"
})
downloader.download()
Data privacy
By default, metatube pseudonymises downloaded metadata, i.e. it replaces (direct) identifiers with randomised identifiers (generated using hashes, i.e. ‘one-way encryption’). This serves as one step of a responsible data processing workflow. However, the text and descriptions of videos and comments might nevertheless qualify as indirect identifiers, as they, combined or on their own, might allow re-identification of the commenter or uploader. If you want to use data downloaded using metatube in a GDPR-compliant fashion, you have to follow up the data collection stage with data minimisation and further pseudonymisation or anonymisation efforts.
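As an illustration of hash-based pseudonymisation in general, the sketch below maps an identifier to a stable pseudonym using a salted SHA-256 hash. It is not metatube's actual algorithm: the function name, the choice of SHA-256 and the use of a salt are all assumptions made for this example.

```python
import hashlib


def pseudonymise_identifier(identifier, salt):
    """Map an identifier to a stable pseudonym with a salted one-way hash.

    `salt` is a bytes value that must be kept secret; without it, short
    or guessable identifiers could be recovered by hashing candidates.
    """
    digest = hashlib.sha256(salt + identifier.encode("utf-8"))
    return digest.hexdigest()
```

The same identifier and salt always produce the same pseudonym, so records belonging to one commenter or uploader remain linkable to each other without storing the original identifier.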
Metatube can keep original identifiers (i.e. skip pseudonymisation). Set the corresponding command line argument, configuration file option, or config dict entry (see the sample config file and the example below). Ensure that you fulfil all legal and organisational requirements for handling personal information before you decide to collect non-pseudonymised data.
from metatube import YouTubeVideoMetadataDownloader

downloader = YouTubeVideoMetadataDownloader({
    "search_terms": "Winter Cycling Congress",
    "pseudonymise": False  # get legal/ethics advice before doing this
})
downloader.download()