Command-line tool to collect video metadata and comments from the YouTube API
youte
A command line utility to get YouTube video metadata and comments from YouTube Data API.
Installation
python -m pip install youte
YouTube API key
To get data from YouTube API, you will need a YouTube API key. Follow YouTube instructions to obtain a YouTube API key if you do not already have one.
Configure API key (recommended)
You can save your API key in the youte config file for reuse. To do so, run:
youte config add-key
The interactive prompt will ask you to input your API key and name it. The name is used to identify the key, and can be anything you choose.
The prompt will also ask if you want to set the given key as default.
When running queries, if no API key or name is specified, youte will automatically use the default key.
Manually set a key as default
If you want to manually set an existing key as a default, run:
youte config set-default <name-of-existing-key>
Note that what is passed to this command is the name of the API key, not the API key itself. It follows that the API key has to be first added to the config file using youte config add-key. If you use a name that has not been added to the config file, an error will be raised.
See the list of all keys
To see the list of all keys, run:
youte config list-keys
The default key, if there is one, will have an asterisk next to it.
Remove a key
To remove a stored key, run:
youte config remove <name-of-key>
About the config file
youte's config file is stored in a central place whose exact location depends on the running operating system:
- Linux/Unix: ~/.config/youte/
- Mac OS X: ~/Library/Application Support/youte/
- Windows: C:\Users\<user>\AppData\Roaming\youte
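As a rough illustration of the locations listed above, the following Python sketch maps an operating system name to the corresponding directory. The function name `youte_config_dir` and its parameters are hypothetical and not part of youte. How youte itself resolves the path is not stated here, so treat this only as a way to find the folder by hand.

```python
from pathlib import PurePosixPath, PureWindowsPath


def youte_config_dir(system: str, home: str) -> str:
    """Return the config directory listed in this README for a platform.

    `system` follows the values of platform.system(): "Linux", "Darwin",
    or "Windows". `home` is the user's home directory on that platform.
    This is an illustrative helper, not youte's own lookup code.
    """
    if system == "Linux":
        return str(PurePosixPath(home) / ".config" / "youte")
    if system == "Darwin":
        return str(PurePosixPath(home) / "Library" / "Application Support" / "youte")
    if system == "Windows":
        return str(PureWindowsPath(home) / "AppData" / "Roaming" / "youte")
    raise ValueError(f"Unsupported system: {system!r}")
```

On a real machine, `platform.system()` and `pathlib.Path.home()` would supply the two arguments.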
Search
Usage: youte search [OPTIONS] QUERY [OUTPUT]
Do a YouTube search.
QUERY: search query
OUTPUT: name of json file to store output data
Options:
  --from TEXT                     Start date (YYYY-MM-DD)
  --to TEXT                       End date (YYYY-MM-DD)
  --type TEXT                     Type of resource to search for [default: video]
  --name TEXT                     Specify an API name added to youte config
  --key TEXT                      Specify a YouTube API key
  --order [date|rating|relevance|title|videoCount|viewCount]
                                  Sort results [default: date]
  --safe-search [none|moderate|strict]
                                  Include or exclude restricted content [default: none]
  --video-duration [any|long|medium|short]
                                  Include videos of a certain duration
  --channel-type [any|show]       Restrict search to a particular type of channel
  --video-type [any|episode|movie]
                                  Search a particular type of videos
  --caption [any|closedCaption|none]
                                  Filter videos based on if they have captions
  --definition, --video-definition [any|high|standard]
                                  Include videos by definition
  --dimension, --video-dimension [any|2d|3d]
                                  Search 2D or 3D videos
  --embeddable, --video-embeddable [any|true]
                                  Search only embeddable videos
  --license, --video-license [any|creativeCommon|youtube]
                                  Include videos with a certain license
  --max-results INTEGER RANGE     Maximum number of results returned per page [default: 50; 0<=x<=50]
  -l, --limit INTEGER RANGE       Maximum number of result pages to retrieve [1<=x<=13]
  --resume TEXT                   Resume progress from this file
  --to-csv PATH                   Tidy data to CSV file
  --help                          Show this message and exit.
Example
youte search 'study with me' --from 2022-08-01 --to 2022-08-07
youte search gaza --from 2022-07-25 --name user_2 --safe-search moderate --order=title gaza.json
Arguments and options
QUERY
The terms to search for. You can also use the Boolean NOT (-) and OR (|) operators to exclude videos or to find videos that match one of several search terms. If the terms contain spaces, the entire QUERY value has to be wrapped in quotes.
youte search "koala|australia zoo -kangaroo"
If you are looking for exact phrases, the exact phrases can be wrapped in double quotes, then wrapped again in single quotes.
youte search '"australia zoo"' aussie_zoo.jsonl
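When the command line is assembled from a script, the quoting described above can be delegated to Python's standard `shlex` module. The sketch below is only an illustration for POSIX shells; the helper name `search_command` is hypothetical and not part of youte.

```python
import shlex


def search_command(query: str, output: str = "") -> str:
    """Build a shell-safe `youte search` command line (POSIX shells only).

    shlex.join quotes every argument that contains spaces, pipes, or
    quotes, so an exact phrase in double quotes ends up wrapped again in
    single quotes.
    """
    parts = ["youte", "search", query]
    if output:
        parts.append(output)
    return shlex.join(parts)


# Exact phrase: the inner double quotes are part of the query itself.
print(search_command('"australia zoo"', "aussie_zoo.jsonl"))
# Boolean operators: the whole query is quoted as one argument.
print(search_command("koala|australia zoo -kangaroo"))
```

`shlex.join` uses single quotes where the example above uses double quotes; both forms pass the query to youte as a single argument.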
OUTPUT
(optional)
Path of the output file where raw JSON responses will be stored. The file must have a JSON file extension (e.g., .json or .jsonl). If the output file already exists, youte will update the existing file instead of overwriting it.
If no OUTPUT argument is passed, results will be printed to the terminal.
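For later processing in Python, the stored responses can be read back with the standard `json` module. The sketch below assumes the output file holds one JSON document per line (the JSON Lines layout suggested by the .jsonl extension). That layout is an assumption, not something this README confirms, and the helper name `iter_json_lines` is hypothetical.

```python
import json
from typing import Iterator


def iter_json_lines(path: str) -> Iterator[dict]:
    """Yield one parsed JSON document per non-empty line of `path`.

    Assumes a JSON Lines file; this is an assumption about the output
    layout, not documented youte behaviour.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between documents
                yield json.loads(line)
```

If the file turns out to contain a single JSON document instead, `json.load` on the open file handle would be the equivalent call.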
--from
(optional)
Start date limit for the search results returned - the results returned by the API should only contain videos created on or after this date (UTC time, which is the default time zone for the YouTube API). Has to be in ISO format (YYYY-MM-DD).
--to
(optional)
End date limit for the search results returned - the results returned by the API should only contain videos created on or before this date (UTC time, which is the default time zone for the YouTube API). Has to be in ISO format (YYYY-MM-DD).
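The YouTube Data API's search endpoint expects its date filters (`publishedAfter`, `publishedBefore`) as RFC 3339 timestamps, so a YYYY-MM-DD value has to be expanded at some point. The sketch below shows one way to validate a date and expand it to midnight UTC. The helper name `to_rfc3339_utc` is hypothetical, and whether youte performs the conversion this way (for example, whether --to is treated as start or end of day) is not stated in this document.

```python
from datetime import date, datetime, time, timezone


def to_rfc3339_utc(day: str) -> str:
    """Validate an ISO date and return midnight UTC as an RFC 3339 string.

    Raises ValueError if `day` is not a valid ISO date.
    """
    parsed = date.fromisoformat(day)
    midnight = datetime.combine(parsed, time.min, tzinfo=timezone.utc)
    return midnight.strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_rfc3339_utc("2022-08-01"))  # 2022-08-01T00:00:00Z
```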
--name
(optional)
Name of the API key, if you don't want to use the default API key or if no default API key has been set.
The API key name has to be added to the config file first using youte config add-key.
--key
(optional)
The API key to use, if you want to use a key not configured with youte.
--type
(optional)
Type of YouTube resource to retrieve. Can be one type or a comma-separated list of acceptable types, which are channel, playlist, and video.
--limit
(optional)
Specify the number of result pages to retrieve. Useful when you want to test search terms without using up your quota.
--order
(optional)
Specify how results will be sorted.
- date: Resources are sorted in reverse chronological order based on the date they were created (default value).
- rating: Resources are sorted from highest to lowest rating.
- relevance: Resources are sorted based on their relevance to the search query.
- title: Resources are sorted alphabetically by title.
- videoCount: Channels are sorted in descending order of their number of uploaded videos.
- viewCount: Resources are sorted from highest to lowest number of views. For live broadcasts, videos are sorted by number of concurrent viewers while the broadcasts are ongoing.
--safe-search
(optional)
Specify whether restricted content is included or excluded.
--video-duration
(optional)
Include videos of a certain duration.
--channel-type
(optional)
Restrict search to a particular type of channel. --type has to include channel.
--caption
(optional)
Restrict search to videos with or without captions. --type has to include video.
--definition
(optional)
Restrict search to videos with a certain definition. --type has to include video.
--dimension
(optional)
Restrict search to 2D or 3D videos. --type has to include video.
--embeddable
(optional)
Only search for embeddable videos. --type has to include video.
--license
(optional)
Only include videos with a certain licence.
--max-results
(optional)
Maximum number of results returned per page. The default value is 50.
--resume
(optional)
Resume progress from a progress file.
Searching is very expensive in terms of API quota (100 units per search results page). Therefore, youte saves the progress of a search so that if you exit the program prematurely, you can choose to resume the search to avoid wasting valuable quota.
When you exit the program in the middle of a search, a prompt will ask if you want to save progress. If yes, all search page tokens are stored to a database in the .youte.history folder inside your current directory.
The name of the progress file is printed on the terminal, as demonstrated below:
Do you want to save your current progress? [y/N]: y
Progress saved at /home/boyd/Documents/youte/.youte.history/search_1669178310.db
To resume progress, run the same youte search command and add `--resume search_1669178310`
To resume progress of this query, run the same query again and add --resume <NAME OF PROGRESS>.
You can also run youte list-history to see the list of resumable progress files found in the .youte.history folder inside your current directory.
--to-csv
(optional)
Tidy results into a CSV file.
Hydrate a list of IDs
youte hydrate takes a list of resource IDs and gets the full data associated with them.
Usage: youte hydrate [OPTIONS] [ITEMS]...
Hydrate YouTube resource IDs.
Get all metadata for a list of resource IDs. By default, the function hydrates video IDs.
All IDs passed in the command must be of one kind.
OUTPUT: name of JSON file to store output
ITEMS: ID(s) of item as provided by YouTube
Options:
  -o, --output FILENAME
  -f, --file-path TEXT            Get IDs from file
  --kind [videos|channels|comments]
                                  Sort results [default: videos]
  --name TEXT                     Specify an API name added to youte config
  --key TEXT                      Specify a YouTube API key
  --to-csv PATH                   Tidy data to CSV file
  --help                          Show this message and exit.
Examples
# one video
youte hydrate _KrKdj50mPk
# two videos
youte hydrate _KrKdj50mPk hpwPciW74b8 --output videos.json
# hydrate channel information and use IDs from a text file
youte hydrate -f channel_ids.txt --kind channels
Arguments and options
ITEMS
YouTube resource IDs. If there are multiple IDs, separate each one with a space.
The IDs should all belong to one type, i.e. either video, channel, or comment. For example, you cannot mix both video AND channel IDs in one command.
-f or --file-path
If you want to use IDs from a text file, specify this option with the path to the text file (e.g., .csv or .txt). The text file should contain a line-separated list of IDs.
One file should contain one type of IDs.
--kind
Specify which kind of resources the IDs are. The default value is videos.
Get all comments of a video or all replies to a top-level comment thread
Usage: youte get-comments [OPTIONS] [ITEMS]...
Get YouTube comments by video IDs or thread IDs.
OUTPUT: name of JSON file to store output
ITEMS: ID(s) of item as provided by YouTube
Options:
  -o, --output FILENAME
  -f, --file-path TEXT            Get IDs from file
  -t, --by-thread                 Get all replies to a parent comment
  -v, --by-video                  Get all comments for a video ID
  --name TEXT                     Specify an API name added to youte config
  --key TEXT                      Specify a YouTube API key
  --to-csv PATH                   Tidy data to CSV file
  --help                          Show this message and exit.
Example
# get comments on a video
youte get-comments -v -o comments_for_videos.json WOoQOd33ZTY
# get replies to a thread
youte get-comments -t -o replies.json UgxkjPsKbo2pUEAJju94AaABAg
Arguments and options
ITEMS
Video or comment thread IDs (unique identifiers provided by YouTube). If there are multiple IDs, separate each one with a space.
The IDs should all belong to one type, i.e. either video or comment thread. You cannot mix both video AND comment thread IDs in one command.
-f or --file-path
If you want to use IDs from a text file, specify this option with the path to the text file (e.g., .csv or .txt). The text file should contain a line-separated list of IDs.
One file should contain one type of IDs (i.e. either video or comment thread). You cannot add both video and comment thread IDs in the same file.
--by-thread, --by-video
- Get all replies to a comment thread (--by-thread, -t)
- Get all comments on a video (--by-video, -v)
If none of these flags are passed, the get-comments command works similarly to hydrate, getting full data for a list of comment IDs.
Only one flag can be used in one command.
Tidy JSON responses
Usage: youte tidy [OPTIONS] FILEPATH... OUTPUT
Tidy raw JSON response into relational SQLite databases
Options:
--help Show this message and exit.
The tidy command will detect the type of resources in the JSON file (i.e. video, channel, search results, or comments) and process the data accordingly. It's important that each JSON file contains just one type of resource. You can specify multiple JSON files, but the final argument has to be the output database.
Database schemas
youte tidy processes JSON data into different schemas depending on the type of resource in the JSON file. Here are the schema names with their corresponding YouTube resources.
Resource | Schema |
---|---|
Search results (from youte search) | search_results |
Videos | videos |
Channels | channels |
Comment threads (top-level comments) | comments |
Replies to comment threads | comments |
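To check which of the schemas in the table above ended up in a tidied database, the file can be inspected with Python's built-in `sqlite3` module. The sketch below only lists table names, because the column layout is not described in this document. The helper name `list_tables` is hypothetical, and it assumes each schema is stored as a table of the same name.

```python
import sqlite3
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the names of all tables in a SQLite database, sorted by name."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

Calling `list_tables` on the OUTPUT path passed to youte tidy would then be expected to show names such as videos or comments, depending on which kinds of JSON files were tidied.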
Data dictionary
Refer to YouTube Data API for definitions of the properties or fields returned in the data. Some fields such as likeCount, viewCount, and commentCount will show as null if these statistics are disabled.
Dehydrate
Usage: youte dehydrate [OPTIONS] INFILE
Extract an ID list from a file of YouTube resources
INFILE: JSON file of YouTube resources
Options:
-o, --output FILENAME Output text file to store IDs in
--help Show this message and exit.
YouTube API Quota system and youte handling of quota
The YouTube Data API limits how many requests you can make per day through a quota system, whereby each request costs a number of units depending on the endpoint the request is made to.
For example:
- search endpoint costs 100 units per request
- video, channel, commentThread, and comment endpoints each cost 1 unit per request
Free accounts get an API quota cap of 10,000 units per project per day, which resets at midnight Pacific Time.
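The figures above make it easy to budget a search before running it. The following Python sketch does the arithmetic using only the numbers given in this document (100 units per search page, 10,000 units per day, up to 50 results per page). The function and constant names are illustrative and not part of youte.

```python
SEARCH_COST_PER_PAGE = 100   # quota units per search results page
DAILY_QUOTA = 10_000         # default daily cap per project
MAX_RESULTS_PER_PAGE = 50    # upper bound of --max-results


def search_quota_cost(pages: int) -> int:
    """Quota units consumed by retrieving `pages` search result pages."""
    return pages * SEARCH_COST_PER_PAGE


def max_search_pages(daily_quota: int = DAILY_QUOTA) -> int:
    """Number of search pages that fit into the given daily quota."""
    return daily_quota // SEARCH_COST_PER_PAGE


print(search_quota_cost(13))      # 1300 units for --limit 13
print(13 * MAX_RESULTS_PER_PAGE)  # at most 650 results from those pages
print(max_search_pages())         # 100 search pages per day
```

By comparison, endpoints that cost 1 unit per request allow up to 10,000 requests under the same daily cap.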
At present, you can only check your quota usage on the Quotas page in the API Console. It is not possible to monitor quota usage via metadata returned in the API response.
youte does not monitor quota usage. However, it handles errors when quota is exceeded by sleeping until quota reset time.