yt-fts - Youtube Full Text Search
yt-fts is a simple Python script that uses yt-dlp to scrape all of a YouTube channel's subtitles and load them into an SQLite database that is searchable from the command line. It lets you query a channel for a specific keyword or phrase and generates time-stamped YouTube URLs pointing to the videos that contain the keyword.
- Blog Post
- Semantic Search (Experimental)
Installation
pip
pip install yt-fts
from source
git clone https://github.com/NotJoeMartinez/yt-fts
cd yt-fts
python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
python3 -m yt-fts
Dependencies
This project requires yt-dlp to be installed globally. Platform-specific installation instructions are available on the yt-dlp wiki.
pip
python3 -m pip install -U yt-dlp
macOS/Homebrew
brew install yt-dlp
Windows/winget
winget install yt-dlp
Usage
Usage: yt-fts [OPTIONS] COMMAND [ARGS]...
Options:
--version Show the version and exit.
--help Show this message and exit.
Commands:
delete Delete a channel and all its data.
download Download subtitles from a specified YouTube channel.
export Export search results from a specified YouTube...
generate-embeddings Generate embeddings for a channel using OpenAI's...
list Lists channels saved in the database.
search Search for a specified text within a channel, a...
semantic-search Semantic search for specified text.
update Updates a specified YouTube channel.
download
Download subtitles
Usage: yt-fts download [OPTIONS] CHANNEL_URL
Download subtitles from a specified YouTube channel.
You must provide the URL of the channel as an argument. The script will
automatically extract the channel id from the URL.
Options:
--channel-id TEXT Optional channel id to override the one from the
url
--language TEXT Language of the subtitles to download
--number-of-jobs INTEGER Optional number of jobs to parallelize the run
Examples:
Basic download by url
yt-fts download "https://www.youtube.com/@TimDillonShow/videos"
Multithreaded download
yt-fts download --number-of-jobs 6 "https://www.youtube.com/@TimDillonShow/videos"
Specify channel id
If the download command fails, you can manually supply the channel id with the --channel-id flag. The channel URL should still be passed as an argument.
yt-fts download --channel-id "UC4woSp8ITBoYDmjkukhEhxg" "https://www.youtube.com/@TimDillonShow/videos"
Specify language
Languages are represented using ISO 639-1 language codes.
yt-fts download --language de "https://www.youtube.com/@TimDillonShow/videos"
list
List downloaded channels
Usage: yt-fts list [OPTIONS]
Lists channels saved in the database.
The (ss) next to channel name indicates that semantic search is enabled for
the channel.
Options:
--channel TEXT Optional name or id of the channel to list
yt-fts list
output:
id    count    channel_name         channel_url
----  -------  -------------------  ----------------------------------------------------
1     265      The Tim Dillon Show  https://youtube.com/channel/UC4woSp8ITBoYDmjkukhEhxg
2     688      Lex Fridman (ss)     https://youtube.com/channel/UCSHZKyawb77ixDdsGog4iWA
3     434      Traversy Media       https://youtube.com/channel/UC29ju8bIPH5as8OGnQzwJyA
search
Search saved subtitles
Usage: yt-fts search [OPTIONS] SEARCH_TEXT
Search for a specified text within a channel, a specific video, or across
all channels.
Options:
--channel TEXT The name or id of the channel to search in. This is required
unless the --all or --video options are used.
--video TEXT The id of the video to search in. This is used instead of
the channel option.
--all Search in all channels.
- The search string does not have to be a word-for-word match.
- Use the channel id if you have channels with the same name or channels with special characters in their names.
- Search strings are limited to 40 characters.
Examples:
Search by channel
yt-fts search "life in the big city" --channel "The Tim Dillon Show"
# or
yt-fts search "life in the big city" --channel 1 # assuming 1 is id of channel
output:
The Tim Dillon Show: "164 - Life In The Big City - YouTube"
Quote: "van in the driveway life in the big city"
Time Stamp: 00:30:44.580
Video ID: dqGyCTbzYmc
Link: https://youtu.be/dqGyCTbzYmc?t=1841
Search all channels
yt-fts search "text to search" --all
Search in video
yt-fts search "text to search" --video [VIDEO_ID]
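The Link field in the sample output can be reproduced from the Time Stamp field. In the example above, 00:30:44.580 is 1844 seconds while the link uses t=1841, which suggests the link starts a few seconds before the quote. The sketch below assumes a 3-second lead-in inferred from that sample; the function names are hypothetical and this is not taken from the yt-fts source.

```python
def timestamp_to_seconds(time_stamp: str) -> int:
    """Convert an 'HH:MM:SS.mmm' subtitle time stamp to whole seconds."""
    hours, minutes, seconds = time_stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + int(float(seconds))


def build_link(video_id: str, time_stamp: str, lead_in: int = 3) -> str:
    """Build a youtu.be link starting `lead_in` seconds before the quote.

    The 3-second default is an assumption inferred from the sample output.
    """
    start = max(timestamp_to_seconds(time_stamp) - lead_in, 0)
    return f"https://youtu.be/{video_id}?t={start}"


print(build_link("dqGyCTbzYmc", "00:30:44.580"))
# prints https://youtu.be/dqGyCTbzYmc?t=1841
```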
Advanced Search Syntax
The search string supports SQLite's Enhanced Query Syntax, which includes features such as prefix queries that you can use to match parts of a word.
yt-fts search "rea* kni* Mali*" --channel "The Tim Dillon Show"
output:
The Tim Dillon Show: "#200 - Knife Fights In Malibu | The Tim Dillon Show - YouTube"
Quote: "real knife fight down here in Malibu I"
Time Stamp: 00:45:39.420
Video ID: e79H5nxS65Q
Link: https://youtu.be/e79H5nxS65Q?t=2736
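To see how a prefix query behaves, here is a minimal sketch using Python's built-in sqlite3 module. It assumes an FTS5 virtual table and a Python build with FTS5 enabled; the table name, column names, and search helper are hypothetical, and the schema yt-fts actually uses may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: video_id is stored but excluded from the text index.
conn.execute("CREATE VIRTUAL TABLE subtitles USING fts5(video_id UNINDEXED, text)")
conn.executemany(
    "INSERT INTO subtitles (video_id, text) VALUES (?, ?)",
    [
        ("dqGyCTbzYmc", "van in the driveway life in the big city"),
        ("e79H5nxS65Q", "real knife fight down here in Malibu I"),
    ],
)


def search(query: str) -> list[tuple[str, str]]:
    """Return (video_id, text) rows whose text matches the full-text query."""
    return conn.execute(
        "SELECT video_id, text FROM subtitles WHERE subtitles MATCH ?", (query,)
    ).fetchall()


# Each term ending in * matches any word starting with that prefix;
# all terms must match, and matching is case-insensitive by default.
print(search("rea* kni* Mali*"))
# prints [('e79H5nxS65Q', 'real knife fight down here in Malibu I')]
```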
export
Export search results to a CSV file. The exported CSV uses Channel Name,Video Title,Quote,Time Stamp,Link as its headers.
Usage: yt-fts export [OPTIONS] SEARCH_TEXT [CHANNEL]
Export search results from a specified YouTube channel or from all channels
to a CSV file.
The results of the search will be exported to a CSV file. The file will be
named with the format "{channel_id or 'all'}_{TIME_STAMP}.csv"
Options:
--all Export from all channels
Examples:
yt-fts export "life in the big city" "The Tim Dillon Show"
You can also export from all channels in your database:
yt-fts export "life in the big city" --all
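The shape of the exported file can be sketched with Python's csv module. The headers and the "{channel_id or 'all'}_{TIME_STAMP}.csv" naming pattern come from the description above; the time stamp format and the function names are assumptions made for illustration.

```python
import csv
import io
from datetime import datetime

HEADERS = ["Channel Name", "Video Title", "Quote", "Time Stamp", "Link"]


def export_filename(channel_id=None, now=None):
    """Build '{channel_id or "all"}_{TIME_STAMP}.csv' (stamp format assumed)."""
    now = now or datetime.now()
    return f"{channel_id or 'all'}_{now.strftime('%Y-%m-%d-%H-%M-%S')}.csv"


def write_results(rows, out):
    """Write the header row followed by one CSV row per search result."""
    writer = csv.writer(out)
    writer.writerow(HEADERS)
    writer.writerows(rows)


# An in-memory buffer stands in for the output file in this sketch.
buffer = io.StringIO()
write_results(
    [(
        "The Tim Dillon Show",
        "164 - Life In The Big City - YouTube",
        "van in the driveway life in the big city",
        "00:30:44.580",
        "https://youtu.be/dqGyCTbzYmc?t=1841",
    )],
    buffer,
)
print(export_filename("UC4woSp8ITBoYDmjkukhEhxg"))
print(buffer.getvalue())
```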
update
Updates a channel with new subtitles if any are found.
Usage: yt-fts update [OPTIONS]
Updates a specified YouTube channel.
You must provide the ID of the channel as an argument. Keep in mind some
might not have subtitles enabled. This command will still attempt to
download subtitles as subtitles are sometimes added later.
Options:
--channel TEXT The name or id of the channel to update.
[required]
--language TEXT Language of the subtitles to download
--number-of-jobs INTEGER Optional number of jobs to parallelize the run
delete
Deletes a channel from your database.
Usage: yt-fts delete [OPTIONS] CHANNEL_NAME or CHANNEL_ID
Delete a channel and all its data.
You must provide the name or the id of the channel you want to delete as an
argument.
The command will ask for confirmation before performing the deletion.
Examples:
yt-fts delete "The Tim Dillon Show"
# or
yt-fts delete 1
Semantic Search via OpenAI embeddings API
The following commands are a work in progress but should enable semantic search. This requires an OpenAI API key, which you can learn more about here.
Limitations
Keep in mind that generating embeddings will substantially grow the size of your subtitles database, and searches will run slower due to the limitations of working with vectors in SQLite. The first time you run a given semantic search, API access is still required to generate an embedding for the search string. These search string embeddings are saved to a history table, so repeating the same search won't require additional API requests.
semantic-search
Usage: yt-fts semantic-search [OPTIONS] SEARCH_TEXT
Semantic search for specified text.
Before running this command, you must generate embeddings for the channel
using the generate-embeddings command. This command uses OpenAI's embeddings
API to search for specified text. An OpenAI API key must be set as an
environment variable OPENAI_API_KEY.
Options:
--channel TEXT channel name or id to search in
--all Search all semantic search enabled channels
--limit INTEGER top n results to return
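As a rough illustration of what returning the "top n results" means for embedding-based search, the sketch below ranks stored vectors by cosine similarity to a query vector. Cosine similarity is a common choice for comparing embeddings, but the source does not say which metric yt-fts uses; the function names and the toy two-dimensional vectors are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n(query_vec, subtitle_vecs, limit=3):
    """Return the `limit` subtitle texts most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vec, vec), text)
        for text, vec in subtitle_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:limit]]


# Toy vectors; real embeddings have far more dimensions.
subtitles = {
    "life in the big city": [1.0, 0.0],
    "living downtown": [0.9, 0.1],
    "knife fight in Malibu": [0.0, 1.0],
}
print(top_n([1.0, 0.0], subtitles, limit=2))
# prints ['life in the big city', 'living downtown']
```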
generate-embeddings
Usage: yt-fts generate-embeddings [OPTIONS]
Generate embeddings for a channel using OpenAI's embeddings API.
Requires an OpenAI API key to be set as an environment variable
OPENAI_API_KEY.
Options:
--channel TEXT The name or id of the channel to generate embeddings
for
--open-api-key TEXT OpenAI API key. If not provided, the script will
attempt to read it from the OPENAI_API_KEY environment
variable.