yt-fts - Youtube Full Text Search
yt-fts is a command line program that uses yt-dlp to scrape all of a YouTube channel's subtitles and load them into an SQLite database that is searchable from the command line. It lets you query a channel for a specific keyword or phrase and generates time-stamped YouTube URLs to the videos containing it.
It also supports semantic search via the OpenAI embeddings API using chromadb.
https://github.com/NotJoeMartinez/yt-fts/assets/39905973/6ffd8962-d060-490f-9e73-9ab179402f14
Installation
pip install yt-fts
Dependencies:
This project requires yt-dlp installed globally. Platform specific installation instructions are available on the yt-dlp wiki.
pip
python3 -m pip install -U yt-dlp
MacOS/Homebrew
brew install yt-dlp
Windows/winget
winget install yt-dlp
download
Download subtitles for a channel.
Takes a channel URL or ID as an argument. Use the --number-of-jobs option to set the number of parallel download jobs.
yt-fts download --number-of-jobs 5 "https://www.youtube.com/@3blue1brown"
list
List saved channels.
The (ss) next to the channel name indicates that the channel has semantic search enabled.
yt-fts list
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID ┃ Name ┃ Count ┃ Channel ID ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 1 │ ChessPage1 (ss) │ 19 │ UCO2QPmnJFjdvJ6ch-pe27dQ │
│ 2 │ 3Blue1Brown │ 127 │ UCYO_jab_esuFRV4b17AJtAw │
│ 3 │ george hotz archive │ 410 │ UCwgKmJM4ZJQRJ-U5NjvR2dg │
│ 4 │ The Tim Dillon Show │ 288 │ UC4woSp8ITBoYDmjkukhEhxg │
│ 5 │ Academy of Ideas (ss) │ 190 │ UCiRiQGCHGjDLT9FQXFW0I3A │
└────┴───────────────────────┴───────┴──────────────────────────┘
search
Full text search for a string in saved channels.
- The search string does not have to be a word-for-word match.
- Search strings are limited to 40 characters.
# search in all channels
yt-fts search "life in the big city"
# search in specific channel
yt-fts search "life in the big city" --channel "The Tim Dillon Show"
# search in specific channel by id
yt-fts search "life in the big city" -c 4
"Dennis would go hey life in the big city"
Channel: The Tim Dillon Show
Title: 154 - The 3 AM Episode - YouTube
Time Stamp: 00:58:53.789
Video ID: MhaG3Yfv1cU
Link: https://youtu.be/MhaG3Yfv1cU?t=3530
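The link in each result can be related to its time stamp with a few lines of Python. This is only a sketch: the function names are hypothetical, and the 3-second lead-in is an assumption inferred from the sample results in this document, where every link starts 3 seconds before the reported time stamp. It is not taken from the yt-fts source.

```python
def timestamp_to_seconds(stamp: str) -> int:
    """Convert an 'HH:MM:SS.mmm' subtitle time stamp to whole seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + int(float(seconds))


def build_link(video_id: str, stamp: str, lead_in: int = 3) -> str:
    """Build a youtu.be URL that starts `lead_in` seconds before the stamp."""
    # lead_in=3 is an assumption inferred from the sample output,
    # not a documented yt-fts setting.
    start = max(timestamp_to_seconds(stamp) - lead_in, 0)
    return f"https://youtu.be/{video_id}?t={start}"


print(build_link("MhaG3Yfv1cU", "00:58:53.789"))
```

Starting slightly before the match gives the viewer a moment of context before the phrase is spoken.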
Search in video
yt-fts search "text to search" --video [VIDEO_ID]
Advanced Search Syntax
The search string supports SQLite's Enhanced Query Syntax, which includes features such as prefix queries that match the beginning of a word.
yt-fts search "rea* kni* Mali*" --channel "The Tim Dillon Show"
output:
"real knife fight down here in Malibu I"
Channel: The Tim Dillon Show
Title: #200 - Knife Fights In Malibu | The Tim Dillon Show - YouTube
Time Stamp: 00:45:39.420
Video ID: e79H5nxS65Q
Link: https://youtu.be/e79H5nxS65Q?t=2736
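To see how prefix queries behave, the same search string can be tried against a throwaway in-memory database. This is a minimal sketch, assuming your Python build of sqlite3 includes the FTS5 extension; the `subtitles` table and the `search` helper are hypothetical and do not reflect the actual yt-fts schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-column full text table; the real yt-fts schema may differ.
conn.execute("CREATE VIRTUAL TABLE subtitles USING fts5(text)")
conn.executemany(
    "INSERT INTO subtitles (text) VALUES (?)",
    [
        ("real knife fight down here in Malibu I",),
        ("Dennis would go hey life in the big city",),
        ("a really sharp knife",),
    ],
)


def search(query: str) -> list[str]:
    """Return every stored line that matches the full text query."""
    rows = conn.execute(
        "SELECT text FROM subtitles WHERE subtitles MATCH ? ORDER BY rowid",
        (query,),
    )
    return [row[0] for row in rows]


# Each term followed by * matches any word starting with that prefix,
# and all terms must be present in the same line.
matches = search("rea* kni* Mali*")
print(matches)
```

The third row contains words starting with "rea" and "kni" but nothing starting with "Mali", so it is not returned.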
vsearch
Vector search. Requires that you enable semantic search for a channel with get-embeddings.
It has the same options as search, but output is sorted by similarity to the search string and the return limit is 10.
yt-fts vsearch "deep quote by russian author" --channel "Academy of Ideas"
"the great Russian author Fyodor Dostoevsky above all don't
lie to yourself he wrote the man who lies to"
Distance: 0.25210678577423096
Channel: Academy of Ideas - (UCiRiQGCHGjDLT9FQXFW0I3A)
Title: The Psychology of Self-Deception - YouTube
Time Stamp: 00:10:01.749
Video ID: Uig8Lw7ixI0
Link: https://youtu.be/Uig8Lw7ixI0?t=598
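The idea of sorting by similarity with a return limit can be illustrated without any external service. The sketch below is a toy: the three-dimensional vectors are made up, `cosine_distance` and `rank` are hypothetical helpers, and cosine distance is just one common choice. The metric yt-fts and chromadb actually use is not stated here.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def rank(query_vec, items, limit=10):
    """Sort (text, vector) pairs by distance to the query and keep the closest."""
    scored = [(cosine_distance(query_vec, vec), text) for text, vec in items]
    scored.sort(key=lambda pair: pair[0])
    return scored[:limit]


# Made-up embeddings standing in for real ones.
items = [
    ("quote about self-deception", [0.9, 0.1, 0.0]),
    ("chess opening", [0.0, 0.2, 0.9]),
    ("russian novels", [0.7, 0.6, 0.1]),
]
results = rank([1.0, 0.2, 0.0], items)
for distance, text in results:
    print(f"{distance:.4f}  {text}")
```

A lower distance means a closer match, which is why the result with the smallest Distance value appears first.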
How To
Export search results:
For both the search and vsearch commands, you can export the results with the --export flag. The results are saved to a CSV file in the current directory.
yt-fts search "life in the big city" --export
yt-fts vsearch "existing in large metropolitan center" --export
Delete a channel:
You can delete a channel with the delete command.
yt-fts delete --channel "3Blue1Brown"
Update a channel:
The update command currently only works for full text search and will not update the semantic search embeddings.
yt-fts update --channel "3Blue1Brown"
Semantic Search via OpenAI embeddings API
You can enable semantic search for a channel by using the get-embeddings command.
This feature requires an OpenAI API key set in the environment variable OPENAI_API_KEY, or you can pass the key with the --openai-api-key flag.
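If you script around yt-fts, the two ways of supplying the key can be mirrored with a small helper. This is a sketch only: `resolve_api_key` is a hypothetical function, and letting the flag value take precedence over the environment variable is an assumption, not documented yt-fts behavior.

```python
import os


def resolve_api_key(flag_value=None, environ=None):
    """Pick the OpenAI API key from an explicit value or the environment."""
    env = os.environ if environ is None else environ
    # Assumption: an explicitly passed key wins over OPENAI_API_KEY.
    key = flag_value or env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "Set OPENAI_API_KEY or pass the key with --openai-api-key"
        )
    return key


# Placeholder values for illustration; never hard-code a real key.
print(resolve_api_key(None, {"OPENAI_API_KEY": "sk-from-env"}))
```

Failing early with a clear message avoids starting a long embeddings job that cannot authenticate.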
get-embeddings
Fetches OpenAI embeddings for the specified channel.
yt-fts get-embeddings --channel "3Blue1Brown"
After the embeddings are saved, you will see (ss) next to the channel name when you list channels, and you will be able to use the vsearch command for that channel.