
Search all of a YouTube channel from the command line

Project description

Will this deploy if I push to a branch? What about a PR? Maybe I need to push the active branch along with the tag?

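  1. Make your changes in a branch.
  2. Open a PR.
  3. Merge the PR.
  4. Tag the latest commit on the main branch.
  5. Push the tag to the remote (origin).
# list existing tags
git tag -l
# tag the currently checked-out commit (check out and pull main first)
git tag v0.1.48
# push the tag to the remote
git push origin v0.1.48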
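Choosing the next tag in the steps above can be sketched in Python. This is a hypothetical helper, not part of yt-fts or git: it assumes tags follow the vMAJOR.MINOR.PATCH pattern of the example (v0.1.48) and picks the next patch version from the text printed by `git tag -l`.

```python
import re

# Matches tags shaped like v0.1.48 (assumed convention, based on the example above).
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def next_patch_tag(tag_list_output: str) -> str:
    """Return the next patch tag given the output of `git tag -l`.

    Hypothetical helper: tags that don't match vX.Y.Z are ignored, and
    v0.0.1 is returned when no matching tag exists yet.
    """
    versions = []
    for line in tag_list_output.splitlines():
        match = TAG_RE.match(line.strip())
        if match:
            versions.append(tuple(int(part) for part in match.groups()))
    if not versions:
        return "v0.0.1"
    # Compare numeric tuples so v0.1.10 sorts after v0.1.9.
    major, minor, patch = max(versions)
    return f"v{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    print(next_patch_tag("v0.1.9\nv0.1.10\nv0.1.48\nnot-a-version\n"))
```

The numeric comparison is the point of the helper: sorting tag names as plain strings would place v0.1.10 before v0.1.9.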

yt-fts - YouTube Full Text Search

yt-fts is a command-line program that uses yt-dlp to scrape all of a YouTube channel's subtitles and load them into a SQLite database that is searchable from the command line. It lets you query a channel for a specific keyword or phrase and generates timestamped YouTube URLs pointing to the videos that contain it.
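A minimal sketch of how a subtitle timestamp can become a timestamped YouTube URL, assuming WebVTT-style cue timestamps and the standard `watch?v=...&t=...s` link format. The function names are hypothetical, and the exact URL format yt-fts prints may differ.

```python
def vtt_timestamp_to_seconds(timestamp: str) -> int:
    """Convert a WebVTT timestamp such as "00:01:23.456" (or "01:23.456") to whole seconds."""
    clock = timestamp.split(".")[0]  # drop the milliseconds
    seconds = 0
    for part in clock.split(":"):  # handles both mm:ss and hh:mm:ss
        seconds = seconds * 60 + int(part)
    return seconds


def timestamped_url(video_id: str, start: str) -> str:
    """Build a YouTube watch URL that starts playback at the subtitle's start time."""
    return f"https://www.youtube.com/watch?v={video_id}&t={vtt_timestamp_to_seconds(start)}s"


# "abcdefghijk" is a placeholder video ID.
print(timestamped_url("abcdefghijk", "00:01:23.456"))
```

Truncating to whole seconds means the link can start up to a second before the matching line is spoken, which is usually what you want when jumping to a quote.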

It also supports semantic search via the OpenAI embeddings API, using ChromaDB to store the embeddings.

https://github.com/NotJoeMartinez/yt-fts/assets/39905973/6ffd8962-d060-490f-9e73-9ab179402f14

Installation

pip install yt-fts

yt-dlp dependency:

This project requires yt-dlp to be installed globally. Platform-specific installation instructions are available on the yt-dlp wiki.

# macOS/Homebrew
brew install yt-dlp
# Windows/winget
winget install yt-dlp
# pip
python3 -m pip install -U yt-dlp
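Because yt-dlp must be installed globally (that is, reachable on your PATH), you can confirm the install from Python with a quick check like the one below. This is an illustrative sketch, not part of yt-fts; it relies only on `shutil.which` and yt-dlp's `--version` flag.

```python
import shutil
import subprocess
from typing import Optional


def find_executable(name: str = "yt-dlp") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def yt_dlp_version() -> Optional[str]:
    """Return yt-dlp's version string, or None if yt-dlp is not on PATH."""
    path = find_executable("yt-dlp")
    if path is None:
        return None
    # `yt-dlp --version` prints the installed version and exits.
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    version = yt_dlp_version()
    print(f"yt-dlp {version}" if version else "yt-dlp not found on PATH; install it first")
```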

download

Download subtitles for a channel.

Takes a channel URL or ID as an argument. Use the --number-of-jobs option to set how many download jobs run in parallel.

yt-fts download --number-of-jobs 5 "https://www.youtube.com/@3blue1brown"
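Conceptually, --number-of-jobs caps how many videos are fetched at the same time. The sketch below illustrates that idea with a thread pool; `fetch_subtitles` and `download_all` are hypothetical names, and this is not necessarily how yt-fts implements downloading.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def fetch_subtitles(video_id: str) -> str:
    """Hypothetical stand-in for downloading one video's subtitles (e.g. via yt-dlp)."""
    return f"subtitles for {video_id}"


def download_all(video_ids: List[str], number_of_jobs: int = 1) -> Dict[str, str]:
    """Fetch subtitles for every video, running at most `number_of_jobs` at once."""
    if number_of_jobs < 1:
        raise ValueError("number_of_jobs must be at least 1")
    with ThreadPoolExecutor(max_workers=number_of_jobs) as pool:
        # map() returns results in input order even though the work runs concurrently.
        results = pool.map(fetch_subtitles, video_ids)
        return dict(zip(video_ids, results))
```

Threads are a reasonable fit here because the work is I/O-bound (network requests or a yt-dlp subprocess), so a larger job count mainly trades more simultaneous requests for a shorter total download time.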

list

List saved channels.

The (ss) next to the channel name indicates that the channel has semantic search enabled.

yt-fts list
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID ┃ Name                  ┃ Count ┃ Channel ID               ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 1  │ ChessPage1 (ss)       │ 19    │ UCO2QPmnJFjdvJ6ch-pe27dQ │
│ 2  │ 3Blue1Brown           │ 127   │ UCYO_jab_esuFRV4b17AJtAw │
│ 3  │ george hotz archive   │ 410   │ UCwgKmJM4ZJQRJ-U5NjvR2dg │
│ 4  │ The Tim Dillon Show   │ 288   │ UC4woSp8ITBoYDmjkukhEhxg │
│ 5  │ Academy of Ideas (ss) │ 190   │ UCiRiQGCHGjDLT9FQXFW0I3A │
└────┴───────────────────────┴───────┴──────────────────────────┘
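The Count column is presumably the number of saved videos per channel. One way to produce such a listing, including the (ss) marker, is sketched below against a hypothetical two-table SQLite schema with a `has_embeddings` flag; yt-fts's real table and column names are not documented here.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE channels (
    channel_id TEXT PRIMARY KEY,
    channel_name TEXT NOT NULL,
    has_embeddings INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE videos (
    video_id TEXT PRIMARY KEY,
    channel_id TEXT NOT NULL REFERENCES channels(channel_id)
);
"""


def list_channels(conn: sqlite3.Connection) -> List[Tuple[str, int, str]]:
    """Return (name, video count, channel id) rows, marking semantic-search channels with (ss)."""
    rows = conn.execute(
        """
        SELECT c.channel_name, c.has_embeddings, COUNT(v.video_id), c.channel_id
        FROM channels AS c
        LEFT JOIN videos AS v ON v.channel_id = c.channel_id  -- keeps channels with 0 videos
        GROUP BY c.channel_id
        ORDER BY c.rowid
        """
    ).fetchall()
    return [
        (f"{name} (ss)" if has_embeddings else name, count, channel_id)
        for name, has_embeddings, count, channel_id in rows
    ]
```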

search (Full Text Search)

Full text search for a string in saved channels.

  • The search string does not have to match the subtitles word for word.
  • Search strings are limited to 40 characters.
# search in all channels
yt-fts search "[search query]" 

# search in channel 
yt-fts search "[search query]" --channel "[channel name or id]" 

# search in specific video
yt-fts search "[search query]" --video "[video id]"

# limit results 
yt-fts search "[search query]" --limit "[number of results]" --channel "[channel name or id]"

# export results to csv
yt-fts search "[search query]" --export --channel "[channel name or id]" 
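The options above map naturally onto a SQLite full-text query. The sketch below is an illustration, not yt-fts's code: it assumes a hypothetical FTS5 table of subtitle lines (your Python's SQLite build must include FTS5, as most do), mirrors --channel and --limit as optional filters, and enforces the 40-character limit stated above.

```python
import sqlite3
from typing import List, Optional, Tuple

MAX_QUERY_LENGTH = 40  # mirrors the 40-character limit described above


def create_subtitles_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical FTS5 table of subtitle lines."""
    # UNINDEXED columns are stored and returned but not full-text indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS subtitles USING fts5("
        "text, channel_id UNINDEXED, video_id UNINDEXED, start_time UNINDEXED)"
    )


def search(
    conn: sqlite3.Connection,
    query: str,
    channel_id: Optional[str] = None,
    limit: Optional[int] = None,
) -> List[Tuple[str, str, str]]:
    """Return (video_id, start_time, text) rows matching `query`, best matches first."""
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(f"search strings are limited to {MAX_QUERY_LENGTH} characters")
    sql = "SELECT video_id, start_time, text FROM subtitles WHERE subtitles MATCH ?"
    params: list = [query]
    if channel_id is not None:  # like --channel
        sql += " AND channel_id = ?"
        params.append(channel_id)
    sql += " ORDER BY rank"  # FTS5's built-in relevance ordering
    if limit is not None:  # like --limit
        sql += " LIMIT ?"
        params.append(limit)
    return conn.execute(sql, params).fetchall()
```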

Advanced Search Syntax:

The search string supports SQLite's Enhanced Query Syntax, which includes features such as prefix queries for matching the beginning of a word.

# AND search
yt-fts search "knife AND Malibu" --channel "The Tim Dillon Show"

# OR search
yt-fts search "knife OR Malibu" --channel "The Tim Dillon Show"

# prefix (wildcard) search
yt-fts search "rea* kni* Mali*" --channel "The Tim Dillon Show"
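To see what these operators do, the sketch below runs the same kinds of queries against an in-memory SQLite FTS5 table. It uses FTS5, whose query syntax also supports AND, OR and prefix `*` (operators must be uppercase); yt-fts's exact full-text configuration is not stated here, and `fts_matches` is a hypothetical helper.

```python
import sqlite3
from typing import List


def fts_matches(texts: List[str], query: str) -> List[str]:
    """Return the texts that match an FTS5 query, in insertion order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
        conn.executemany("INSERT INTO docs(body) VALUES (?)", [(t,) for t in texts])
        rows = conn.execute(
            "SELECT body FROM docs WHERE docs MATCH ? ORDER BY rowid", (query,)
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


lines = [
    "I bought a knife in Malibu",
    "Malibu traffic is terrible",
    "a really sharp knife",
]
print(fts_matches(lines, "knife AND Malibu"))  # both terms must appear
print(fts_matches(lines, "knife OR Malibu"))   # either term may appear
print(fts_matches(lines, "rea* kni*"))         # prefixes, implicitly ANDed
```

Note that space-separated terms such as `rea* kni*` are combined with an implicit AND, so every prefix must match some word in the same line.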

Semantic Search

You can enable semantic search for a channel with the get-embeddings command. This requires an OpenAI API key, either set in the OPENAI_API_KEY environment variable or passed with the --openai-api-key flag.

get-embeddings

Fetches OpenAI embeddings for the specified channel.

# make sure the OpenAI API key is set
# export OPENAI_API_KEY="[yourOpenAIKey]"

yt-fts get-embeddings --channel "3Blue1Brown"
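Since the key can come from either the environment or a flag, one plausible resolution order is sketched below. The precedence (an explicit flag before the environment variable) and the function name are assumptions, not documented yt-fts behavior.

```python
import os
from typing import Optional


def resolve_openai_api_key(flag_value: Optional[str] = None) -> str:
    """Return the --openai-api-key value if given, else OPENAI_API_KEY.

    Assumed precedence: an explicit flag wins over the environment variable.
    """
    key = flag_value or os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("No OpenAI API key: set OPENAI_API_KEY or pass --openai-api-key")
    return key
```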

After the embeddings are saved, you will see (ss) next to the channel name when you list channels, and you will be able to use the vsearch command for that channel.

vsearch (Semantic Search)

vsearch stands for "vector search". It requires semantic search to be enabled for the channel with get-embeddings. It accepts the same options as search, but the results are sorted by similarity to the search string and the default limit is 10 results.

# search by channel name
yt-fts vsearch "[search query]" --channel "[channel name or id]"

# search in specific video
yt-fts vsearch "[search query]" --video "[video id]"

# limit results 
yt-fts vsearch "[search query]" --limit "[number of results]" --channel "[channel name or id]"

# export results to csv
yt-fts vsearch "[search query]" --export --channel "[channel name or id]" 
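Conceptually, vsearch ranks transcript segments by how close their embeddings are to the query's embedding. The sketch below shows that ranking step with cosine similarity and a default limit of 10. In yt-fts the vectors live in ChromaDB, which may use a different distance metric; the names and the toy two-dimensional vectors here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(
    query_embedding: Sequence[float],
    segment_embeddings: Dict[str, Sequence[float]],
    limit: int = 10,
) -> List[Tuple[str, float]]:
    """Rank stored segments by similarity to the query, most similar first."""
    scored = [
        (segment_id, cosine_similarity(query_embedding, embedding))
        for segment_id, embedding in segment_embeddings.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]
```

Real embedding vectors have far more dimensions, but the ranking logic is the same: a segment about "life in the big city" can score highly for a query about a "large metropolitan center" even though the two share no words.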

How To

Export search results: For both the search and vsearch commands, the --export flag saves the results to a CSV file in the current directory.

yt-fts search "life in the big city" --export
yt-fts vsearch "existing in large metropolitan center" --export
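For reference, writing search results to CSV with Python's standard `csv` module looks roughly like the sketch below. The column names and the `write_results_csv` function are hypothetical; the README does not specify the columns or file name that --export uses.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_results_csv(rows: Iterable[Tuple[str, str, str]], out: TextIO) -> None:
    """Write (video_id, start_time, text) search results as CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(["video_id", "start_time", "text"])  # hypothetical column names
    writer.writerows(rows)  # fields containing commas or quotes are quoted automatically


buffer = io.StringIO()
write_results_csv([("abcdefghijk", "00:01:23.456", 'she said "hello, city"')], buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with `open("results.csv", "w", newline="", encoding="utf-8")` so the `csv` module controls line endings itself.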

Delete a channel: You can delete a channel with the delete command.

yt-fts delete --channel "3Blue1Brown"

Update a channel: The update command currently only works for full text search and will not update the semantic search embeddings.

yt-fts update --channel "3Blue1Brown"
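One simple way an incremental update can work is to compare the channel's current video IDs with those already saved and download only the difference. This is an assumption for illustration: the README does not describe how update decides what to fetch, and `videos_to_download` is a hypothetical name. Any videos added this way would still lack embeddings, since update does not refresh semantic search.

```python
from typing import Iterable, List


def videos_to_download(channel_video_ids: Iterable[str], saved_video_ids: Iterable[str]) -> List[str]:
    """Return the channel's video IDs that are not saved yet, in the channel's order."""
    saved = set(saved_video_ids)  # set membership checks keep this linear overall
    return [video_id for video_id in channel_video_ids if video_id not in saved]


print(videos_to_download(["v4", "v3", "v2", "v1"], ["v1", "v2"]))
```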

Export all of a channel's transcripts: This command creates a directory in the current working directory, named after the YouTube channel ID of the specified channel.

# Export to vtt or txt
yt-fts export --channel "[id/name]" --format "[vtt/txt]"
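The txt format presumably holds the transcript text without timing information. To illustrate the difference between the two formats, the sketch below converts WebVTT subtitles to plain text. It is not yt-fts's exporter, and its output layout is an assumption; it also does not handle WebVTT NOTE or STYLE blocks.

```python
import re
from typing import List

# Cue timing lines look like "00:00:01.000 --> 00:00:04.000 align:start position:0%".
TIMING_RE = re.compile(r"^\s*(\d{2}:)?\d{2}:\d{2}\.\d{3}\s+-->")
TAG_RE = re.compile(r"<[^>]+>")  # inline tags such as <c> or <00:00:01.500>


def vtt_to_text(vtt: str) -> str:
    """Strip the WebVTT header, cue timings and inline tags, keeping the caption text."""
    lines: List[str] = []
    in_header = True
    for raw in vtt.splitlines():
        line = raw.strip()
        if in_header:
            # The header (WEBVTT plus optional metadata) ends at the first blank line.
            if not line:
                in_header = False
            continue
        # Skip blank lines, timing lines, and all-digit lines (numeric cue identifiers).
        if not line or line.isdigit() or TIMING_RE.match(line):
            continue
        text = TAG_RE.sub("", line).strip()
        # Drop consecutive duplicates, which are common in auto-generated captions.
        if text and (not lines or lines[-1] != text):
            lines.append(text)
    return "\n".join(lines)
```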

Project details


Download files


Source Distribution

experimental-yt-fts-0.1.49.tar.gz (18.7 kB)

Uploaded Source

Built Distribution

experimental_yt_fts-0.1.49-py3-none-any.whl (23.1 kB)

Uploaded Python 3

File details

Details for the file experimental-yt-fts-0.1.49.tar.gz.

File metadata

  • Download URL: experimental-yt-fts-0.1.49.tar.gz
  • Upload date:
  • Size: 18.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for experimental-yt-fts-0.1.49.tar.gz
Algorithm Hash digest
SHA256 ab611ef935f49a8e2610e3a4a1467f2124ef4a73660f900f0c733a6543c62a85
MD5 2480e4e2cd2d526077c2fcafa7c30f21
BLAKE2b-256 22272ea88a17fd5701a6e07218b994c8982f8651bbe38718f229f9a11305a1f3


File details

Details for the file experimental_yt_fts-0.1.49-py3-none-any.whl.

File metadata

File hashes

Hashes for experimental_yt_fts-0.1.49-py3-none-any.whl
Algorithm Hash digest
SHA256 fef4a9827ca3a47eec9199ff84f5e41b3fae3172efb672224243beafee7d9ff2
MD5 630d9005b905917065fcf1202af55068
BLAKE2b-256 42d3681437c86e307a6f663a0dcb2715f30f93fcd848a42022163447c98ee346

