Extract YouTube video titles and URLs with end-to-end web scraping API + automate Selenium webdriver dependency set up

Python Quick Start

Python 3.6+ setup (required if not already installed)

This package uses f-strings, which were introduced in Python 3.6, and so requires Python 3.6+.

If you have an older version of Python, you can download Python 3.8.2 from the Python Downloads page and follow the instructions to set up Python for your machine. If you want to install a different version, select that version on the same page.
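If you are not sure which version you have, a quick check along these lines can tell you. This is a minimal sketch; the helper name `python_is_supported` is made up for illustration and is not part of the package. It avoids f-strings on purpose so it also runs on older interpreters.

```python
import sys

MIN_VERSION = (3, 6)  # f-strings were added in Python 3.6


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python {}.{} is new enough".format(*sys.version_info[:2]))
else:
    print("Please upgrade to Python 3.6 or newer")
```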

Permissions for first run

These permissions are needed so the package can download and install the Selenium binary dependencies.

On Windows: make sure you open Command Prompt or PowerShell (both work) in "Run as Administrator" mode
  • shortcut: ⊞ Win + X + A
On Unix based machines (MacOS, Linux): make sure you have read and write access to /usr/local/bin/
  • if you're not sure, open terminal and run sudo chown $USER /usr/local/bin/
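
On MacOS or Linux you can also check the permissions from Python before changing ownership. The sketch below only uses the standard library; the function name `can_install_drivers` is hypothetical and not something the package provides.

```python
import os


def can_install_drivers(directory="/usr/local/bin/"):
    """Return True if the current user can read and write `directory`."""
    return os.path.isdir(directory) and os.access(directory, os.R_OK | os.W_OK)


if can_install_drivers():
    print("/usr/local/bin/ is readable and writable")
else:
    print("no read/write access; see the chown command above")
```

On Windows the path does not exist, so the check reports no access; there the Administrator prompt is what matters.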

Installing the package

After you install Python 3.6+ and ensure you have the required permissions as needed, enter the following in your command line:

# if something isn't working properly, try rerunning this
# the problem may have been fixed with a newer version

pip3 install -U yt-videos-list     # MacOS/Linux
pip  install -U yt-videos-list     # Windows

Running the package from the Python interpreter

python3     # MacOS/Linux
python      # Windows

from yt_videos_list import ListCreator


my_driver = 'firefox' # SUBSTITUTE DRIVER YOU WANT (options below)
lc = ListCreator(driver=my_driver, scroll_pause_time=0.8)


lc.create_list_for(url='https://www.youtube.com/user/schafer5')
lc.create_list_for(url='https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ')


# see the new files that were just created:
import os
os.system('ls -lt | head')                      # MacOS/Linux
os.system('dir /O-D | find "_videos_list"')     # Windows

# for more information on using the module:
help(lc)

  • driver options include:
    • 'firefox'
    • 'opera'
    • 'safari' (MacOS only)
    • 'chrome'
    • 'brave'
    • 'edge' (Windows only!)
  • increase scroll_pause_time for laggy internet and decrease scroll_pause_time for fast internet
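
The two os.system calls in the example above are platform-specific, so only one of them works on a given machine. A portable alternative might look like the following sketch. It assumes, based on the Windows command above, that the output file names contain "_videos_list" and that they are written to the current working directory; the helper name `newest_video_lists` is made up for illustration.

```python
from pathlib import Path


def newest_video_lists(directory=".", limit=10):
    """Return up to `limit` files whose names contain '_videos_list', newest first."""
    matches = [
        path for path in Path(directory).iterdir()
        if path.is_file() and "_videos_list" in path.name
    ]
    # sort by modification time, most recently written file first
    matches.sort(key=lambda path: path.stat().st_mtime, reverse=True)
    return matches[:limit]


for path in newest_video_lists():
    print(path.name)
```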

If you already scraped a channel and the channel uploaded a new video, simply rerun this program on that channel and this package updates your files to include the newer video(s)!
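
If you follow several channels, rerunning can be wrapped in a small loop. The sketch below is one possible way to do it, not part of the package: `update_channels` is a hypothetical helper that only relies on the `create_list_for(url=...)` method shown above. Whether that method raises an exception on failure is an assumption, so the error handling is purely defensive.

```python
def update_channels(list_creator, channel_urls):
    """Call create_list_for on each URL and return the URLs that raised an error."""
    failed = []
    for url in channel_urls:
        try:
            list_creator.create_list_for(url=url)
        except Exception as error:
            # keep going so one bad channel does not stop the rest
            print("could not update {}: {}".format(url, error))
            failed.append(url)
    return failed


# Usage (requires the package and a browser, so it is left commented out):
# from yt_videos_list import ListCreator
# lc = ListCreator(driver='firefox', scroll_pause_time=0.8)
# update_channels(lc, [
#     'https://www.youtube.com/user/schafer5',
#     'https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ',
# ])
```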

Explicitly downloading all Selenium dependencies

Ideal if you use Selenium for other projects 😎

  • Make sure you already have the yt-videos-list package installed (follow the directions above to get set up), then run the following:

pip3 install -U yt-videos-list # MacOS/Linux: ensure latest package
python3                        # MacOS/Linux: enter python interpreter
pip install -U yt-videos-list  # Windows:     ensure latest package
python                         # Windows:     enter python interpreter

from yt_videos_list.download import selenium_webdriver_dependencies
selenium_webdriver_dependencies.download_all()
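
To see which drivers are visible on your PATH afterwards, something like the following sketch may help. The executable names are an assumption: they are the conventional names of each browser's Selenium driver, not names confirmed by this package's documentation, and `find_drivers` is a hypothetical helper.

```python
import shutil

# Assumed executable names (conventional driver names, not confirmed by the package).
DRIVER_EXECUTABLES = {
    'firefox': 'geckodriver',
    'chrome': 'chromedriver',
    'opera': 'operadriver',
    'edge': 'msedgedriver',
}


def find_drivers(executables=DRIVER_EXECUTABLES):
    """Map each browser to the driver path found on PATH, or None if absent."""
    return {browser: shutil.which(name) for browser, name in executables.items()}


for browser, path in find_drivers().items():
    print("{:8} {}".format(browser, path or "not found on PATH"))
```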

That's all! 🤓

More API information

NOTE: you can also access all the information below from the Python interpreter by entering:

import yt_videos_list
help(yt_videos_list)

# default options for the ListCreator object

ListCreator(
            csv=True,
            txt=True,
            md=True,
            reverse_chronological=True,
            headless=False,
            scroll_pause_time=0.8,
            driver='Firefox'
            )

There are a number of optional arguments you can specify when you instantiate the ListCreator object. The values shown above are the defaults, but if you want more flexibility, you can specify the:

  • driver argument:
    • Firefox (default)
    • Opera
    • Safari (MacOS only)
    • Chrome
    • Brave
    • Edge (Windows only)
      • driver='firefox'
      • driver='opera'
      • driver='safari'
      • driver='chrome'
      • driver='brave'
      • driver='edge'
  • csv, txt, md file type argument:
    • True (default) - create a file for the specified type
    • False - do not create a file for the specified type.
      • txt=True (default) OR txt=False
      • csv=True (default) OR csv=False
      • md=True (default) OR md=False
  • reverse_chronological argument:
    • True (default) - write the files in order from most recent video to the oldest video
    • False - write the files in order from oldest video to the most recent video
      • reverse_chronological=True (default) OR reverse_chronological=False
  • headless argument:
    • False (default) - run the driver with an open Selenium instance for viewing
    • True - run the driver in "invisible" mode.
      • headless=False (default) OR headless=True
  • scroll_pause_time argument:
    • any float value greater than 0 (default 0.8).
      • The value you provide will be how long the program waits before trying to scroll the videos list page down for the channel you want to scrape. For fast internet connections, you may want to reduce the value, and for slow connections you may want to increase the value.
    • scroll_pause_time=0.8 (default)
    • CAUTION: reducing this value too much will result in the program not capturing all the videos, so be careful! Experiment :)
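
The options above can be checked before a browser is launched. The following is a minimal sketch under stated assumptions: `list_creator_options` and its `system` parameter are hypothetical and not part of the package, and the platform restrictions are taken from the driver list above. It returns keyword arguments that could be passed on as `ListCreator(**options)`.

```python
import platform

SUPPORTED_DRIVERS = {'firefox', 'opera', 'safari', 'chrome', 'brave', 'edge'}
# Platform restrictions as listed in the options above;
# platform.system() reports MacOS as 'Darwin'.
PLATFORM_ONLY = {'safari': 'Darwin', 'edge': 'Windows'}


def list_creator_options(driver='firefox', headless=False, scroll_pause_time=0.8,
                         csv=True, txt=True, md=True, reverse_chronological=True,
                         system=None):
    """Validate the options and return them as keyword arguments."""
    system = system or platform.system()
    driver = driver.lower()
    if driver not in SUPPORTED_DRIVERS:
        raise ValueError("unsupported driver: {}".format(driver))
    required = PLATFORM_ONLY.get(driver)
    if required and system != required:
        raise ValueError("{} is only available on {}".format(driver, required))
    if scroll_pause_time <= 0:
        raise ValueError("scroll_pause_time must be greater than 0")
    return dict(csv=csv, txt=txt, md=md,
                reverse_chronological=reverse_chronological,
                headless=headless,
                scroll_pause_time=float(scroll_pause_time),
                driver=driver)


# Usage (requires the package and a browser, so it is left commented out):
# from yt_videos_list import ListCreator
# lc = ListCreator(**list_creator_options(driver='chrome', headless=True))
```

Validating first means a typo in the driver name or a non-positive pause time fails immediately instead of partway through a scrape.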

If you found this interesting or useful, please consider starring this repo so other people can more easily find and use this. Thanks!
